MLPerf Training 4.0 – Nvidia Still King; Power and LLM Fine Tuning Added

There are really two stories packaged in the most recent MLPerf Training 4.0 results, released today. The first, of course, is the results: Nvidia (currently king of accelerated computing) wins again, sweeping all nine “events” (workflows), and its lead remains formidable. Story number two, perhaps more important, is MLPerf itself. It has matured over the years into a far more useful broad tool for evaluating competing ML-centric systems and operating environments.

Bottom line: as a handicapping tool for GPUs and their like, MLPerf is perhaps less useful than first intended because there are few contenders, but it is quite useful for prospective buyers evaluating systems and for developers assessing rival systems. MLPerf has increasingly become a standard item on their checklists. (We’ll skip the slight confusion arising from mixing MLPerf’s parent entity’s name, MLCommons, with MLPerf, the name of the portfolio of benchmark suites.)

The latest MLPerf Training exercise adds two new benchmarks — LoRA fine-tuning of Llama 2 70B and GNN (graph neural network) — and also adds optional power measurement to training. There were more than 205 performance results from 17 submitting organizations: ASUSTeK, Dell, Fujitsu, Giga Computing, Google, HPE, Intel (Habana Labs), Juniper Networks, Lenovo, NVIDIA, NVIDIA + CoreWeave, Oracle, Quanta Cloud Technology, Red Hat + Supermicro, Supermicro, Sustainable Metal Cloud (SMC), and tiny corp.

While Nvidia GPUs again dominated, Intel’s Habana Gaudi2 accelerator, Google’s TPU v5p, and AMD’s Radeon RX 7900 XTX GPU (a first-time entrant) all had strong showings.

As shown above, MLCommons has steadily grown its portfolio of benchmarks. Training, introduced in 2018, was the first and is generally regarded as the most computationally intense. MLCommons executive director David Kanter emphasized the improving performance of training submissions, crediting the MLPerf exercises with helping drive the gains.

“[MLPerf], by bringing the whole community together, focuses us on what’s important. And this is a slide that shows the benefits. [It] is an illustration of Moore’s law — that is the yellow line at the bottom. On the x axis is time, and on the y axis is relative performance. This is normalized performance for the best results on each benchmark in MLPerf, and how it improves over time,” said Kanter at a media/analyst pre-briefing.

“What you can see (slide below) is that in many cases we’re delivering five or 10x better performance than Moore’s law. That means that not only are we taking advantage of better silicon, but we’re getting better architectures, better algorithms, better scaling; all of these things come together to give us dramatically better performance over time. If you look back in the rear view mirror, since we started, it’s about 50x better, slightly more, but even if you look relative to the last cycle, some of our benchmarks got nearly 2x better performance, in particular Stable Diffusion. So that’s pretty impressive in six months.”

Adding graph neural network and fine tuning to the training exercise were natural steps. Ritika Borkar and Hiwot Kassa, MLPerf Training working group co-chairs, walked through the new workflows.

Borkar said, “There’s a large class of data in the world which can be represented in the form of graphs — a collection of nodes and edges connecting the different nodes. For example, social networks, molecules, databases, and web pages. GNNs, or graph neural networks, are the class of networks used to encapsulate information from such graph-structured data, and as a result you see GNNs in a wide range of commercial applications, such as recommender systems, ad fraud detection, drug discovery, or question answering over knowledge graphs.

“An example noted here is Alibaba Taobao’s recommendation system, which is based on a GNN; it uses a user behavior graph on the order of billions of vertices and edges. As you can imagine, when a system has to work with graphs of this magnitude, there are interesting performance characteristics that are demanded from the system. In that spirit, we wanted to include this kind of challenge in the benchmark suite.”

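Borkar’s description boils down to repeated neighborhood aggregation. As a rough illustration (plain PyTorch, not the benchmark’s actual RGAT reference implementation, and with made-up toy data), a single message-passing step might look like this:

import torch

# Toy graph: 4 nodes in a ring; edges stored as (source, target) pairs
edges = torch.tensor([[0, 1], [1, 2], [2, 3], [3, 0]])
feats = torch.randn(4, 8)        # one 8-dim feature vector per node
layer = torch.nn.Linear(8, 8)    # learnable transform for one GNN layer

# Message passing: each node sums the features of its in-neighbors
agg = torch.zeros_like(feats)
agg.index_add_(0, edges[:, 1], feats[edges[:, 0]])

# Update node embeddings from the aggregated messages
new_feats = torch.relu(layer(agg))

Production GNN systems do this over billions of edges, which is why memory bandwidth and graph sampling dominate the benchmark’s performance profile.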
With regard to fine-tuning LLMs, Kassa said, “We can divide the state of training LLMs, at a high level, into two stages. One is pre-training, where LLMs are trained on large unlabeled data for general-purpose language understanding. This can take days to months and is computationally intensive. [The] GPT-3 benchmark in MLPerf Training covers this phase, so we have that part already covered. The next phase is fine-tuning, [which] is where we take a pre-trained model and train it with task-specific labeled datasets to enhance its accuracy on specific tasks, like text summarization on specific topics. In fine-tuning, we use less compute and memory resources [and] incur less training cost. It’s becoming widely accessible and used by a large range of AI users, and adding it to MLPerf is important and timely right now.

“When selecting fine-tuning techniques we considered a number of options, and we selected parameter-efficient fine-tuning (PEFT), a technique that trains or tunes only a subset of the overall model parameters; this significantly reduces training time and improves computational efficiency compared to traditional fine-tuning techniques that tune all parameters. From the PEFT methods, we selected LoRA (low-rank adaptation), which enables training of dense layers through rank decomposition matrices while keeping the pre-trained weights frozen. This technique significantly reduces hardware requirements, memory usage, and storage while still being performant compared to fully fine-tuned models,” she said.

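For the curious, here is a minimal sketch of what LoRA fine-tuning looks like in code, using the Hugging Face PEFT library. The model name and hyperparameters are illustrative only, not the MLPerf reference configuration:

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Load a pre-trained causal LM (the benchmark uses Llama 2 70B)
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-70b-hf")

# Inject trainable rank-decomposition matrices; base weights stay frozen
config = LoraConfig(
    r=16,                                 # rank of the low-rank update
    lora_alpha=32,                        # scaling factor for the update
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # typically well under 1% of all parameters

A standard training loop over the task-specific labeled dataset then updates only the small adapter weights, which is where the hardware and memory savings Kassa describes come from.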
Inclusion of the power metric in training is also new and not yet widely used, but given current concerns around energy use by AI technology and datacenters generally, its importance seems likely to grow.

As always, digging meaningful results out of MLPerf Training is a painstaking effort, in that system configurations vary widely, as does performance across the different workflows. Developing an easier way to accomplish this might be a useful addition to the MLPerf presentation arsenal, though it is perhaps unlikely, as MLCommons would not want to spotlight better performers and antagonize lesser ones. Still, it’s perhaps a worthwhile goal. Here’s a link to the Training 4.0 results.

Per usual practice, MLCommons invites participants to submit brief statements intended to spotlight features that improve performance on the MLPerf benchmarks. Some do this well, while others offer little more than marketing. Those statements are appended to this article and are worth scanning.

Nvidia was again the dominant winner in terms of accelerator performance. This is an old refrain. Intel (Habana Gaudi2), AMD, and Google all had entries. Intel’s Gaudi3 is expected to be available in the fall, and the company said it plans to enter it in the fall MLPerf Inference benchmark.

Here are brief excerpts from three submitted statements:

Intel — “Training and fine-tuning results show competitive Intel Gaudi accelerator performance at both ends of the training and fine-tuning spectrum. The v4.0 benchmark features time-to-train (TTT) of a representative 1% slice of the GPT-3 model, a valuable measurement for assessing training performance on a very large 175B parameter model. Intel submitted results for GPT-3 training for time-to-train on 1024 Intel Gaudi accelerators, the largest cluster result to be submitted by Intel to date, with TTT of 66.9 minutes, demonstrating strong Gaudi 2 scaling performance on ultra-large LLMs.

“The benchmark also features a new training measurement: fine-tuning a Llama 2 model with 70B parameters. Fine-tuning LLMs is a common task for many customers and AI practitioners, making it a highly relevant benchmark for everyday applications. Intel’s submission achieved a time-to-train of 78.1 minutes using 8 Intel Gaudi 2 accelerators. The submission leverages Zero-3 from DeepSpeed for optimizing memory efficiency and scaling during large model training, as well as Flash-Attention-2 to accelerate attention mechanisms.”

Juniper Networks — “For MLPerf Training v4.0, Juniper submitted benchmarks for BERT, DLRM, and Llama2-70B with LoRA fine tuning on a Juniper AI Cluster consisting of Nvidia A100 and H100 GPUs using Juniper’s AI Optimized Ethernet fabric as the accelerator interconnect. For BERT, we optimized pre-training tasks using a Wikipedia dataset, evaluating performance with MLM accuracy. Our DLRM submission utilized the Criteo dataset and HugeCTR for efficient handling of sparse and dense features, with AUC as the evaluation metric, achieving exceptional performance. The Llama2-70B model was fine-tuned using LoRA techniques with DeepSpeed and Hugging Face Accelerate, optimizing gradient accumulation for balanced training speed and accuracy.

“Most submissions were made on a multi-node setup, with PyTorch, DeepSpeed, and HugeCTR optimizations. Crucially, we optimized inter-node communication with RoCE v2, ensuring low-latency, high-bandwidth data transfers, which are critical for efficient, distributed training workloads.”

Google — “Cloud TPU v5p exhibits near-linear scaling performance (approximately 99.9% scaling efficiency) on the GPT-3 175b model pre-training task, ranging from 512 to 6144 chips. Previously, in the MLPerf Training v3.1 submission for the same task, we demonstrated horizontal scaling capabilities of TPU v5e across 16 pods (4096 chips) connected over a data center network (across multiple ICI domains). In this submission, we are showcasing scaling to a 6144-chip TPU v5p podslice (within a single ICI domain). For a comparable compute scale (1536 TPU v5p chips versus 4096 TPU v5e chips), this submission also shows an approximate 31% improvement in efficiency (measured as model flops utilization).

“This submission also showcases Google/MaxText, Google Cloud’s reference implementation for large language models. The training was done using Int8 mixed precision, leveraging Accurate Quantized Training. Near-linear scaling efficiency demonstrated across 512, 1024, 1536, and 6144 TPU v5p slices is an outcome of optimizations from codesign across the hardware, runtime & compiler (XLA), and framework (JAX). We hope that this work will reinforce the message of efficiency which translates to performance per dollar for large-scale training workloads.”

Back to Nvidia, last but hardly least. David Salvator, director of accelerated computing products, led a separate briefing on Nvidia’s latest MLPerf showing, where perhaps a little chest-thumping was justified.

“So there are nine workloads in MLPerf, and we’ve set new records on five of those nine workloads, which you see sort of across the top row (slide below). A couple of these are actually brand-new workloads; the graph neural network as well as the LLM fine-tuning workloads are net new to this version of MLPerf. But in addition, we are constantly optimizing and tuning our software, and we are actually publishing our containerized software to the community on about a monthly cadence,” said Salvator.

“In addition to the two new models where we set new records, we’ve even improved our performance on three of the existing models, which you see on kind of the right hand side. We also have standing records on the additional four workloads, so we basically hold records across all nine workloads of MLPerf Training. And this is just an absolute tip of the hat [to] our engineering teams for continuing to improve performance and get more performance from our existing architectures. These were all achieved on the Hopper architecture.”

“About a year ago, we did a submission at about 3,500 GPUs. That was with the software we had at the time. This is sort of a historical compare. If you fast forward to today, our most recent submission used 11,616 GPUs, which is the biggest at-scale submission we’ve ever done. What you see is that we’ve just about tripled the results, plus a little. Here’s what’s interesting about that: if you actually do the math on 11,616 divided by 3,584, you’ll see that it’s about 3.2x, so what that means is we are getting essentially linear scaling right now. A lot of times with workloads, as you go to much larger scales, you’re pretty happy if you can get 65% to 70% scaling efficiency, and very happy if you get 80%. What we’ve been able to do, through a combination of more hardware but also a lot of software tuning, is get linear scaling on this workload. It’s very rare for this to happen,” said Salvator.

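The back-of-the-envelope math behind Salvator’s linear-scaling claim is straightforward; the speedup figure below is an assumption based on his “tripled, plus a little” remark:

# Scaling-efficiency check using the GPU counts Salvator cited
gpus_then, gpus_now = 3584, 11616
speedup = 3.27                  # assumed: "just about tripled the results, plus a little"
scale = gpus_now / gpus_then    # ~3.24x more GPUs
print(f"{scale:.2f}x GPUs, {speedup / scale:.0%} scaling efficiency")  # ~100% => linear
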
Salvator has also posted a blog on the latest results. It’s best to dig into the specific results to ferret out useful insights for your particular purposes.

Link to MLPerf Training 4.0 results, https://mlcommons.org/benchmarks/training/

MLPerf Training v4.0 Submitted Statements by Vendors
The submitting organizations provided the following descriptions as a supplement to help the public understand their MLPerf Training v4.0 submissions and results. The statements do not reflect the opinions or views of MLCommons.

Asus

ASUS, a global leader in high-performance computing solutions, proudly announces its collaboration with MLPerf, the industry-standard benchmark for machine learning performance, to demonstrate the exceptional capabilities of its ESC-N8A and ESC8000A-E12 servers in the MLPerf Training v4.0 benchmarks.

The collaboration highlights ASUS’s commitment to advancing AI and machine learning technologies. The ESC-N8A and ESC8000A-E12 servers, equipped with cutting-edge hardware, have showcased remarkable performance and efficiency in the rigorous MLPerf Training v4.0 evaluations.

This collaboration with MLPerf reinforces ASUS’s role as a pioneer in AI and machine learning innovation. By continually pushing the boundaries of what is possible, ASUS aims to empower researchers, data scientists, and enterprises with the tools they need to drive technological advancements and achieve breakthrough results.

Partnering with MLPerf allows us to validate our servers’ capabilities in the most demanding AI benchmarks. The outstanding results achieved by the ESC-N8A and ESC8000A-E12 servers in MLPerf Training v4.0 highlight our commitment to delivering high-performance, scalable, and efficient solutions for AI workloads.

Dell Technologies

Dell Technologies continues to accelerate the AI revolution by creating the industry’s first AI Factory with NVIDIA. At the heart of this factory is the continued commitment to advancing AI workloads. MLPerf submissions serve as a testament to Dell’s commitment to helping customers make informed decisions. In the MLPerf v4.0 Training Benchmark submissions, Dell PowerEdge servers showed excellent performance.

Dell submitted two new models, including Llama 2 and Graph Neural Networks. The Dell PowerEdge XE9680 server with 8 NVIDIA H100 Tensor Core GPUs continued to deliver Dell’s best performance results.

The Dell PowerEdge XE8640 server with four NVIDIA H100 GPUs and its direct liquid-cooled (DLC) sibling, the Dell PowerEdge XE9640, also performed very well. The XE8640 and XE9640 servers are ideal for applications requiring a balanced ratio of fourth-generation Intel Xeon Scalable CPUs to SXM or OAM GPU cores. The PowerEdge XE9640 was purpose-built for high-efficiency DLC, reducing the four-GPU server profile to a dense 2RU form factor, yielding maximum GPU core density per rack.

The Dell PowerEdge R760xa server was also tested, using four L40S GPUs and ranking high in performance for training these models. The L40S GPUs are PCIe-based and power efficient. The R760xa is a mainstream 2RU server with optimized power and airflow for PCIe GPU density.

Generate higher-quality, faster time-to-value predictions and outputs while accelerating decision-making with powerful solutions from Dell Technologies. Come and take a test drive in one of our worldwide Customer Solution Centers, or collaborate with us in one of our innovation labs to tap into one of our Centers of Excellence.

Fujitsu

Fujitsu offers a fantastic blend of systems, solutions, and expertise to guarantee maximum productivity, efficiency, and flexibility, delivering confidence and reliability. Since 2020, we have been actively participating in and submitting to inference and training rounds for both the data center and edge divisions.

In this round, we submitted benchmark results for two systems. The first is PRIMERGY CDI, equipped with 16 L40S GPUs in external PCIe-BOXes, and the second is PRIMERGY GX2560M7, equipped with four H100 SXM GPUs inside the server. The PRIMERGY CDI can accommodate up to 20 GPUs in three external PCIe-BOXes as a single-node server and can share the resources among multiple nodes. Additionally, the system configuration can be adjusted according to the size of training and inference workloads. Measurement results are displayed in the figure below. In image-related benchmarks, PRIMERGY CDI dominated, while PRIMERGY GX2560M7 excelled in language-related benchmarks.

Our purpose is to make the world more sustainable by building trust in society through innovation. With a rich heritage of driving innovation and expertise, we are dedicated to contributing to the growth of society and our valued customers. Therefore, we will continue to meet the demands of our customers and strive to provide attractive server systems through the activities of MLCommons.

Giga Computing

The MLPerf Training benchmark submitter Giga Computing is a GIGABYTE subsidiary, formed from GIGABYTE’s enterprise division, that designs, manufactures, and sells GIGABYTE server products.

The GIGABYTE brand has been recognized as an industry leader in HPC & AI servers and has a wealth of experience in developing hardware for all data center needs, while working alongside technology partners: NVIDIA, AMD, Ampere Computing, Intel, and Qualcomm.

In 2020, GIGABYTE joined MLCommons and submitted its first system. In this round of the latest benchmarks, MLPerf Training v4.0 (closed division), the submitted GIGABYTE G593 Series platform showed its versatility in supporting both AMD EPYC and Intel Xeon processors. The proof is in the pudding, and these benchmarks (v3.1 and v4.0) exemplify the impressive performance that is possible in the G593 series. Additionally, greater compute and rack density are part of the G593 design, which has been thermally optimized in a 5U form factor.

  • GIGABYTE G593-SD1: dense accelerated computing in a 5U server
      – 2x Intel Xeon 8480+ CPUs
      – 8x NVIDIA SXM H100 GPUs
      – Optimized for baseboard GPUs
  • Benchmark frameworks: MXNet, PyTorch, DGL, HugeCTR

To learn more about our solutions, visit: https://www.gigabyte.com/Enterprise
Giga Computing’s website is still being rolled out: https://www.gigacomputing.com/

Google Cloud

In the MLPerf Training version 4.0 training submission, we are pleased to present Google Cloud TPU v5p, our most scalable TPU in production.

Cloud TPU v5p exhibits near-linear scaling performance (approximately 99.9% scaling efficiency) on the GPT-3 175b model pre-training task, ranging from 512 to 6144 chips. Previously, in the MLPerf Training v3.1 submission for the same task, we demonstrated horizontal scaling capabilities of TPU v5e across 16 pods (4096 chips) connected over a data center network (across multiple ICI domains). In this submission, we are showcasing scaling to a 6144-chip TPU v5p podslice (within a single ICI domain). For a comparable compute scale (1536 TPU v5p chips versus 4096 TPU v5e chips), this submission also shows an approximate 31% improvement in efficiency (measured as model flops utilization).

This submission also showcases Google/MaxText, Google Cloud’s reference implementation for large language models. The training was done using Int8 mixed precision, leveraging Accurate Quantized Training. Near-linear scaling efficiency demonstrated across 512, 1024, 1536, and 6144 TPU v5p slices is an outcome of optimizations from codesign across the hardware, runtime & compiler (XLA), and framework (JAX). We hope that this work will reinforce the message of efficiency which translates to performance per dollar for large-scale training workloads.

Hewlett Packard Enterprise

Hewlett Packard Enterprise (HPE) demonstrated strong inference performance in MLPerf Inference v4.0 along with strong AI model training and fine-tuning performance in MLPerf Training v4.0. Configurations this round featured an HPE Cray XD670 server with 8x NVIDIA H100 SXM 80GB Tensor Core GPUs and HPE ClusterStor parallel storage system as backend storage. HPE Cray systems combined with HPE ClusterStor are the perfect choice to power data-intensive workloads like AI model training and fine-tuning.

HPE’s results this round included single- and double-node configurations for on-premise deployments. HPE participated across three categories of AI model training: large language model (LLM) fine-tuning, natural language processing (NLP) training, and computer vision training. In all submitted categories and AI models, HPE Cray XD670 with NVIDIA H100 GPUs achieved the company’s fastest time-to-train performance to date for MLPerf on single- and double-node configurations. HPE also demonstrated exceptional performance compared to previous training submissions, which used NVIDIA A100 Tensor Core GPUs.

Based on our benchmark results, organizations can be confident in achieving strong performance when they deploy HPE Cray XD670 to power AI training and tuning workloads.

Intel (Habana Labs)

Intel is pleased to participate in the MLCommons latest benchmark, Training v4.0, submitting time-to-train results for GPT-3 training and Llama-70B fine-tuning with its Intel Gaudi 2 AI accelerators.

Training and fine-tuning results show competitive Intel Gaudi accelerator performance at both ends of the training and fine-tuning spectrum. The v4.0 benchmark features time-to-train (TTT) of a representative 1% slice of the GPT-3 model, a valuable measurement for assessing training performance on a very large 175B parameter model. Intel submitted results for GPT-3 training for time-to-train on 1024 Intel Gaudi accelerators, the largest cluster result to be submitted by Intel to date, with TTT of 66.9 minutes, demonstrating strong Gaudi 2 scaling performance on ultra-large LLMs.

The benchmark also features a new training measurement: fine-tuning a Llama 2 model with 70B parameters. Fine-tuning LLMs is a common task for many customers and AI practitioners, making it a highly relevant benchmark for everyday applications. Intel’s submission achieved a time-to-train of 78.1 minutes using 8 Intel Gaudi 2 accelerators. The submission leverages Zero-3 from DeepSpeed for optimizing memory efficiency and scaling during large model training, as well as Flash-Attention-2 to accelerate attention mechanisms.

The benchmark task force – led by the engineering teams from Intel’s Habana Labs and Hugging Face, who also serve as the benchmark owners – is responsible for the reference code and benchmark rules.

The Intel team looks forward to submitting MLPerf results based on the Intel Gaudi 3 AI accelerator in the upcoming inference benchmark. Announced in April, solutions based on Intel Gaudi 3 accelerators will be generally available from OEMs in fall 2024.

Juniper Networks

Juniper is thrilled to collaborate with MLCommons to accelerate AI innovation and make data center infrastructure simpler, faster and more economical to deploy. Training AI models is a massive, parallel processing problem dependent on robust networking solutions. AI workloads have unique characteristics and present new requirements for the network, but solving tough challenges such as these is what Juniper has been doing for over 25 years.

For MLPerf Training v4.0, Juniper submitted benchmarks for BERT, DLRM, and Llama2-70B with LoRA fine tuning on a Juniper AI Cluster consisting of Nvidia A100 and H100 GPUs using Juniper’s AI Optimized Ethernet fabric as the accelerator interconnect. For BERT, we optimized pre-training tasks using a Wikipedia dataset, evaluating performance with MLM accuracy. Our DLRM submission utilized the Criteo dataset and HugeCTR for efficient handling of sparse and dense features, with AUC as the evaluation metric, achieving exceptional performance. The Llama2-70B model was fine-tuned using LoRA techniques with DeepSpeed and Hugging Face Accelerate, optimizing gradient accumulation for balanced training speed and accuracy.

Most submissions were made on a multi-node setup, with PyTorch, DeepSpeed, and HugeCTR optimizations. Crucially, we optimized inter-node communication with RoCE v2, ensuring low-latency, high-bandwidth data transfers, which are critical for efficient, distributed training workloads.

Juniper is committed to an operations-first approach to help customers manage the entire data center lifecycle with market-leading capabilities in intent-based networking, AIOps and 800Gb Ethernet. Open technologies such as Ethernet and our Apstra data center fabric automation software eliminate vendor lock-in, take advantage of the industry ecosystem to push down costs and drive innovation, and enable common network operations across AI training, inference, storage and management networks. In addition, rigorously pre-tested, validated designs are critical to ensure that customers can deploy secure data center infrastructure on their own.

Lenovo

Leveraging MLPerf Training v4.0, Lenovo Drives AI Innovation

At Lenovo, we’re dedicated to empowering our customers with cutting-edge AI solutions that transform industries and improve lives. To achieve this vision, we invest in rigorous research and testing using the latest MLPerf Training v4.0 benchmarking tools.

Benchmarking Excellence: Collaborative Efforts Yield Industry-Leading Results

Through our strategic partnership with MLCommons, we’re able to demonstrate our AI solutions’ performance and capabilities quarterly, showcasing our commitment to innovation and customer satisfaction. Our collaborations with industry leaders like NVIDIA and AMD on critical AI applications such as image classification, medical image segmentation, speech-to-text, and natural language processing have enabled us to achieve outstanding results.

ThinkSystem SR685A v3 with 8x NVIDIA H100 (80Gb) GPUs and the SR675 v3 with 8x NVIDIA L40s GPUs: Delivering AI-Powered Solutions

We’re proud to have participated in these challenges using our ThinkSystem SR685A v3 with 8x NVIDIA H100 (80Gb) GPUs and the SR675 v3 with 8x NVIDIA L40s GPUs. These powerful systems enable us to develop and deploy AI-powered solutions that drive business outcomes and improve customer experiences.

Partnership for Growth: MLCommons Collaboration Enhances Product Development

Our partnership with MLCommons provides valuable insights into how our AI solutions compare against the competition, sets customer expectations, and enables us to continuously enhance our products. Through this collaboration, we can work closely with industry experts to drive growth and ultimately deliver better products for our customers, who remain our top priority.

NVIDIA

The NVIDIA accelerated computing platform showed exceptional performance in MLPerf Training v4.0. The NVIDIA Eos AI SuperPOD more than tripled performance on the LLM pretraining benchmark, based on GPT-3 175B, compared to NVIDIA submissions from a year ago. Featuring 11,616 NVIDIA H100 Tensor Core GPUs connected with NVIDIA Quantum-2 InfiniBand networking, Eos achieved this through larger scale and extensive full-stack engineering. Additionally, NVIDIA’s 512 H100 GPU submissions are now 27% faster compared with just one year ago due to numerous optimizations to the NVIDIA software stack.

As enterprises seek to customize pretrained large language models, LLM fine-tuning is becoming a key industry workload. MLPerf added to this round the new LLM fine-tuning benchmark, which is based on the popular low-rank adaptation (LoRA) technique applied to Llama 2 70B. The NVIDIA platform excelled at this task, scaling from eight to 1,024 GPUs. And, in its MLPerf Training debut, the NVIDIA H200 Tensor Core GPU extended H100’s performance by 14%.

NVIDIA also accelerated Stable Diffusion v2 training performance by up to 80% at the same system scales submitted last round. These advances reflect numerous enhancements to the NVIDIA software stack. And, on the new graph neural network (GNN) test based on RGAT, the NVIDIA platform with H100 GPUs excelled at both small and large scales. H200 further accelerated single-node GNN training, delivering a 47% boost compared to H100.

Reflecting the breadth of the NVIDIA AI ecosystem, 10 NVIDIA partners submitted impressive results, including ASUSTek, Dell, Fujitsu, GigaComputing, HPE, Lenovo, Oracle, Quanta Cloud Technology, Supermicro, and Sustainable Metal Cloud.

MLCommons’ ongoing work to bring benchmarking best practices to AI computing is vital. Through enabling peer-reviewed, apples-to-apples comparisons of AI and HPC platforms, and keeping pace with the rapid change that characterizes AI computing, MLCommons provides companies everywhere with crucial data that can help guide important purchasing decisions.

Oracle

Oracle Cloud Infrastructure (OCI) offers AI Infrastructure, Generative AI, AI Services, ML Services, and AI in our Fusion Applications. Our AI infrastructure portfolio includes bare metal instances powered by NVIDIA H100, NVIDIA A100, and NVIDIA A10 GPUs. OCI also provides virtual machines powered by NVIDIA A10 GPUs. By mid-2024, we plan to add NVIDIA L40S GPU and NVIDIA GH200 Grace Hopper Superchip.

The MLPerf Training benchmark results for the high-end BM.GPU.H100.8 instance demonstrate that OCI provides high performance that at least matches that of other deployments, both on-premises and cloud infrastructure. These instances provide eight NVIDIA GPUs per node, and training performance is increased manifold by RoCEv2 enabling efficient NCCL communications. The benchmarks were done on 1-node, 8-node, and 16-node clusters, which correspond to 8, 64, and 128 NVIDIA H100 GPUs, and linear scaling was observed as we scaled from 1 node to 16 nodes. The GPUs are rail-optimized. Nodes with H100 GPUs can be clustered using a high-performance RDMA network into a cluster of tens of thousands of GPUs.

Quanta Cloud Technology

Quanta Cloud Technology (QCT), a global leader in data center solutions, excels in enabling HPC and AI workloads. In the latest MLPerf Training v4.0, QCT demonstrated its commitment to excellence by submitting two systems in the closed division. These submissions covered tasks in image classification, object detection, natural language processing, LLM, recommendation, image generation, and graph neural network. Both the QuantaGrid D54U-3U and QuantaGrid D74H-7U systems successfully met stringent quality targets.

The QuantaGrid D74H-7U is a dual Intel Xeon Scalable Processor server with eight-way GPUs, featuring the NVIDIA HGX H100 SXM5 module, supporting non-blocking GPUDirect RDMA and GPUDirect Storage. This makes it an ideal choice for compute-intensive AI training. Its innovative hardware design and software optimization ensure top-tier performance.

The QuantaGrid D54U-3U is a versatile 3U system that accommodates up to four dual-width or eight single-width accelerators, along with dual Intel Xeon Scalable Processors and 32 DIMM slots. This flexible architecture is tailored to optimize various AI/HPC applications. Configured with four NVIDIA H100-PCIe 80GB accelerators with NVLink bridge adapters, it achieved outstanding performance in this round.

QCT is committed to providing comprehensive hardware systems, solutions, and services to both academic and industrial users. We maintain transparency by openly sharing our MLPerf results with the public, covering both training and inference benchmarks.

Red Hat + Supermicro

Supermicro, builder of large-scale AI data center infrastructure, and Red Hat Inc., the world’s leading provider of enterprise open source solutions, collaborated on this first-ever MLPerf Training benchmark that included fine-tuning of the LLM Llama-2-70B using LoRA.

The GPU A+ Server AS-4125GS-TNRT has flexible GPU support and configuration options: active and passive GPUs, and dual-root or single-root configurations for up to 10 double-width, full-length GPUs. Furthermore, the dual-root configuration features eight directly attached GPUs without PLX switches to achieve the lowest latency possible and improve performance, which is hugely beneficial for the demanding AI and HPC scenarios our customers face.

This submission demonstrates the delivery of performance, within the error bar of other submissions on similar hardware, while providing an exceptional Developer, User and DevOps experience.

Get access to a free 60 day trial of Red Hat OpenShift AI here.

Sustainable Metal Cloud (SMC)

Sustainable Metal Cloud, one of the newest members of MLCommons, is an AI GPU cloud developed by Singapore-based Firmus Technologies using its proprietary single-phase immersion platform, named “Sustainable AI Factories”. Sustainable Metal Cloud’s operations are primarily based in Asia, with a globally expanding network of scaled GPU clusters and infrastructure – including NVIDIA H100 SXM accelerators.

Our first published MLPerf results demonstrate that when our customers train their models using our GPU cloud service, they access world-class performance with significantly reduced energy consumption. Our GPT-3 175B, 512 H100 GPU submission consumed only 468 kWh of total energy when connected with NVIDIA Quantum-2 Infiniband networking, demonstrating significant energy savings over conventional air-cooled infrastructure.

We are dedicated to advancing the agenda of energy efficiency in running and training AI. Our results, verified by MLCommons, highlight our commitment to this goal. We are very proud of our GPT3-175B total power result proving our solution scales and significantly reduces overall power use. The significant reduction in energy consumption is primarily due to the unique design of our Sustainable AI Factories.

With AI’s rapid growth, it’s crucial to address resource consumption by focusing on opportunities to reduce energy usage in every facet of the AI Factory. Estimates place the energy requirements of new AI-capable data centers at between 5-8GWh annually, potentially exceeding the US’s projected new power generation capacity of 3-5GWh per year.

As part of MLCommons, we aim to showcase progressive technologies, set benchmarks for best practices, and advocate for long-term energy-saving initiatives.

tiny corp

In the latest round of MLPerf Training v4.0 (closed division) benchmarks, tiny corp submitted results for ResNet50. We are proud to be the first to submit to MLPerf Training on AMD accelerators.

Our results show competitive performance between AMD and NVIDIA accelerators, widening the choice for users selecting the best accelerator and lowering the barrier to entry for high-performance machine learning.

This was all achieved with tinygrad, a from-scratch, backend-agnostic neural network library that simplifies neural networks down to a few basic operations, which can then be highly optimized for various hardware accelerators.

tiny corp will continue to push the envelope on machine learning performance, with a focus on democratizing access to high performance compute.

Bewakoof Teams Up with Google Cloud to Bring GenAI to Indian Fashion

Popular pop culture-based Indian clothing brand Bewakoof has announced a new collaboration with Google Cloud to design a collection of AI-generated t-shirts.

This partnership leverages Google Cloud’s expertise in generative AI and machine learning capabilities to create unique and creative designs. The collaboration involves using Google’s AI tools to analyse trends, customer preferences, and other data to generate t-shirt designs.

Google Cloud provides advanced generative AI capabilities through LLMs like Gemini, enabling businesses to innovate with new content in text, images, and code. Emphasising responsible AI development, the company ensures ethical and secure use of such AI.

Bewakoof belongs to the TMRW House of Brands, an Aditya Birla Group venture. TMRW has acquired a majority stake of 70-80% in Bewakoof.

https://www.instagram.com/reel/C7rF5XqPT01/?utm_source=ig_web_copy_link

What will this Partnership Bring to the Table?

This collaboration showcases how technology can be integrated into fashion, pushing the boundaries of traditional design methods and introducing a modern, tech-driven approach to creating apparel.

“We are excited to partner with Google to bring the power of GenAI to the hands of our consumers – enabling expression and personal connection,” said Prashanth Aluru, CEO at TMRW.

“Our generative AI solutions, especially use of the latest Imagen model, provide the ideal foundation for Bewakoof to bring its creative image generation tool to life. We’re excited to see the unique ways their customers will embrace this technology,” said Bikram Bedi, VP and Country MD at Google Cloud India.

Founded in 2012 by IIT Bombay graduates Prabhkiran Singh and Siddharth Munot, Bewakoof is an Indian e-commerce brand known for its trendy and affordable casual clothing.

Hitachi Vantara & AMD Partner to Develop High-Performance Hybrid Cloud and Database Solutions

Hitachi Vantara, the data storage, infrastructure, and hybrid cloud management subsidiary of Hitachi, today announced the development of high-performance and energy-efficient hybrid cloud and database solutions powered by AMD EPYC processors.

The new solutions combine converged and hyperconverged solutions, including Hitachi Unified Compute Platform (UCP), with 4th Gen AMD EPYC processors, representing a significant advancement in hybrid cloud and database solutions designed to provide businesses with enhanced data center performance and efficiency.

Enterprises continue to face challenges in optimizing IT infrastructure and reducing costs. The complexity in managing traditional data centers often leads to inefficiencies and increased operational costs.

There is also a growing need to reduce power, cooling, and space requirements to streamline operations and enhance sustainability efforts.

Data centers across the world produce up to 3.7% of global greenhouse gas (GHG) emissions and use huge amounts of water for cooling, emitting the equivalent of 300 metric tons of CO2 in 2020.

To address these challenges, businesses must seek partnerships offering high-performance, energy-efficient hybrid cloud and database solutions that align with goals of cost-effectiveness, simplicity, and sustainability.

The new Hitachi Vantara converged and hyperconverged solutions deliver high performance, scale, and cost reduction for hybrid cloud, database, and high-performance environments.

The UCP portfolio also includes the Hitachi UCP for Azure Stack HCI, which helps deliver a consistent hybrid cloud infrastructure across edge, core, and public clouds.

Hitachi Vantara helps businesses simplify hybrid cloud deployments with a single source of systems, solutions, and services that streamline operations while reducing multi-vendor logistics.

FastAPI Tutorial: Build APIs with Python in Minutes


FastAPI is a popular web framework for building APIs with Python. It's super simple to learn and is loved by developers.

FastAPI leverages Python type hints and is based on Pydantic. This makes it simple to define data models and request/response schemas. The framework automatically validates request data against these schemas, reducing potential errors. It also natively supports asynchronous endpoints, making it easier to build performant APIs that can handle I/O-bound operations efficiently.

This tutorial will teach you how to build your first API with FastAPI. From setting up your development environment to building an API for a simple machine learning app, this tutorial takes you through all the steps: defining data models, API endpoints, handling requests, and more. By the end of this tutorial, you’ll have a good understanding of how to use FastAPI to build APIs quickly and efficiently. So let’s get started.

Step 1: Set Up the Environment

FastAPI requires Python 3.7 or later. So make sure you have a recent version of Python installed. In the project directory, create and activate a dedicated virtual environment for the project:

$ python3 -m venv v1
$ source v1/bin/activate

The above command to activate the virtual environment works if you’re on Linux or MacOS. If you’re a Windows user, check the docs to create and activate virtual environments.

Next, install the required packages. You can install FastAPI and uvicorn using pip:

$ pip3 install fastapi uvicorn  

This installs FastAPI and all the required dependencies, as well as uvicorn, the server that we’ll use to run and test the API that we build. Because we’ll build a simple machine learning model using scikit-learn, install it in your project environment as well:

$ pip3 install scikit-learn

With the installations out of the way, we can get to coding! You can find the code on GitHub.

Step 2: Create a FastAPI App

Create a main.py file in the project directory. The first step is to create a FastAPI app instance like so:

# Create a FastAPI app
from fastapi import FastAPI

app = FastAPI()

The Iris dataset is one of the toy datasets that you work with when starting out with data science. It has 150 data records, 4 features, and a target label (species of Iris flowers). To keep things simple, let’s create an API to predict the Iris species.

In the coming steps, we’ll build a logistic regression model and create an API endpoint for prediction. After you’ve built the model and defined the /predict/ API endpoint, you should be able to make a POST request to the API with the input features and receive the predicted species as a response.

Iris Prediction API | Image by Author

Just so it’s helpful, let's also define a root endpoint which returns the description of the app that we're building. To do so, we define the get_app_description function and create the root endpoint with the @app.get decorator like so:

# Define a function to return a description of the app
def get_app_description():
    return (
        "Welcome to the Iris Species Prediction API!"
        "This API allows you to predict the species of an iris flower based on its sepal and petal measurements."
        "Use the '/predict/' endpoint with a POST request to make predictions."
        "Example usage: POST to '/predict/' with JSON data containing sepal_length, sepal_width, petal_length, and petal_width."
    )

# Define the root endpoint to return the app description
@app.get("/")
async def root():
    return {"message": get_app_description()}

Sending a GET request to the root endpoint returns the description.

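For instance, once the app is running (we cover that in Step 6), a GET request from the command line might look like this; the response shown is abbreviated:

$ curl http://127.0.0.1:8000/
{"message":"Welcome to the Iris Species Prediction API!..."}
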
Step 3: Build a Logistic Regression Classifier

So far we’ve instantiated a FastAPI app and have defined a root endpoint. It’s now time to do the following:

  • Build a machine learning model. We’ll use a logistic regression classifier. If you’d like to learn more about logistic regression, read Building Predictive Models: Logistic Regression in Python.
  • Define a prediction function that receives the input features and uses the machine learning model to make a prediction for the species (one of setosa, versicolor, and virginica).

Logistic Regression Classifier | Image by Author

We build a simple logistic regression classifier from scikit-learn and define the predict_species function as shown:

# Build a logistic regression classifier
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Train a logistic regression model
model = LogisticRegression()
model.fit(X, y)

# Define a function to predict the species
def predict_species(sepal_length, sepal_width, petal_length, petal_width):
    features = [[sepal_length, sepal_width, petal_length, petal_width]]
    prediction = model.predict(features)
    return iris.target_names[prediction[0]]

Step 4: Define Pydantic Model for Input Data

Next, we should model the data that we send in the POST request. Here the input features are the length and width of the sepals and petals—all floating point values. To model this, we create an IrisData class that inherits from the Pydantic BaseModel class like so:

# Define the Pydantic model for your input data
from pydantic import BaseModel

class IrisData(BaseModel):
    sepal_length: float
    sepal_width: float
    petal_length: float
    petal_width: float

If you need a quick tutorial on using Pydantic for data modeling and validation, read Pydantic Tutorial: Data Validation in Python Made Super Simple.

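Because IrisData is a Pydantic model, FastAPI automatically rejects requests whose fields are missing or not coercible to floats, returning a 422 error. You can try the same validation directly in Python; this is a sketch, with the invalid value chosen arbitrarily:

from pydantic import ValidationError

try:
    IrisData(sepal_length="not-a-number", sepal_width=3.5,
             petal_length=1.4, petal_width=0.2)
except ValidationError as err:
    print(err)  # reports that sepal_length is not a valid float
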
Step 5: Create an API Endpoint

Now that we’ve built the classifier and have the predict_species function ready, we can create the API endpoint for prediction. As earlier, we use the @app decorator to define the /predict/ endpoint, which accepts a POST request and returns the predicted species:

# Create API endpoint
@app.post("/predict/")
async def predict_species_api(iris_data: IrisData):
    species = predict_species(iris_data.sepal_length, iris_data.sepal_width,
                              iris_data.petal_length, iris_data.petal_width)
    return {"species": species}

And it’s time to run the app!

Step 6: Run the App

You can run the app with the following command:

$ uvicorn main:app --reload

Here main is the name of the module and app is the FastAPI instance. The --reload flag ensures that the app reloads if there are any changes in the source code.

Upon running the command, you should see similar INFO messages:

INFO:     Will watch for changes in these directories: ['/home/balapriya/fastapi-tutorial']
INFO:     Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
INFO:     Started reloader process [11243] using WatchFiles
INFO:     Started server process [11245]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
…

If you navigate to http://127.0.0.1:8000 (localhost), you should see the app description:

App Running on localhost

Step 7: Test the API

You can now send POST requests to the /predict/ endpoint with the sepal and petal measurements—with valid values—and get the predicted species. You can use a command-line utility like cURL. Here’s an example:

curl -X 'POST' \
  'http://localhost:8000/predict/' \
  -H 'Content-Type: application/json' \
  -d '{
    "sepal_length": 5.1,
    "sepal_width": 3.5,
    "petal_length": 1.4,
    "petal_width": 0.2
}'

For this example request, this is the expected output:

{"species":"setosa"}

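If you’d rather test from Python than the command line, an equivalent request using the requests package (install it with pip3 install requests) might look like this:

import requests

# Same example measurements as the cURL request above
payload = {"sepal_length": 5.1, "sepal_width": 3.5,
           "petal_length": 1.4, "petal_width": 0.2}
resp = requests.post("http://localhost:8000/predict/", json=payload)
print(resp.json())  # expected: {'species': 'setosa'}
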
Wrapping Up

In this tutorial, we went over building an API with FastAPI for a simple classification model. We went through modeling the input data to be used in the requests, defining API endpoints, running the app, and querying the API.

As an exercise, take an existing machine learning model and build an API on top of it using FastAPI. Happy coding!

Bala Priya C is a developer and technical writer from India. She likes working at the intersection of math, programming, data science, and content creation. Her areas of interest and expertise include DevOps, data science, and natural language processing. She enjoys reading, writing, coding, and coffee! Currently, she's working on learning and sharing her knowledge with the developer community by authoring tutorials, how-to guides, opinion pieces, and more. Bala also creates engaging resource overviews and coding tutorials.


TransUnion Bolsters Global Team with 55% Indian Tech Talent 

In 2018, Chicago-based consumer credit reporting agency TransUnion, which has been in business for over five decades, opened its first global capability centre (GCC) in Chennai. Since then, the company has expanded its footprint in other Indian cities including Bengaluru, Pune, and Hyderabad.

“The success of our Chennai centre paved the way for us to expand our GCC network to six centres across three continents,” Debashish Panda, SVP and head, TransUnion GCCs (India, South Africa and Costa Rica), told AIM in a recent interaction.

India GCC is the largest, making up over a quarter of the company’s workforce.

The India centres house 55% of TransUnion’s technology talent, 56% of operations personnel, and 39% of analytics experts.

This network, spanning India, South Africa, and Costa Rica, now employs over 4,000 associates and has become integral to TransUnion’s global operations.

“These centres in India leverage local talent to support and enhance TransUnion’s core capabilities, including technology support, data analytics, business process management, and contact centre operations,” Panda explained.

This strategic location allows TransUnion to expand its time zone and language coverage, facilitating global operations and contributing to economic growth in the regions it operates in.

India Team – Global Hub

India is the hub of TransUnion’s strategy, contributing to its global operations. The India GCCs operate as a microcosm of the enterprise, embedding almost every global function within their operations.

These centres provide specialist capabilities across various domains, including product and platform solutions, data science and analytics, system architecture, intelligent automation, and business process management.

“The GCC India plays a pivotal role in TransUnion’s mission to migrate products and services to the cloud, ultimately transforming our operations by enabling streamlined product delivery and faster innovation,” said Panda.

This shift has streamlined product delivery and enabled faster innovation. The India team’s contributions have been instrumental in several modernisation initiatives.

Recently, the team’s work on the OneTru platform, which leverages TransUnion’s data assets, cloud infrastructure, and AI capabilities, has significantly enhanced the company’s ability to deliver comprehensive and compliant consumer insights.

Another achievement of the team is revamping TransUnion’s solution enablement platform. This platform integrates separate data and analytic assets designed for credit risk, marketing, and fraud prevention into a unified environment.

The India team supports internal infrastructure and development platforms. It allows consistent and secure development, deployment, and management of enterprise applications in a hybrid cloud environment.

Leveraging Talent and Expertise

“India’s strong digital skills and infrastructure support a compelling growth and expansion opportunity for us,” explained Panda.

The Indian talent offers a full stack of capabilities, including technology solutions, data science, and business process management. “This expertise allows TransUnion to provide leading-edge capabilities to our clients and colleagues worldwide,” the spokesperson said.

However, due to a limited talent pool and intense competition, the company faces challenges in hiring and retaining top talent in specialised areas like AI and ML.

To address this, it has implemented a unique talent-focused operating model and a compelling Employee Value Proposition. This includes specialised training programs and a culture of continuous learning. The company also offers programs that support job mobility within the organisation, aiding career growth.

“Our unique operating model and continuous learning culture help us attract and retain top talent,” said Panda.

It also ensures a competitive advantage in attracting and retaining talent through several initiatives. For example, the ‘University Graduate Program’ develops the employment readiness of graduates, while the ‘TU Connect’ program offers comprehensive learning and problem-solving initiatives.

Employee engagement, health and wellness programs, and CSR initiatives enhance employee satisfaction and retention.

Additionally, a transparent internal career mobility framework promotes internal career opportunities, enabling 25% of associates to advance internally each year. These efforts are supported by a strong Employee Value Proposition, competitive benefits, and a flexible work environment.

Similarly, another core principle of the GCC in India is a focus on diversity, equity, inclusion, and belonging (DEIB). “Our focused efforts have resulted in a 10% increase in diversity ratio since 2018, with our current gender diversity ratio at 31%,” Panda highlighted.

AIM Media House is hosting its flagship GCC summit, MachineCon, in Bengaluru, on June 28, 2024. The event will feature over 100 GCC leaders, including Balaji Narasimhan, head of operations and GCC site leader at TransUnion.

Don’t miss this opportunity—get your passes today!

Larry Ellison Sees a Surge in Net Worth, Thanks to Google Cloud, OpenAI and Others 

Larry Ellison is finally smiling. It took Google Cloud and OpenAI nearly nine months to realise the importance of Oracle Cloud Infrastructure (OCI). AWS, hopefully, will follow suit.

The outcome: Oracle chief Ellison gained almost $19 billion in wealth as the company he founded in 1977 forecast double-digit revenue growth for the fiscal year.

Moreover, following these announcements, the software company’s shares skyrocketed by 13% in extended trading on Wednesday.

Oracle Cloud Services reported revenue of $10.2 billion in Q4 2024. Meanwhile, Microsoft’s Intelligent Cloud posted $26.7 billion in sales for the recent quarter, AWS reached $25 billion, and Google Cloud reported $9.6 billion.

OpenAI will now run its workloads on OCI, extending the Microsoft Azure AI platform to Oracle’s cloud services.

“Like many others, OpenAI chose OCI because it is the world’s fastest and most cost-effective AI infrastructure,” said Oracle CEO Safra Catz in a recent earnings call. She added that Oracle has signed over 30 AI contracts totalling over $12 billion this quarter and nearly $17 billion this year.

Meanwhile, Elon Musk’s xAI is discussing with Oracle executives the possibility of spending $10 billion over the next few years renting cloud servers.

OpenAI, in its recent post on X, clarified that “the partnership with OCI enables OpenAI to use the Azure AI platform on OCI infrastructure for inference and other needs.” However, all pre-training of frontier models will continue to happen on supercomputers built in partnership with Microsoft.

Against the backdrop of the Data + AI Summit, Databricks lauded Oracle.

“We’ve seen Oracle become much more relevant in the cloud space in this AI era. We actually have partnerships with them around GPUs already, so many of the models we’ve trained on Mosaic AI, custom models that we trained, have been trained on infrastructure provided by Oracle,” said Databricks chief Ali Ghodsi, hinting at a possible partnership in the coming months and pointing to customer requirements.

“Congrats to Ellison. He needs it,” said Ghodsi.

The King of Multi-Cloud or NVIDIA of the Cloud in the Making

Oracle announced Oracle Database@Azure last year, which delivers Oracle database services running on OCI inside Azure data centres and gives customers more flexibility in where they run their workloads.

Ellison said that customers have already been using multi-cloud products and services, and there are even stronger reasons to believe they should be interoperable and interconnected more than ever.

“We’re doing the same thing with Google. We would love to do the same thing with AWS. We think we should be interconnected to everybody, and that’s what we’re attempting to do in our multi-cloud strategy,” said Ellison.

Expanding on its multi-cloud strategy, Oracle recently partnered with Google Cloud, giving customers the choice to combine OCI and Google Cloud to help accelerate their application migrations and modernisation.

“OCI and Google Cloud network interconnect is available immediately in 10 regions, and we will be live with Oracle Database at Google Cloud in September, where customers can get direct access to Oracle Database services running on OCI deployed in Google Cloud data centres,” said Ellison.

In an exclusive interview, Pradeep Vincent, chief technical architect of Oracle, told AIM that OCI is quite different from its competitors. “Our goal is to make it easy for customers to use multiple clouds, period,” he said, explaining that a key part of this is Oracle’s ‘distributed cloud’ strategy: putting the cloud where customers want it.

Along similar lines, Ellison said, “We believe in giving customers a choice, and they want it. Customers are using multiple clouds, including infrastructure clouds and applications like Salesforce and Workday. Therefore, we think it’s very important for all these clouds to become interconnected.”

Ellison said that in the coming months, Oracle aims to eliminate the fees (egress costs) for moving data from cloud to cloud, so that all the clouds are interconnected and customers can pick their favourite service from their favourite cloud, mixing and matching whatever they want to use, easily and seamlessly.

Further, he said that OCI’s RDMA (remote direct memory access) network moves data much faster. “And when you charge by the minute, faster also means less expensive,” he said, adding that OCI trains large language models several times faster and at a fraction of the cost of other clouds.

Understanding Data Privacy in the Age of AI

The discussions on the ethical and responsible development of AI have gained significant traction in recent years, and rightly so. Such discussions aim to address a myriad of risks, including bias, misinformation, and fairness.

While some of these challenges are not entirely new, the surge in demand for AI applications has certainly amplified them. Data privacy, a persistent issue, has gained increased importance with the emergence of Generative AI.

This statement from Halsey Burgund, a fellow at the MIT Open Documentary Lab, highlights the gravity of the situation: “One should think of everything one puts out on the internet freely as potential training data for somebody to do something with.”

Changing times call for changing measures. So, let’s understand the repercussions and learn how to handle the risks stemming from data privacy.

Time to Raise the Guard

Every company that handles user data, whether by collecting and storing it, manipulating it, or processing it to build models, must attend to several aspects of that data, such as:

  • Where is data coming from and where is it going?
  • How is it manipulated?
  • Who is using it and how?

In short, it is crucial to track how and with whom data is exchanged, as the sketch below illustrates.
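
To make these questions concrete, here is a minimal, hypothetical sketch in Python of a provenance record capturing where a data item came from, how it was manipulated, and who consumed it; the class, field, and function names are illustrative, not an industry standard.

    from dataclasses import dataclass, field
    from datetime import datetime, timezone

    # Hypothetical, minimal provenance record: one way a team might log a
    # data item's source, destination, transformations, and consumers.
    @dataclass
    class ProvenanceRecord:
        source: str        # where the data came from
        destination: str   # where it is going
        transformations: list = field(default_factory=list)  # how it was manipulated
        consumers: list = field(default_factory=list)        # who is using it, and how

        def log_use(self, transformation: str, consumer: str, purpose: str) -> None:
            stamp = datetime.now(timezone.utc).isoformat()
            self.transformations.append(f"{stamp}: {transformation}")
            self.consumers.append(f"{consumer} ({purpose})")

    record = ProvenanceRecord(source="signup_form", destination="training_corpus")
    record.log_use("email address hashed", "recommendations_team", "personalized suggestions")
    print(record)

Even a lightweight record like this makes audits, and user requests about their data, far easier to answer.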

Every user who shares their data and consents to its use should be deliberate about what they are comfortable sharing. For example, a user who wants personalized recommendations must be comfortable sharing the data that powers them.

GDPR Is the Gold Standard

Managing data becomes high stakes when it concerns PII, i.e., personally identifiable information. As per the US Department of Labor, it largely includes information that directly identifies an individual, such as name, address, any identifying number or code, telephone number, or email address. More nuanced definitions and guidance on PII are available here.

To safeguard individuals' data, the European Union enacted the General Data Protection Regulation (GDPR), setting strict accountability standards for companies that store and collect data on EU citizens.
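
As one illustration of what such safeguards can look like in code, here is a deliberately simplistic Python sketch that masks two common PII fields, email addresses and US-style phone numbers, before text is stored or reused. The regular expressions are illustrative only; production-grade PII detection requires far more robust tooling.

    import re

    # Illustrative-only patterns; real PII detection needs dedicated tooling.
    EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
    PHONE_RE = re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b")

    def redact_pii(text: str) -> str:
        """Replace emails and US-style phone numbers with placeholder tokens."""
        text = EMAIL_RE.sub("[EMAIL]", text)
        return PHONE_RE.sub("[PHONE]", text)

    print(redact_pii("Contact Jane at jane.doe@example.com or 555-867-5309."))
    # -> Contact Jane at [EMAIL] or [PHONE].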

Development Is Faster Than Regulation

It is empirically evident that technological innovation and breakthroughs move far faster than the rate at which authorities can foresee their risks and govern them in a timely manner.

So, what would one do till regulation catches up with the fast-paced developments? Let’s find out.

Self-regulation

One way to address this gap is to build internal governance measures, much like corporate governance and data governance. It amounts to owning your models to the best of your knowledge, combined with known industry standards and best practices.

Such self-regulation is a strong signal of integrity and customer-centricity, which can become a differentiator in a highly competitive market. Organizations that adopt a charter of self-regulation can wear it as a badge of honor and earn customers’ trust and loyalty, no small feat given users’ low switching costs among the plethora of options available.

Another benefit of building internal AI governance measures is that it keeps organizations on the path of a responsible AI framework, so they are prepared for easy adoption when legal regulations are put in place.

Rules Must Be the Same for Everyone

Setting a precedent is good in theory. In practice, however, no single organization is capable of foreseeing every risk and fully safeguarding itself.

Another argument against self-regulation is that everyone should adhere to the same rules: no company wants to sabotage its own growth by over-regulating itself in anticipation of regulation that has yet to arrive.

The Other Side of Privacy

Many actors, such as organizations and their employees, can play a role in upholding high privacy standards. However, users have an equally important part to play: it is time to raise your guard and develop a lens of awareness. Let’s discuss these roles in detail below:

Role of organizations and employees

Organizations have created responsibility frameworks to sensitize their teams and build awareness of the right ways to prompt a model. In sectors like healthcare and finance, sensitive information shared through input prompts is also a form of privacy breach, this time an unknowing one, caused by employees rather than by the model developers.

Role of users

Essentially, privacy cannot even be a question if we are feeding such data into these models ourselves.

[Image by Author: the role of users in privacy]

Most foundation models (like the example shown in the image above) note that chat history may be used to improve the model, so users must thoroughly review the settings and grant only the access they are comfortable with in order to protect their data privacy.

Scale of AI

Users must visit and adjust these consent controls in every browser and on every device to prevent such use. Now, however, think of large models that scrape such data from almost the entire internet, sweeping in nearly everyone.

That scale becomes a problem!

The very scale that gives large language models their advantage, access to training data several orders of magnitude larger than that of traditional models, is the same scale that raises massive privacy concerns.

Deepfakes – A Disguised Form of Privacy Breach

Recently, an incident surfaced in which a company executive appeared to direct an employee to make a multi-million-dollar transfer to a certain account. Skeptical, the employee suggested a call to discuss it, after which he made the transaction, only to learn later that everyone else on the call was a deepfake.

For the unversed, the US Government Accountability Office describes a deepfake as “a video, photo, or audio recording that seems real but has been manipulated with AI. The underlying technology can replace faces, manipulate facial expressions, synthesize faces, and synthesize speech. Deepfakes can depict someone appearing to say or do something that they never said or did.”

Seen this way, deepfakes are also a form of privacy breach, one equivalent to identity theft, in which bad actors pretend to be someone they are not.

With such stolen identities, they can drive decisions and actions that would otherwise never have taken place.

This serves as a crucial reminder that bad actors, a.k.a. attackers, are often well ahead of the good actors playing defense, who are left scrambling to contain the damage first and then to put robust measures in place to prevent future mishaps.

Vidhi Chugh is an AI strategist and a digital transformation leader working at the intersection of product, sciences, and engineering to build scalable machine learning systems. She is an award-winning innovation leader, an author, and an international speaker. She is on a mission to democratize machine learning and break the jargon for everyone to be a part of this transformation.

Luma AI Unveils Dream Machine to Ramp Up Competition Against Sora & Kling

San Francisco-based AI startup Luma AI today announced the release of Dream Machine, a new AI system capable of generating high-quality videos from simple text descriptions. This technology opens the door for a wide range of creators and companies to produce original video content at unprecedented speed.

The generator lets users enter a descriptive prompt and, within a few minutes, produces a realistic video clip.

Check out the model here.

The @LumaLabsAI release of #LumaDreamMachine is a game changer for my cinematic AI work. It levels up the AI video quality of movement and facial consistency far beyond anything currently available.
I was lucky enough to get pre-launch access. Here are some of my experiments: pic.twitter.com/NzWTBwtjQe

— Chikai (@lifeofc) June 12, 2024

Dream Machine vs the Rest

While other systems like OpenAI’s Sora and Kuaishou’s Kling have showcased impressive capabilities, they remain accessible only to a select group of users. In contrast, Luma has made its model available for anyone to experiment with for free on its website, a major milestone in AI-powered video generation.

The model’s launch comes amid a flurry of activity in the generative AI space, as startups and tech giants race to develop ever more capable tools for generating realistic images, audio, and video from text inputs.

This open approach could give Luma a head start in building a community of creators and developers around its platform. By lowering the barriers to entry, it has the potential to spark a wave of innovation and creativity as users explore the possibilities of AI-generated video.

Additionally, Dream Machine claims to be faster than Sora or Kling, which makes it more practical for experimenting with different prompts and ideas. And while Sora’s videos have a more “dreamy”, filmic quality, Dream Machine’s output tends to be more photorealistic, which could make it better suited to certain use cases.

Currently, Dream Machine’s videos are limited to five seconds, while Kling can generate videos up to two minutes long.

Safe to say, Luma AI’s new model stands out for its open accessibility, its ability to generate realistic, cinematic videos, and its generation speed, making it an appealing tool for creators looking to quickly visualize ideas or produce short, high-quality video content.

More about Luma AI

Previously, the company released Genie. Founded in 2021, the startup is the brainchild of cofounder and CEO Amit Jain.

“With Genie, for the first time, creating 3D things at scale has become possible with AI, and that’s grown to 100,000 users in just four weeks. But we want to build vastly more capable, intelligent, and useful visual models for our users,” said Jain.

Luma has raised over $70 million from eight investors, including $43 million in its Series B round; Andreessen Horowitz and NVIDIA are the most recent participants in its funding.

50 Best Firms for Data Scientists to Work for 2024

This report marks the sixth annual edition of AIM’s “50 Best Firms for Data Scientists to Work For”.

The report evaluates companies based on the employee-centricity of their policies. Building on last year’s expanded scope, which included a broader range of firms with data science teams, AIM has continued this initiative. We surveyed hundreds of employers across India to gather insights into how they cultivate exemplary work environments for data scientists.

For previous years’ rankings: 2021 | 2022 | 2023

Top Trends:

  1. Emphasis on Upskilling and Mentorship: 100% of leading firms provide upskilling opportunities and mentorship programs, underscoring the industry’s commitment to continuous learning and professional growth for data scientists.
  2. High Levels of Productivity and Engagement: Top firms boast an average attrition rate of just 9.3% and report a 44.3% growth in analytics capabilities, reflecting strong employee engagement and productivity. Continued investment in talent is expected to enhance these metrics further.
  3. Prioritization of Benefits and Well-being: 90% of leading firms offer flexible work models and prioritize both physical and mental well-being, highlighting the importance of work-life balance in driving productivity and job satisfaction.
  4. Focus on Recognition and Rewarding Excellence: 60% of top firms have robust recognition programs and competitive compensation structures. As organizations seek to retain top talent, the implementation of comprehensive reward systems is anticipated to increase.
  5. Advancing Diversity and Inclusion: 60% of leading firms have women in leadership roles, and 90% boast significant representation of women in data science teams. With 80% of firms having DE&I initiatives, the emphasis on fostering inclusive and equitable workplaces is expected to grow.
  6. Encouragement of Cross-Functional Collaboration: 40% of top firms promote cross-functional collaboration, recognizing its role in driving innovation and problem-solving. More firms are expected to adopt collaborative practices as data science projects become increasingly complex.
  7. Robust Work Flexibility and Leave Policies: 90% of top firms offer flexible work arrangements, and 80% have comprehensive leave policies. This trend highlights the growing importance of work-life balance in attracting and retaining skilled data scientists.
  8. Investment in Learning Platforms and Engagement Initiatives: 50% of leading firms provide access to advanced learning platforms, and 60% have engagement initiatives in place. As the need for continuous learning grows, more firms are likely to invest in platforms that support ongoing professional development.

Rankings 2024

Every company that responded has developed workplace policies or launched initiatives to create a supportive work environment and to supply the tools data scientists need to be happy and productive in their roles. A few businesses, however, performed better than others.

Unveiling the Top 50 Companies:

We present a detailed review of our analysis results, highlighting the leading companies across various sub-indices and the overall index. This section emphasizes the standout policies and initiatives that have earned these companies a place in the top 50.

Read the full report below:

Best Firm Certification is the “gold standard” in identifying and recognizing great data science workplaces. Know more here.

I tried Google’s new AI alphabet generator, and it’s way more fun than it sounds

GenType Alphabet Creator

Do you remember the joy of doodling or tinkering with (the now vintage) Microsoft Paint or WordPad tools when you were younger? Google's new artificial intelligence (AI) experiment granted me nearly the same experience 20 years later.

If you go to Google Labs, you will spot a new addition to its experiments — GenType Alphabet Creator. With the tool, you can use Imagen 2 to generate AI images for all 26 letters of the alphabet from a single prompt.

The experiment began when a Google employee wanted to use Imagen to help his children learn the alphabet visually by generating letters from familiar objects. Even though I can see the tool’s value for children’s learning, as an adult without children, I found a different use: entertainment.

This tool is fun because you can make endless words and sentences from one prompt. You can go further down the rabbit hole and create more generations with new prompts.

All the letters generated have slight differences. Imagen 2 runs each letter's generation separately and tweaks the design according to your prompt. Still not getting the hype? Let me show you the tool in action.

First, visit the Google Labs page for GenType and sign in to your Google account. You can now start tinkering by typing words into the textbox on the left that describe what you want your alphabet letters to look like.

To get the best results, Google suggests including three different components: what you want to see in the generation's foreground (the letter), the background (the backdrop), and the style (aesthetic).
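
If you want to iterate quickly, that three-part structure also lends itself to simple scripting. Below is a hypothetical Python snippet for brainstorming candidate prompts; GenType itself only accepts the finished text, so this merely assembles variations to paste into the textbox.

    # Hypothetical helper: assemble GenType-style prompts from the three
    # components Google suggests (foreground, backdrop, style).
    def gentype_prompt(foreground: str, backdrop: str, style: str) -> str:
        return f"{foreground} on {backdrop}, {style}"

    for style in ("aerial photo", "macro photo", "watercolor illustration"):
        print(gentype_prompt("Seashells", "an ocean background", style))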

I typed "Seashells on an ocean background, aerial photo" for the example, below. In about a minute, the tool generated the full 26 letters.

If you don't like some of the results, click on a letter and tap the regenerate button.

Once you are happy with the design of the letters, you can start typing in the text box. As you do, the graphics will immediately populate — arguably the most satisfying part of the fun. Once you create something you like, you can save it as a PNG. You can see my results below.

The quality of the images is impressive. You can expand the PNG image above to see the details. Once you generate your first alphabet, you can create as many as you'd like — as with any other image generator. Quick warning: the tool is so satisfying that you may be glued to it for hours. Happy tinkering.
