AWS and Nvidia Talk 65 Exaflop ‘Ultra-Cluster’ at re:Invent November 29, 2023 by Alex Woodie
AWS yesterday unveiled new EC2 instances geared toward tackling some of the fastest-growing workloads, including AI training and big data analytics. During his re:Invent keynote, CEO Adam Selipsky also welcomed Nvidia founder Jensen Huang onto the stage to discuss the latest in GPU computing, including the forthcoming 65-exaflop “ultra-cluster.”
Selipsky unveiled Graviton4, the fourth generation of the efficient 64-bit Arm processor that AWS first launched in 2018 for general-purpose workloads, such as database serving and running Java applications.
According to AWS, Graviton4 offers 2 MB of L2 cache per core, for a total of 192 MB, and 12 DDR5-5600 memory channels. All told, the new chip offers 50% more cores and 75% more memory bandwidth than Graviton3, driving 40% better price-performance for database workloads and a 45% improvement for Java, AWS says. You can read more about the Graviton4 chip on this AWS blog.
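Those cache and channel figures imply the rest of the spec sheet, and a quick back-of-the-envelope check bears out AWS's claims. A minimal sketch (the 96-core count and the peak bandwidth below are derived here from the published numbers, not quoted from AWS, and the bandwidth assumes standard 64-bit DDR5 channels):

```python
# Sanity-check the published Graviton4 figures.
# Assumption: standard 64-bit (8-byte) DDR5 channels; the bandwidth
# computed below is a theoretical peak, not an AWS-published number.

l2_total_mb = 192
l2_per_core_mb = 2
cores = l2_total_mb // l2_per_core_mb     # 192 / 2 = 96 cores
# Graviton3 has 64 cores, so 96 matches the claimed 50% increase.

channels = 12
transfers_per_sec = 5600e6                # DDR5-5600: 5600 MT/s per channel
bytes_per_transfer = 8                    # 64-bit channel width
peak_bw_gbs = channels * transfers_per_sec * bytes_per_transfer / 1e9

print(cores)        # 96
print(peak_bw_gbs)  # 537.6 (GB/s, theoretical peak)
```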
“We were the first to develop and offer our own server processors,” Selipsky said. “We’re now on our fourth generation in just five years. Other cloud providers have not even delivered on their first server processors.”
AWS CEO Adam Selipsky (left) talks with Nvidia CEO Jensen Huang at re:Invent 2023
AWS also launched R8G, the first EC2 (Elastic Compute Cloud) instances based on Graviton4, adding to the 150-plus Graviton-based instances already in the barn for the cloud giant.
“R8G instances are part of our memory-optimized instance family, designed to deliver fast performance for workloads that process large datasets in memory, like databases or real-time big data analytics,” Selipsky said. “R8G instances provide the best price-performance and energy efficiency for memory-intensive workloads, and there are many, many more Graviton instances coming.”
The launch of ChatGPT 364 days ago kicked off a Gold Rush mentality to train and deploy large language models (LLMs) in support of Generative AI applications. That’s pure gold for cloud providers like AWS, which are more than happy to supply the enormous amounts of compute and storage required.
AWS also has a chip for that, dubbed Trainium. And yesterday at re:Invent, AWS unveiled the second generation of its Trainium offering. When the Trainium2-based EC2 instances come online in 2024, they will deliver more bang for GenAI developer bucks.
“Trainium2 is designed to deliver four times faster performance compared to first-generation chips, making it ideal for training foundation models with hundreds of billions or even trillions of parameters,” he said. “Trainium2 is going to power the next generation of the EC2 ultra-cluster that will deliver up to 65 exaflops of aggregate compute.”
AWS Chief Evangelist Jeff Barr shows off the Graviton4 chip (Image courtesy AWS)
Speaking of ultra-clusters, AWS continues to work with Nvidia to bring its latest GPUs into the AWS cloud. During his conversation on stage with Nvidia CEO Huang, re:Invent attendees got a teaser about the ultra-cluster coming down the pike.
All of the attention was on the Grace Hopper superchip, or GH200, which pairs an Arm-based Grace CPU with a Hopper GPU using the NVLink chip-to-chip interconnect. Nvidia is also working on an NVLink switch that allows up to 32 Grace Hopper superchips to be connected together. When paired with AWS Nitro and Elastic Fabric Adapter (EFA) networking technology, it enables the aforementioned ultra-cluster.
“With AWS Nitro, that becomes basically one giant virtual GPU instance,” Huang said. “You’ve got to imagine, you’ve got 32 H200s, incredible horsepower, in one virtual instance because of AWS Nitro. Then we connect with AWS EFA, your incredibly fast networking. All of these units now can lead into an ultra-cluster, an AWS ultra-cluster. I can’t wait until all this comes together.”
“How customers are going to use this stuff, I can only imagine,” Selipsky responded. “I know the GH200s are really going to supercharge what customers are doing. It’s going to be available–of course EC2 instances are coming soon.”
The coming GH200 ultra-cluster will sport 16,000 GPUs and offer 65 exaflops of computing power, or “one giant AI supercomputer,” Huang said.
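The two headline numbers are consistent with each other; dividing them shows what each GPU contributes (a sketch derived from the quoted figures; note that "AI exaflops" of this kind are low-precision throughput numbers, not the FP64 measure used in HPC rankings):

```python
# Check the ultra-cluster arithmetic: 65 exaflops across 16,000 GPUs.
# Note: "AI exaflops" here refers to low-precision (e.g. FP8)
# throughput, not FP64.

total_exaflops = 65
gpus = 16_000

# Convert exaflops to petaflops (1 exaflop = 1,000 petaflops),
# then divide by the GPU count.
per_gpu_petaflops = total_exaflops * 1_000 / gpus

print(per_gpu_petaflops)   # 4.0625 petaflops per GPU
```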
“This is utterly incredible. We’re going to be able to reduce the training time of the largest language models, the next-generation MoEs, these extremely large mixture-of-experts models,” he continued. “I can’t wait for us to stand this up. Our AI researchers are champing at the bit.”
This article first appeared in Datanami.
About the author: Alex Woodie
Alex Woodie has written about IT as a technology journalist for more than a decade. He brings extensive experience from the IBM midrange marketplace, including topics such as servers, ERP applications, programming, databases, security, high availability, storage, business intelligence, cloud, and mobile enablement. He resides in the San Diego area.