“Today it’s all about AI,” said Lisa Su, the CEO of AMD, at the Advancing AI event. “AI is not just a cool new thing, it is actually the future of computing…the only thing close is maybe just the introduction of the internet, but what’s different about AI is that the adoption rate is much much faster.”
She highlighted how AMD is positioned to power the entire chain of the AI era. “Thinking about massive cloud server installations, on-prem enterprise clusters, to the next generation of AI embedded on PCs,” Su explained, describing how AMD’s strategy is focused on developing compute engines, building open software capabilities, and fostering an AI ecosystem through deep co-innovation.
“The capability and availability of GPUs is the single most important driver of AI adoption,” said Su, and the crowd agreed. To that end, AMD has released the Instinct MI300X accelerator, boasting industry-leading memory bandwidth for generative AI, along with the Instinct MI300A accelerated processing unit (APU), which combines the latest AMD CDNA 3 architecture with Zen 4 CPUs, both targeted at HPC and AI workloads.
All about collaboration
Beyond that, Su noted that these GPUs would not be valuable without an ecosystem to utilise them, and AMD has built just that.
AMD believes that AI is a collaborative frontier, not just a competition. Su’s keynote featured Microsoft CTO Kevin Scott, Oracle senior vice president Karan Batta, and Meta AI senior director of engineering Ajit Matthews, as well as founders of AMD customers such as Lamini, Databricks, and Essential AI.
For instance, Scott highlighted that Microsoft has long been working with AMD across EPYC, Xbox, and AI compute. “The thing that allowed Microsoft and OpenAI to do this [ChatGPT] was the amount of infrastructure work that we have been invested in all this while,” he said, underscoring how AMD has been a constant contributor to the success of the Microsoft-OpenAI partnership.
Su highlighted how Microsoft has been key to advancing AMD’s AI journey. “We are super excited about the MI300X, and at Ignite we announced that MI300X VMs would be available on Azure,” said Scott. Bringing up GPT-4 and Llama 2 on the MI300X, seeing the performance, and rolling it into production is something Scott said he has been eagerly waiting for.
Matthews from Meta AI also announced that Meta will deploy the AMD MI300X in its data centres for AI inference workloads. He said the MI300X is set to be the fastest design-to-deployment solution in Meta’s history.
Batta took the stage and highlighted how OCI has been a leading customer of AMD. Oracle will now support the MI300X as a bare-metal offering on its servers, giving customers the option of using AMD GPUs for training and inference. “Customers are already seeing incredible results with the previous generation of GPUs, and the next generation is going to make it even better,” he said.
Ion Stoica, co-founder of Databricks; Ashish Vaswani, co-founder of Essential AI; and Sharon Zhou, co-founder of Lamini, discussed how they have been leveraging AMD hardware and software, noting that the open nature of the technology has helped them fully own their stack. “We have reached beyond CUDA,” said Zhou.
Winning the Race via Open Source
To make the MI300X easier to adopt across the industry, AMD has built the Instinct platform on an industry-standard OCP-compliant design. This means the board on which the MI300X sits can be integrated directly into any other OCP-compliant platform. “You can take out your other board, and put in the MI300X Instinct platform,” Su highlighted, comparing it with the NVIDIA H100 HGX.
The training performance of the MI300X is on par with the NVIDIA H100. But when it comes to inference, the MI300X offers 1.6x faster performance on BLOOM 176B and 1.4x on Llama 2 70B.
With 2.4x the memory capacity of its competitors, the Instinct platform can train and serve twice as many models when running multiple different models. It can also run LLMs twice as large on a single platform. “If you don’t have enough GPUs, this is really really helpful,” said Su.
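As a rough sanity check of the 2.4x memory figure, assuming the comparison is against the H100’s 80 GB of HBM (AMD’s public specs list 192 GB of HBM3 on the MI300X), the arithmetic works out as:

```python
# Back-of-envelope check of the claimed 2.4x memory advantage.
# Assumption: the comparison is MI300X (192 GB HBM3) vs H100 SXM (80 GB HBM).
mi300x_hbm_gb = 192
h100_hbm_gb = 80

ratio = mi300x_hbm_gb / h100_hbm_gb
print(ratio)  # 2.4
```

The same ratio explains why a single MI300X platform can hold models roughly twice as large, or twice as many models of the same size, before spilling across GPUs.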
“As important as the hardware is, software is what really drives innovation,” Su added, talking about the new release of ROCm 6.
Victor Peng, president of AMD, highlighted how the ROCm software stack has been production-ready since last year, citing Databricks, Lamini, and Essential AI as examples. AMD’s CUDA alternative is open source, and other AI software such as ZenDNN and Vitis AI supports the broader AI ecosystem. “Any model can run seamlessly across the ecosystem of software.”
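One concrete reason models “run seamlessly” is that the ROCm build of PyTorch exposes the familiar `torch.cuda` interface (backed by HIP under the hood), so device-agnostic code written for NVIDIA GPUs needs no changes on Instinct hardware. A minimal sketch, which falls back to CPU when no GPU is present:

```python
# Device-agnostic PyTorch: on a ROCm build, torch.cuda maps to AMD GPUs
# via HIP, so this exact script runs unchanged on Instinct accelerators.
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.randn(4, 8)
w = torch.randn(8, 2)

# Move tensors to whatever accelerator is available and multiply.
y = (x.to(device) @ w.to(device)).cpu()
print(tuple(y.shape))  # (4, 2)
```

This portability is what lets teams like Lamini and Databricks bring existing CUDA-era model code to AMD hardware without a rewrite.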
“We wanted ROCm to be modular and open source for broad user accessibility and rapid contribution from the open source AI community,” Peng said, adding that this software strategy stands in contrast to CUDA, which is proprietary and closed source. Furthermore, ROCm is now also supported on Radeon GPUs, alongside the Ryzen AI 1.0 software, bringing AI to the edge.
All of this makes clear that AMD’s open approach to AI extends the company’s hardware and software to the whole community, proving that it is doing more than just building GPUs.
The post Lisa Su Takes AMD’s AI Beyond Just GPUs appeared first on Analytics India Magazine.