It’s no secret anymore that AI is GPU-hungry, and OpenAI’s Sam Altman keeps stressing just how urgently the company needs more. “Working as fast as we can to really get stuff humming; if anyone has GPU capacity in 100k chunks we can get ASAP, please call,” he posted on X recently. The demand surged even further when users flooded ChatGPT with Ghibli-style image requests, prompting Altman to ask people to slow down.
This is where Google holds a distinct advantage. Unlike OpenAI, it isn’t fully dependent on third-party hardware suppliers. At Google Cloud Next 2025, the company unveiled Ironwood, its seventh-generation tensor processing unit (TPU), designed specifically for inference. It is a key part of Google’s broader AI Hypercomputer architecture.
“Ironwood is our most powerful, capable and energy-efficient TPU yet. And it’s purpose-built to power thinking, inferential AI models at scale,” Google said. The tech giant said that today we live in the “age of inference”, where AI agents actively retrieve, interpret and generate insights instead of just responding with raw data.
The company further said that Ironwood is built to manage the complex computation and communication demands of thinking models, such as large language models and mixture-of-experts systems. It added that with Ironwood, customers no longer have to choose between compute scale and performance.
Ironwood will be available to Google Cloud customers later this year, the tech giant said. Google’s TPUs already power advanced models, including Gemini 2.5 Pro and AlphaFold. The company also recently announced that the Deep Research feature in the Gemini app is now powered by Gemini 2.5 Pro.
Google said that over 60% of funded generative AI startups and nearly 90% of generative AI unicorns (startups valued at $1 billion or more) are Google Cloud customers. In 2024, Apple revealed it had used 8,192 TPU v4 chips in Google Cloud to train its ‘Apple Foundation Model’, the large language model powering its AI initiatives. This was one of the first high-profile TPU adoptions outside Google’s ecosystem.
Ironwood is specifically optimised to reduce data movement and on-chip latency during large-scale tensor operations. Since these models exceed the capacity of a single chip, Ironwood TPUs are equipped with a low-latency, high-bandwidth Inter-Chip Interconnect (ICI) network, enabling tightly coordinated, synchronous communication across the entire TPU pod.
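This is the pattern Google’s own JAX stack exposes directly: shard a model’s weights across every chip in a pod and let the compiler route the resulting collectives over the interconnect. Below is a minimal, illustrative JAX sketch of that idea; the shapes, the “model” mesh axis name, and the device count are arbitrary stand-ins, not Ironwood specifics.

```python
import jax
import jax.numpy as jnp
import numpy as np
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# One mesh axis spanning every chip JAX can see; on a TPU pod these
# are the chips stitched together by the ICI links.
mesh = Mesh(np.array(jax.devices()), axis_names=("model",))

# Shard the weights row-wise across chips, and the activations along
# the matching (contracted) dimension. The device count must divide
# 4096 evenly, which holds on power-of-two TPU slices.
x_sharding = NamedSharding(mesh, P(None, "model"))
w_sharding = NamedSharding(mesh, P("model", None))

key = jax.random.PRNGKey(0)
x = jax.device_put(jax.random.normal(key, (8, 4096)), x_sharding)     # activations
w = jax.device_put(jax.random.normal(key, (4096, 4096)), w_sharding)  # weights

@jax.jit
def layer(x, w):
    # Contracting over the sharded 4096-wide axis leaves each chip with
    # a partial product, which the compiler sums via an all-reduce; on
    # a TPU pod that collective is exactly the synchronous ICI traffic
    # described above.
    return jnp.dot(x, w)

y = layer(x, w)
print(y.shape, y.sharding)
```

The notable design choice is that the all-reduce never appears in user code; the hardware’s job, and Ironwood’s pitch, is making that implicit collective fast enough that a pod behaves like one large accelerator.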
The TPU comes in two configurations, one with 256 chips and another with 9,216 chips. Each Ironwood chip provides 4,614 TFLOPs of peak compute, and the full-scale version delivers 42.5 exaflops, more than 24 times the performance of the El Capitan supercomputer’s 1.7 exaflops.
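As a quick sanity check, the headline number follows from the per-chip figure:

9,216 chips × 4,614 TFLOPs per chip ≈ 42,522,624 TFLOPs ≈ 42.5 exaflops

and 42.5 ÷ 1.7 is exactly 25, consistent with the “more than 24 times El Capitan” comparison.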
According to Google, Ironwood is nearly twice as power-efficient as Trillium and almost 30 times more efficient than its first Cloud TPU, launched in 2018. Liquid cooling enables consistent performance under sustained load, addressing the power constraints associated with large-scale AI.
Why Google Loves TPUs
It’s unfortunate that Google doesn’t offer TPUs as a standalone product. “Google should spin out its TPU team into a separate business, retain a huge stake, and have it go public. Easy peasy way to make a bazillion dollars,” said Erik Bernhardsson, founder of Modal Labs.
If Google started selling TPUs, it would likely see strong market demand. These chips are capable of training models too. For instance, Google used Trillium TPUs to train Gemini 2.0, and both enterprises and startups can now tap the same powerful, efficient infrastructure.
Interestingly, TPUs were originally developed for Google’s own AI-driven services, including Google Search, Google Translate, Google Photos, and YouTube.
A recent report says Google might team up with MediaTek to build its next-generation TPUs. One reason behind the move could be MediaTek’s close ties with TSMC, which would give Google lower chip costs than it pays Broadcom.
Notably, earlier this year, Google announced $75 billion in capital expenditure for 2025.
In the latest earnings call, Google CFO Anat Ashkenazi said the company benefits from having TPUs when it invests capital in building data centres. “Our strategy is to lean fully on our own data centres, which means they’re more customised to our needs. Our TPUs are customised for our workloads and needs. So, it does allow us to be more efficient and productive with that investment and spend,” she said.
Google reportedly spent between $6 billion and $9 billion on TPUs in the past year, based on estimates from research firm Omdia. Despite its investment in custom chips, Google remains a major NVIDIA customer.
According to a recent report, the search giant is in advanced discussions to lease NVIDIA’s Blackwell chips from CoreWeave, a rising player in the cloud computing space. This suggests that even top NVIDIA clients like Google are finding it hard to secure enough chips to meet the growing demand from their users.
Moreover, integrating GPUs from vendors like NVIDIA isn’t easy either; cloud providers need to rework their infrastructure. In a recent interaction with AIM, Karan Batta, senior vice president at Oracle Cloud Infrastructure (OCI), said that most data centres are not ready for liquid cooling, acknowledging the complexity of managing the heat produced by the new generation of NVIDIA Blackwell GPUs.
He added that cloud providers must choose between passive and active cooling, full-loop systems, or sidecar approaches to integrate liquid cooling effectively. Batta further noted that while server racks follow a standard design (and can be copied from NVIDIA’s setup), the real complexity lies in data centre design and networking.
Not to forget, Oracle is under pressure to finish building a data centre in Abilene, Texas, roughly the size of 17 football fields, for OpenAI. Right now, the facility is incomplete and sitting empty. If delays continue, OpenAI could walk away from the deal, potentially costing Oracle billions.
AWS is Following Suit Too
Much like Google, AWS is building its own chips too. At AWS re:Invent in Las Vegas, the cloud giant announced several new chips, including Trainium2, Graviton4, and Inferentia.
Last year, AWS invested $4 billion in Anthropic, becoming its primary cloud provider and training partner. The company also launched Trn2 UltraServers and announced its next-generation Trainium3 AI training chip.
AWS is now working with Anthropic on Project Rainier, a massive AI compute cluster powered by hundreds of thousands of Trainium2 chips. The setup will help Anthropic develop its models and optimise its flagship product, Claude, to run efficiently on Trainium2 hardware.
Ironwood isn’t the only player in the inference space. A number of companies are now competing for NVIDIA’s chip market share, including AI chip startups like Groq, Cerebras Systems, and SambaNova Systems.
At the same time, OpenAI is moving ahead with its plan to develop custom AI chips to reduce its reliance on NVIDIA. According to a report, the company is preparing to finalise the design of its first in-house chip in the coming months and intends to send it to TSMC for fabrication.