
In 2025, NVIDIA took two hits that briefly disrupted the market’s confidence in its trajectory.
The first came early in the year with DeepSeek, which showed that a competitive large language model could be trained using far fewer GPUs than previously assumed, challenging expectations of ever-rising compute demand.
The second centred on Google’s TPUs.
While their technical maturity had long been understood, markets only fully registered their significance for specific neural network workloads in 2025. Over the year, Anthropic raised its TPU deployment targets, Meta reportedly explored TPU usage, and Gemini 3 Pro, the year’s most powerful frontier model, was trained entirely on TPUs.
NVIDIA also signalled it was responding directly to TPU-style competition by striking a roughly $20-billion deal with Groq, a company that builds application-specific integrated circuits (ASICs) for AI and positions them around efficiency gains over GPUs.
Groq’s architecture, like the TPU’s, is purpose-built for AI workloads, particularly inference, and its founder, Jonathan Ross, was one of the original architects of Google’s TPU programme.
And if NVIDIA is to continue winning, it has to beat Google’s TPUs—an ASIC project even Jensen Huang has publicly praised as one of the strongest in the industry.
“We’re delighted by Google’s success — they’ve made great advances in AI and we continue to supply to Google. NVIDIA is a generation ahead of the industry — it’s the only platform that runs every AI model and does it everywhere computing is done. NVIDIA offers greater…” — NVIDIA Newsroom (@nvidianewsroom), November 25, 2025
Where TPUs Win, and Why That’s Not Enough
It is worth examining where TPUs outperform GPUs, particularly given that Anthropic has placed $21 billion in TPU orders with Broadcom, which manufactures Google’s hardware.
According to research firm SemiAnalysis, TPU v7 Ironwood illustrates the shift clearly. It delivers roughly 10% lower peak floating-point operations per second (FLOPS) and memory bandwidth than NVIDIA’s GB200 platform, which leaves it more exposed to data-movement bottlenecks, yet it still offers a stronger performance-per-total-cost-of-ownership (TCO) profile.
Google’s internal cost to deploy Ironwood is about 44% lower than deploying an equivalent NVIDIA system, SemiAnalysis estimated. Even when priced for external customers, TPU v7 is estimated to offer around 30% lower TCO than GB200, and roughly 41% lower TCO than the upcoming GB300.
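A rough way to see how a ~10% performance deficit can still come out ahead on economics is to divide relative performance by relative cost. The sketch below treats the peak-spec gap as the delivered-performance gap, which is a simplification, and uses the ~30% lower external-customer TCO figure above; the numbers are illustrative, not measured.

```python
# Back-of-the-envelope reading of the SemiAnalysis estimates (illustrative only).
# Assumes the ~10% peak-spec gap translates directly into delivered performance,
# and uses the ~30% lower external-customer TCO figure relative to GB200.
relative_perf = 0.90   # TPU v7 delivered performance relative to GB200
relative_tco = 0.70    # TPU v7 total cost of ownership relative to GB200

perf_per_tco = relative_perf / relative_tco
print(f"TPU v7 performance per TCO dollar: ~{perf_per_tco:.2f}x GB200")  # ~1.29x
```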
Yet these advantages don’t come easily. Achieving peak TPU efficiency requires deep compiler expertise, custom kernels, and careful model sharding.
Because the TPU software ecosystem has historically been low-profile and oriented toward Google’s own stack, hitting the roughly 40% model FLOPS utilisation (MFU) threshold at which data-movement bottlenecks stop dominating demands specialised knowledge that relatively few organisations possess.
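For context, MFU is simply achieved training throughput expressed as a fraction of the hardware’s theoretical peak. Below is a minimal sketch using the common ~6N FLOPs-per-token approximation for dense transformer training; the model size, throughput, and chip ratings are made-up illustrations, not TPU or GPU figures.

```python
def model_flops_utilisation(params: float, tokens_per_sec: float,
                            num_chips: int, peak_flops_per_chip: float) -> float:
    """MFU = achieved training FLOPs per second / total peak FLOPs per second.

    Uses the common ~6 * params FLOPs-per-token approximation for dense
    transformer training (forward + backward pass).
    """
    achieved = 6 * params * tokens_per_sec
    peak = num_chips * peak_flops_per_chip
    return achieved / peak

# Illustrative numbers only: a 70B-parameter model on 256 accelerators
# rated at ~1e15 FLOPS each in the relevant precision.
mfu = model_flops_utilisation(params=70e9, tokens_per_sec=2.5e5,
                              num_chips=256, peak_flops_per_chip=1e15)
print(f"MFU: {mfu:.1%}")  # ~41%, just above the ~40% threshold cited above
```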
For years, TPUs were optimised primarily for Google’s preferred frameworks and tooling, while the broader ecosystem—particularly PyTorch and other popular open-source inference frameworks—lagged behind.
CUDA is King
GPUs, by contrast, have sustained their dominance largely because access has been thoroughly democratised through software. At the centre of this advantage is CUDA, NVIDIA’s parallel computing platform.
Over time, NVIDIA has folded years of hardware-specific optimisation directly into mainstream AI frameworks such as PyTorch, so that performance gains are largely automatic rather than something users have to actively engineer. The practical result is that GPUs tend to deliver strong, predictable performance across a wide range of workloads without requiring specialised expertise.
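A small sketch of what that looks like in practice: the same PyTorch code runs on CPU or GPU, and on NVIDIA hardware it dispatches to vendor-tuned kernels (cuBLAS, cuDNN, fused attention) without the user writing any device-specific code. The model and shapes below are arbitrary examples.

```python
import torch

# Standard PyTorch code: no CUDA-specific logic in user code.
device = "cuda" if torch.cuda.is_available() else "cpu"

model = torch.nn.TransformerEncoderLayer(d_model=512, nhead=8).to(device)
x = torch.randn(32, 16, 512, device=device)  # (sequence, batch, d_model)

with torch.no_grad():
    out = model(x)  # on a GPU this runs NVIDIA-optimised kernels automatically

print(out.shape, "on", device)
```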
GPUs are available across every major cloud, widely deployable on-premise, and backed by a software ecosystem that has matured alongside the modern AI boom.
“Right now, just about everyone starts their hands-on AI learning on GPUs,” said Jordan Nanos, an analyst at SemiAnalysis, in an interaction with AIM. “A small fraction of those people begin to use TPUs later in their career if they happen to work at the right company or go to the right school.”
CUDA is also supported by a large and active developer and research ecosystem, where new experiments, tools, and frameworks are continuously built, steadily expanding the GPU capabilities that developers can access.
That software moat is a major reason GPUs remain the default choice for both training and inference.
For most external users, TPUs still require more effort, more specialised knowledge, and greater organisational commitment to use effectively, despite their underlying cost and performance advantages.
How Anthropic Broke Through
Hardware design and cost advantages, however, can overcome software barriers—with the right expertise.
SemiAnalysis stated that Anthropic’s success stems from its team comprising ex-Google TPU and compiler engineers who understand both the hardware stack and their own model architectures at a deep level. This specialised talent allowed the company to navigate TPU complexity and extract the performance that makes TPU economics compelling.
The impact has been visible in production, as Anthropic’s release of Claude Opus 4.5 was accompanied by a roughly 67% API price cut over Opus 4.1, alongside improvements in token efficiency and lower verbosity.
“Both Gemini and Claude make it clear that frontier models can run training and inference on TPUs,” said Nanos, indicating that the chips, with the right amount of tooling and development, are ready to handle the most demanding AI workloads today.
Google also deployed TPUs for inference in earlier Gemini models, dating back to Gemini 1.5 Pro in 2024. Other companies that have confirmed TPU usage include Cohere, Apple, and Safe Superintelligence.
“We believe that TPUs are a serious consideration for the largest companies with the largest compute needs in the world, such as OpenAI, Google, Anthropic, Meta, and xAI,” Nanos noted.
While Google Cloud remains the primary channel, the company is increasingly allowing TPUs to be deployed through third-party operators. Anthropic, for instance, is accessing TPUs via Fluidstack, which runs TPU clusters in data centres owned by providers including TeraWulf, Cipher Mining, and Hut 8.
Google is Working on It
Google is now actively working to close the software accessibility gap.
In December, Reuters reported that the company is developing an internal initiative named ‘TorchTPU’ to make TPUs natively compatible with PyTorch. Developed in close collaboration with Meta, the effort aims to resolve the mismatch between Google’s internally optimised software stack and the tools most AI developers already use.
In an interaction with AIM, Alan Ma, a Stanford engineer and the author of Unwrapping TPUs, said easy TPU adaptability can only be an advantage. “CUDA has been the bread and butter for a lot of these hardware optimisation techniques, but there is a need to go to a higher level of abstraction, which is what we are seeing with PyTorch.”
The goal is not to replace NVIDIA’s software stack, but to make TPU performance accessible through the same abstractions that made GPUs dominant in the first place.
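TorchTPU’s interface has not been published, but the gap it targets is visible in how PyTorch reaches TPUs today: through the separate PyTorch/XLA bridge rather than the code path most GPU users write. A minimal sketch of that current path, assuming torch_xla is installed and a TPU runtime is available; the names below are the existing torch_xla API, not TorchTPU’s.

```python
import torch
import torch_xla.core.xla_model as xm  # PyTorch/XLA, today's bridge to TPUs

device = xm.xla_device()               # plays the role "cuda" plays on NVIDIA GPUs

model = torch.nn.Linear(4096, 4096).to(device)
x = torch.randn(8, 4096, device=device)

y = model(x)
xm.mark_step()  # flush the lazily traced graph to the XLA compiler for the TPU
print(y.shape)
```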
Even so, Nanos cautions that Google still has significant ground to cover if it wants to reach a broader customer base. “I don’t believe that TPUs will be the default choice for AI until the entire open source developer community, starting primarily in academia, adopts TPUs,” he said.
That said, developer activity around TPUs on Google Cloud grew by 96% in just six months, according to the Dainci Developer Dataset, a sign of interest building outside Google as well.
The Heterogeneous Compute Big Bang
There is, however, another dimension to the TPU story.
“The trend that we’re seeing is OpenAI, Anthropic, and all these different labs are using as much compute as they can get,” said Ma.
OpenAI has active compute deals with NVIDIA and AMD, while Anthropic operates across NVIDIA GPUs, Amazon’s Trainium chips, and a growing proportion of TPUs. “A heterogeneous compute big bang is happening,” he said.
This also gives companies a basis to benchmark competing accelerators against one another.
“Every company wants to know the TCO—the amount of performance they get per dollar spent—of each of these chips. Then they buy the chips that give them the best TCO,” said Nanos.
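In code terms, the comparison Nanos describes reduces to ranking chips by delivered performance per all-in dollar. Every figure below is invented purely for illustration; none is a real benchmark of any accelerator.

```python
# Toy version of the per-dollar comparison described above.
accelerators = {
    "chip_a": {"delivered_tflops": 900, "tco_per_hour_usd": 6.0},
    "chip_b": {"delivered_tflops": 800, "tco_per_hour_usd": 4.2},
    "chip_c": {"delivered_tflops": 650, "tco_per_hour_usd": 3.9},
}

def perf_per_dollar(spec):
    # Delivered throughput divided by all-in hourly cost (hardware, power, networking).
    return spec["delivered_tflops"] / spec["tco_per_hour_usd"]

for name, spec in sorted(accelerators.items(),
                         key=lambda kv: perf_per_dollar(kv[1]), reverse=True):
    print(f"{name}: {perf_per_dollar(spec):.0f} TFLOPS per dollar-hour")
```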
That diversification, however, does not imply oversupply.
Carmen Li, founder and chief executive of Silicon Data, told AIM, “You can have 20 design houses on top,” but at the end of the day, there is effectively “one true fab” supplying advanced AI chips. As she put it, foundries are constantly deciding: “Should I give capacity to TPU production, NVIDIA, or AMD?”
In other words, adding more TPU buyers does not create excess supply; it reshuffles how scarce manufacturing capacity is allocated. Companies are turning to TPUs, GPUs, and custom accelerators not because the market is flush with chips, but because they are securing incremental compute wherever it is available.