When a company like NVIDIA becomes the preferred hardware choice for most AI labs and rises to the status of the world's largest company, does it mean the GPU maker excels in every area? Probably not.
American AI infrastructure provider Groq once shared an amusing remark in a blog post: "GPUs are cool for training models, but for inference, they're slowpokes, leading directly to the great-model-that-no-one-uses problem."
The company is well on its way to beating NVIDIA in providing inference – the crucial process in which a pre-trained AI model applies its learnings to generate outputs.
Groq's language processing unit (LPU) offers capabilities specific to AI inference that go far beyond those of a conventional graphics processing unit (GPU).
In a podcast interview with venture capitalist Harry Stebbings, Groq CEO Jonathan Ross explained that instead of relying on external memory like GPUs do, LPUs keep all the model parameters directly within their chips.
"Imagine you were trying to build a factory, and it was only one-hundredth of the size needed for the assembly line," said Ross in an analogy describing how GPUs operate.
This would force the factory to repeatedly process small batches, dismantle the setup, and restart the process over and over.
In contrast, Ross said, LPUs allow computation to flow smoothly through thousands of chips simultaneously, eliminating these inefficiencies and significantly improving speed.
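A rough back-of-envelope calculation illustrates why memory placement matters so much here. If single-stream decoding is bound by how fast the weights can be streamed from memory (roughly one full pass over the weights per generated token), throughput scales directly with memory bandwidth. The bandwidth figures below are illustrative assumptions, not measurements from the article:

```python
# Back-of-envelope: decode speed when inference is bound by weight-memory
# bandwidth, i.e. one full pass over the model weights per generated token.
# All hardware numbers below are illustrative assumptions.

def tokens_per_second(model_params_billion: float,
                      bytes_per_param: float,
                      bandwidth_gb_s: float) -> float:
    """Upper bound on single-stream tokens/sec for memory-bound decoding."""
    model_bytes = model_params_billion * 1e9 * bytes_per_param
    return bandwidth_gb_s * 1e9 / model_bytes

# A 70B-parameter model stored in 8-bit weights:
hbm = tokens_per_second(70, 1.0, 3_000)     # external HBM on a single accelerator
sram = tokens_per_second(70, 1.0, 80_000)   # aggregate on-chip SRAM across many chips
print(f"external memory: ~{hbm:.0f} tok/s, on-chip SRAM: ~{sram:.0f} tok/s")
```

Under these assumed numbers, spreading the weights across the on-chip SRAM of many chips raises the bandwidth ceiling by more than an order of magnitude, which is the gist of Ross' assembly-line analogy.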
Even though more chips are used, LPUs have a significantly smaller energy consumption footprint than GPUs.
Owing to this, Ross said, "We [Groq] should be one of the most important compute providers in the world. Our goal by the end of 2027 is to provide at least half of the world's AI inference compute."
Moreover, last year, NVIDIA CEO Jensen Huang said that one of the major challenges NVIDIA currently faces is producing tokens at extremely low latency.
However, Groq's mission should not be misunderstood – the company is not competing with NVIDIA.
‘I Think NVIDIA Will Sell Every Single GPU They Make for Training’
At first glance, Ross' statements and the recent events surrounding NVIDIA might suggest that the chip giant is in trouble, but it isn't. Groq and other inference solution providers will coexist with NVIDIA.
"Training should be done on GPUs," Ross said. "I think NVIDIA will sell every single GPU they make for training."
Ross added that if Groq were to deploy high volumes of lower-cost inference chips, the demand for training would increase. "The more inference you have, the more training you need, and vice versa," he said.
Moreover, Ross said Groq considers selling its LPUs as a "nitro boost to GPUs". The company experimented with running parts of a model on an LPU and the rest on GPUs. This speeds up the process and makes the GPUs run much more economically.
That said, Ross and the company don't really view NVIDIA as a competitor. "They [NVIDIA] don't offer fast tokens and cheap tokens. It's a very different product, but what they do very well is training, and they do it better than anyone else," he said.
The demand for GPUs will not end despite the advent of a growing AI inference provider market. "How are you going to do the training?" Ross asked.
"Buy the GPUs. Get every single one you can," he said.
Not Without Competition
That said, Groq competes with several other inference service providers. Most notably, Cerebras and SambaNova, also based in the US, offer hardware products that directly target NVIDIA's dominance.
Recently, Perplexity AI and Mistral AI announced the integration of Cerebras Inference into their products. The latter calls its app, Le Chat, the fastest AI assistant in the world.
Meanwhile, SambaNova is the only inference provider among the trio capable of handling the Llama 3.1 405B model.
Groq, on the other hand, no longer sells its AI inference hardware; its proprietary technology can instead be accessed through the models hosted on its cloud platform, GroqCloud.
The platform hosts several third-party models, including ones developed by Alibaba (Qwen), Meta (Llama), and DeepSeek (R1).
Moreover, Groq has announced that it is available on OpenRouter.ai, a platform that provides a unified interface for accessing numerous AI models. Groq now allows users to run DeepSeek-R1 distilled on Meta's Llama 70B at 1,000 tokens per second.
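As a minimal sketch of what accessing a Groq-hosted model might look like, the snippet below assembles an OpenAI-style chat-completions request. The base URL, endpoint path, and model identifier are assumptions for illustration, not details confirmed by the article; an API key would be required for a real call:

```python
# Hypothetical sketch: building a request to a Groq-hosted model through
# an OpenAI-compatible chat-completions endpoint. The base URL and model
# name are illustrative assumptions; a real call needs an API key.
import json
import urllib.request

def build_chat_request(prompt: str,
                       model: str = "deepseek-r1-distill-llama-70b",
                       base_url: str = "https://api.groq.com/openai/v1"):
    """Assemble (but do not send) an OpenAI-style chat-completions request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            # Real usage would add, e.g.:
            # "Authorization": f"Bearer {GROQ_API_KEY}",
        },
    )

req = build_chat_request("Why are LPUs fast at inference?")
print(req.full_url)
```

Sending the request with `urllib.request.urlopen(req)` (after adding a valid key) would return the model's completion as JSON.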
Recently, Saudi Arabia announced a $1.5 billion investment in Groq to expand AI infrastructure in the region. The investment builds on Groq's earlier work there, including the rapid deployment of the largest AI inference cluster in the Middle East in December 2024.
The post Groq Aims to Provide At Least Half of the World’s AI Compute, Says CEO appeared first on Analytics India Magazine.