AI hardware maker Cerebras announced on Wednesday that its systems have outperformed NVIDIA’s DGX B200, equipped with eight Blackwell GPUs (graphics processing units), in output token speed on Meta’s Llama 4 Maverick model.
Cerebras achieved an output speed of over 2,500 tokens per second, whereas NVIDIA demonstrated just 1,000 tokens per second.
NVIDIA’s system, for its part, had outperformed those from Groq, AMD, Google, and other vendors. “Only Cerebras stands – and we smoked Blackwell,” said Cerebras in a post on X.
Cerebras just beat NVIDIA Blackwell
Last week: Blackwell hit 1,000 t/s on Llama 4.
Today: Cerebras hit 2,500 t/s on the same model, same benchmarks by @ArtificialAnlys
Blackwell smoked Groq, AMD, Google – everyone.
Only Cerebras stands – and we smoked Blackwell. pic.twitter.com/2Nd0W8ttOB
Cerebras (@CerebrasSystems), May 28, 2025
Based in the United States, Cerebras manufactures hardware specifically designed for AI inference, the process of using a trained AI model to make decisions. The company’s Wafer-Scale Engine (WSE) technology offers faster inference (output token) speeds than traditional GPUs.
“We’ve tested dozens of vendors, and Cerebras is the only inference solution that outperforms Blackwell for Meta’s flagship model,” said the company.
Last month, Meta announced a partnership with Cerebras to offer developers access to inference speeds up to 18 times faster than GPU-based solutions.
While GPUs are widely used for training AI models, which requires vast amounts of data and compute, dedicated solutions for inference are being developed at a large scale. Cerebras, Groq, and SambaNova are among the companies working on such solutions.
SambaNova offers the SN40L, a custom AI chip built on its Reconfigurable Dataflow Unit architecture. Manufactured on TSMC’s 5 nm process, the SN40L combines DRAM, HBM3, and SRAM on each chip.
Groq, meanwhile, offers AI inference processors called LPUs, or language processing units. Instead of relying on external memory like GPUs, LPUs keep all the model parameters directly on their chips.
The post We Smoked NVIDIA’s Blackwell, Says Cerebras appeared first on Analytics India Magazine.