SambaNova Launches Fastest AI Platform with Llama 3.1 405B at 132 Tokens per Second

SambaNova Systems has introduced SambaNova Cloud, an AI inference platform powered by its SN40L chip that the company bills as the fastest available today. The platform gives developers immediate access to Meta’s Llama 3.1 models, including the 405B model, served at full 16-bit precision and at 132 tokens per second (t/s).

The Llama 3.1 70B model runs at 461 t/s. The service is now open to developers without a waiting list.

For context, Cerebras Inference recently announced that it delivers 1,800 tokens per second for the Llama 3.1 8B model and 450 tokens per second for the 70B model, which it claims makes it 20 times faster than NVIDIA GPU-based hyperscale clouds. Groq, meanwhile, reports over 500 tokens per second on the Llama 3.1 70B model.
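To put these throughput figures in perspective, here is a quick back-of-envelope calculation, using only the rates quoted above, of how long a 1,000-token response would take at each reported speed:

```python
# Back-of-envelope: seconds to generate a 1,000-token response
# at each throughput reported in this article.
rates_tps = {
    "SambaNova Llama 3.1 405B": 132,
    "SambaNova Llama 3.1 70B": 461,
    "Cerebras Llama 3.1 70B": 450,
    "Cerebras Llama 3.1 8B": 1800,
    "Groq Llama 3.1 70B": 500,
}
for name, tps in rates_tps.items():
    print(f"{name}: {1000 / tps:.1f} s")
```

At 132 t/s, a 1,000-token answer from the 405B model arrives in roughly 7.6 seconds; at 461 t/s, the 70B model returns the same length of output in about 2.2 seconds.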

SambaNova Cloud supports both the Llama 3.1 70B model, designed for agentic AI applications, and the 405B model, the largest open-weights model available.

According to SambaNova CEO Rodrigo Liang, this versatility offers developers the ability to run high-speed, lower-cost models as well as the highest fidelity model at full precision. “Enterprise customers want versatility – 70B at lightning speeds for agentic AI systems, and the highest fidelity 405B model for when they need the best results. SambaNova Cloud is the only platform that offers both today,” he said.

Artificial Analysis independently benchmarked SambaNova Cloud, confirming it as the fastest available platform for serving Llama 3.1 models. Its measured output speeds surpassed those of offerings from OpenAI, Anthropic, and Google, making the service well suited to real-time AI applications and agentic workflows.

Meta’s Llama 3.1 models are widely regarded as the most popular open models available today, and the 405B model is the largest open-weights model. Running models of this size has traditionally been costly and complex, but SambaNova’s SN40L chips significantly lower these barriers, delivering higher speeds at lower cost than NVIDIA H100s.

Industry experts have responded positively to SambaNova Cloud’s speed and efficiency. Dr. Andrew Ng, Founder of DeepLearning.AI, emphasized the importance of token generation speed in agentic AI workflows, highlighting the platform’s unique ability to deliver fast results using large models. Bigtincan and Blackbox AI are among the first to partner with SambaNova to enhance their own AI-driven products.

SambaNova Cloud is now available in three tiers: Free, Developer, and Enterprise. The Free Tier offers free API access to developers starting today, while the Developer and Enterprise tiers will support higher rate limits and scaling capabilities for production workloads.
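For developers trying the Free Tier, the sketch below shows roughly what a request might look like, assuming SambaNova Cloud exposes an OpenAI-compatible chat-completions endpoint; the base URL, model identifier, and environment-variable name are illustrative assumptions, not details confirmed by the announcement.

```python
# Minimal sketch of a SambaNova Cloud request via an OpenAI-compatible client.
# The endpoint, model name, and env var below are assumptions; consult the
# provider's documentation for the actual values.
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.sambanova.ai/v1",    # assumed endpoint
    api_key=os.environ["SAMBANOVA_API_KEY"],   # assumed env var name
)

response = client.chat.completions.create(
    model="Meta-Llama-3.1-405B-Instruct",      # assumed model identifier
    messages=[
        {"role": "user", "content": "Explain dataflow architectures in two sentences."}
    ],
    max_tokens=256,
)
print(response.choices[0].message.content)
```

If the API is indeed OpenAI-compatible, existing applications could be pointed at it by changing only the base URL and key.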

The SN40L AI chip, with its patented dataflow design and three-tier memory architecture, powers the performance of SambaNova Cloud, making it a key platform for developers building next-generation AI applications.
