French AI startup Mistral unveiled the Le Chat app for iOS and Android a few days ago. The app functions as an AI chatbot or assistant, rivalling ChatGPT, Claude, and Gemini, among others. Most of its features are free, with upgraded limits in the Pro tier, which costs $14.99 monthly.
Le Chat offers web search, image and document understanding, code interpretation, and image generation.
Given the sheer number of AI assistant applications on the market, a new entrant must offer formidable differentiation. Mistral claims its low-latency models are powered by the ‘fastest inference engines on the planet’. Moreover, Mistral says Le Chat responds faster than any other chat assistant, at up to 1,100 words per second, via its Flash Answers feature.
Thanks to Cerebras
Cerebras Inference, a service that delivers high-speed processing for AI applications, is the secret sauce behind its speed.
According to the company, Cerebras Inference is the ‘world’s fastest AI inference provider’ and makes Le Chat 10 times faster than GPT-4o, Claude 3.5 Sonnet, and DeepSeek R1. Cerebras also revealed that the 123-billion-parameter Mistral Large model is behind Le Chat.
Mistral and Cerebras compared Le Chat with Claude 3.5 Sonnet and ChatGPT-4o using a prompt to generate a snake game in Python.
The results from Mistral’s YouTube video showed ChatGPT outputting 85 tokens per second and Claude 120 tokens per second, while Le Chat outperformed both at 1,100 tokens per second.
In a video by Cerebras, Le Chat took 1.3 seconds to complete the task, Claude 3.5 Sonnet took 19 seconds, and GPT-4o took 46 seconds.
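For readers curious how such tokens-per-second figures are typically measured, the usual approach is to stream a completion and divide the tokens received by the elapsed wall-clock time. Below is a minimal sketch using the OpenAI Python SDK purely as a stand-in client; the model name and the one-token-per-chunk approximation are our assumptions, not the methodology Mistral or Cerebras actually used.

```python
# Minimal sketch: estimate output tokens per second from a streaming chat API.
# Illustrative only; approximates token count by counting streamed chunks,
# which is usually close to, but not exactly, one token per chunk.
import time
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

start = time.perf_counter()
chunk_count = 0
stream = client.chat.completions.create(
    model="gpt-4o",  # assumed model name, for illustration
    messages=[{"role": "user", "content": "Write a snake game in Python."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        chunk_count += 1
elapsed = time.perf_counter() - start

print(f"~{chunk_count / elapsed:.0f} tokens/second over {elapsed:.1f}s")
```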
“This performance is made possible by the Wafer Scale Engine 3’s SRAM-based inference architecture along with speculative decoding techniques developed in collaboration with researchers at Mistral,” said Cerebras in a blog post.
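Speculative decoding, in brief, lets a small ‘draft’ model propose several tokens ahead, which the large ‘target’ model then verifies in one batched pass, committing the longest prefix the two agree on. The toy sketch below illustrates the greedy accept/reject loop; the draft and target functions are hypothetical stand-ins, and production systems compare token probabilities rather than exact greedy picks.

```python
# Toy sketch of greedy speculative decoding. A cheap draft model guesses a few
# tokens ahead; the expensive target model checks them and keeps the longest
# matching prefix. All models here are hypothetical stand-ins.
from typing import Callable, List

def speculative_decode(
    prompt: List[str],
    draft_next: Callable[[List[str]], str],   # cheap model: next-token guess
    target_next: Callable[[List[str]], str],  # expensive model: ground truth
    num_tokens: int,
    lookahead: int = 4,
) -> List[str]:
    out = list(prompt)
    while len(out) - len(prompt) < num_tokens:
        # 1. Draft model speculates `lookahead` tokens cheaply.
        draft = []
        for _ in range(lookahead):
            draft.append(draft_next(out + draft))
        # 2. Target model verifies each drafted token; in real systems this
        #    verification happens in a single batched forward pass.
        accepted = 0
        for i in range(lookahead):
            if target_next(out + draft[:i]) == draft[i]:
                accepted += 1
            else:
                break
        out.extend(draft[:accepted])
        # 3. On a mismatch, take one token from the target so progress is
        #    guaranteed and the output matches the target model exactly.
        if accepted < lookahead:
            out.append(target_next(out))
    return out
```

When the draft model agrees with the target most of the time, several tokens are committed per expensive verification step, which is where the speed-up comes from.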
Several users echoed these claims. A user named Marc on X said the model is “mind-blowingly fast” and added that it built a simple React application in under five seconds.
Ok Le Chat by @MistralAI is 10x faster than ChatGPT. Maybe 100x.
— Pol Maire (@polmaire) February 6, 2025
Here’s What We Found in Our Real-World Tests
Additionally, we at AIM conducted a real-time test of some of the leading models, albeit with a different prompt, asking the AI models to solve a chemistry numerical problem from an IIT-JEE question paper, an exam often considered one of the world’s most difficult.
We considered OpenAI’s GPT-4o, o3-mini, o3-mini-high, Anthropic’s Claude 3.5 Sonnet, DeepSeek R1, Google’s Gemini 2.0 Flash, and of course, Mistral’s Le Chat.
The following question was used as input: “Ice at –10°C is to be converted into steam at 110°C. The mass of ice is 10⁻³ kg. What amount of heat is required?”
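For reference, the problem itself is a standard five-stage heat calculation: warm the ice, melt it, warm the water, boil it, then warm the steam. The sketch below works it through with common textbook values for the specific and latent heats; these constants are our assumptions, not values given in the question.

```python
# Worked example: heat to take 1 g of ice at -10 C to steam at 110 C.
# Constants are standard textbook values in SI units (assumed, not quoted).
m = 1e-3                 # mass in kg
c_ice = 2100             # J/(kg*K), specific heat of ice
c_water = 4186           # J/(kg*K), specific heat of water
c_steam = 2010           # J/(kg*K), specific heat of steam
L_fusion = 3.34e5        # J/kg, latent heat of fusion
L_vaporisation = 2.26e6  # J/kg, latent heat of vaporisation

q = (
    m * c_ice * 10         # warm ice: -10 C -> 0 C
    + m * L_fusion         # melt ice at 0 C
    + m * c_water * 100    # warm water: 0 C -> 100 C
    + m * L_vaporisation   # boil water at 100 C
    + m * c_steam * 10     # warm steam: 100 C -> 110 C
)
print(f"Total heat required: {q:.0f} J (~{q/1000:.2f} kJ)")
# -> roughly 3054 J, i.e. about 3.05 kJ
```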
When we timed the results, Mistral’s Le Chat was the fastest model, but with a caveat.
Le Chat returned the output in under four seconds in three of the six times we tested the model. On the other hand, Google’s Gemini 2.0 Flash returned the output in under six seconds every time we tested it.
This raises the question of whether Flash Answers kicked in every time, despite being enabled by default.
Note that we were using the free version of the Le Chat assistant; the Pro version offers an upgraded limit for the Flash Answers feature.
Moreover, the speed at which these models perform also depends on the nature of the queries. Reasoning models, with their extended chains of thought, prioritise the accuracy of the answer and are bound to take more time.
For instance, when we tested the prompt with DeepSeek R1, it took over a minute to complete the problem, with a chain of thought that involved verification steps, where the model said, “But let me check if all the values are correct. Did I use the right specific heat for steam?” and so on.
It also took a considerable amount of time to ensure the answer was given with the right number of decimal figures.
A test from Artificial Analysis revealed that OpenAI’s o3-mini was the fastest model among the competition, outputting 214 tokens per second, ahead of o1-mini at 167 tokens per second.
According to Artificial Analysis, o3-mini also achieved a high score of 89 on its Quality Index, on par with o1 (90 points) and DeepSeek R1 (89 points). This quality index quantifies the overall capabilities of an AI model.
OpenAI has prioritised inference-time scaling to deliver outputs at higher speeds. With Cerebras’ inference capabilities, Mistral seems to have joined the race. Moreover, there is an ongoing battle over token speeds among inference providers like Cerebras, Groq, and SambaNova.
These ambitions to deliver high-speed responses align with what Jensen Huang, CEO of NVIDIA, said last year. He envisioned a future where AI systems perform various tasks, such as tree search, chain of thought, and mental simulations, reflecting on their own answers and responding in real time, within a single second.