xAI’s Grok-2 Ranks Second on the Chatbot Arena Leaderboard, Competing with Gemini 1.5 and GPT-4o

Grok-2 and Grok-2-Mini from xAI have officially secured positions on the LMSys Chatbot Arena leaderboard. Grok-2 has taken the #2 spot, surpassing GPT-4o (May) and tying with the latest Gemini model, on the strength of over 6,000 community votes.

Meanwhile, Grok-2-Mini has earned the #5 position.

Grok-2 has excelled particularly in mathematical tasks, ranking #1 in that category, and secured the #2 position in several others, including hard prompts, coding, and instruction-following.

Additionally, Grok-2-Mini has undergone significant speed enhancements and now runs twice as fast as before. The boost came after xAI’s inference team completely rewrote the inference stack using SGLang, enabling more efficient multi-host inference and improved accuracy.

The team also introduced new algorithms for computation and communication kernels, alongside better batch scheduling and quantisation, further enhancing the models’ performance.

Grok 2 mini is now 2x faster than it was yesterday. In the last three days @lm_zheng and @MalekiSaeed rewrote our inference stack from scratch using SGLang (https://t.co/M1M8BlXosH). This has also allowed us to serve the big Grok 2 model, which requires multi-host inference, at a… pic.twitter.com/G9iXTV8o0z

— ibab (@ibab) August 23, 2024

Several people remain sceptical of the rankings. They point out that OpenAI’s GPT-4o, which holds the top spot, does not perform as well as Claude 3.5, which sits in fifth place. Even so, people who have started experimenting with Grok-2 claim that the model is strong at coding and maths-related tasks.

Released in beta this month, the Grok-2 family of models is also available for testing on X. Grok-2 also lets users generate images using the FLUX.1 image generation model.

The post xAI’s Grok-2 Ranks Second on the Chatbot Arena Leaderboard, Competing with Gemini 1.5 and GPT-4o appeared first on AIM.
