OpenAI’s newly released GPT-4o mini dominates the Chatbot Arena. Here’s why.


One week ago, OpenAI released GPT-4o mini. In that short time, the leaderboard of the Large Model Systems Organization (LMSYS) Chatbot Arena has already been updated, and GPT-4o mini has climbed ahead of giants such as Claude 3.5 Sonnet and Gemini Advanced.

The LMSYS Chatbot Arena is a crowdsourced platform where users can evaluate large language models (LLMs) by chatting with two LLMs side by side and comparing their responses to each other without knowing the models' names.

Also: Want to try GPT-4o mini? 3 ways to access the smarter, cheaper AI model – and 2 are free

Immediately after its unveiling, GPT-4o mini was added to the Arena, where it quickly climbed to near the top of the leaderboard, just behind GPT-4o. This is especially notable because GPT-4o mini is 20 times cheaper than its predecessor.

As the results came out, some users took to social media to express concern that such a new mini model could rank higher than more established, robust, and capable models such as Claude 3.5 Sonnet. To address the concerns, LMSYS, posting on X, explained the factors behind GPT-4o mini's high placement, highlighting that Chatbot Arena rankings reflect human preference as expressed through user votes.
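To make that concrete, here is an illustrative sketch, not LMSYS's actual code, of how pairwise preference votes can be turned into an Elo-style rating that orders models on a leaderboard. The model names, votes, and K-factor below are hypothetical:

```python
# Illustrative sketch: converting pairwise human-preference votes into an
# Elo-style rating, in the spirit of arena-style leaderboards.
# NOT LMSYS's actual implementation; the votes below are made up.

from collections import defaultdict

K = 32  # update step size (assumed value for illustration)

def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A beats model B under an Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def update(ratings, model_a: str, model_b: str, outcome: float) -> None:
    """Apply one vote: outcome is 1.0 if A wins, 0.0 if B wins, 0.5 for a tie."""
    exp_a = expected_score(ratings[model_a], ratings[model_b])
    ratings[model_a] += K * (outcome - exp_a)
    ratings[model_b] += K * ((1.0 - outcome) - (1.0 - exp_a))

# Hypothetical votes: (model shown as A, model shown as B, outcome for A)
votes = [
    ("gpt-4o-mini", "claude-3.5-sonnet", 1.0),
    ("gpt-4o-mini", "gemini-advanced", 0.5),
    ("claude-3.5-sonnet", "gpt-4o-mini", 1.0),
]

ratings = defaultdict(lambda: 1000.0)  # every model starts at the same rating
for a, b, outcome in votes:
    update(ratings, a, b, outcome)

for model, rating in sorted(ratings.items(), key=lambda x: -x[1]):
    print(f"{model}: {rating:.1f}")
```

LMSYS has described using Elo-style and, more recently, Bradley-Terry-style statistical models for the official leaderboard; the point here is simply that the ranking comes from aggregated human votes rather than from fixed benchmark scores.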

For users interested in which model performs better on technical tasks, LMSYS encourages looking at the per-category breakdowns. These can be accessed by clicking the Category dropdown that says "Overall" and selecting a different category. When you visit the various category breakdowns, such as coding, hard prompts, and longer queries, you will see the results vary.

Also: OpenAI launches SearchGPT – here's what it can do and how to access it

In the coding category, GPT-4o mini ranks third, behind GPT-4o and first-place Claude 3.5 Sonnet. However, GPT-4o mini is number one in other categories, such as multi-turn (conversations of two or more turns) and longer query (prompts of 500 or more tokens).

Chatbot Arena results in the "coding" category.

If you want to try GPT-4o mini, visit the ChatGPT site and log into your OpenAI account. If you would rather take your chances in the Chatbot Arena and hope it serves up GPT-4o mini, you can start by visiting the LMSYS website, clicking Arena side-by-side, and entering a sample prompt.
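If you prefer to try the model programmatically rather than through the ChatGPT interface, a minimal sketch using the OpenAI Python SDK looks roughly like this. It assumes the openai package is installed and an OPENAI_API_KEY environment variable is set; the prompt is just an example:

```python
# Minimal sketch: calling GPT-4o mini through the OpenAI Python SDK.
# Assumes `pip install openai` and that OPENAI_API_KEY is set in the environment.

from openai import OpenAI

client = OpenAI()  # reads the API key from OPENAI_API_KEY

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "user", "content": "In two sentences, what is the LMSYS Chatbot Arena?"},
    ],
)

print(response.choices[0].message.content)
```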
