OpenAI launches new o3-mini mannequin — here is how free ChatGPT customers can attempt it

On the final day of OpenAI's 12 days of 'shipmas,' the corporate unveiled its newest fashions, o3 and o3-mini, which excel at reasoning and even outperform o1 on a collection of benchmarks, together with math and science. At launch, OpenAI CEO Sam Altman mentioned o3 was slated to drop on the finish of January, and at present, the corporate made good on its promise.

o3-mini

On Friday, OpenAI launched its o3-mini mannequin, probably the most cost-efficient mannequin in OpenAI's reasoning collection, to the general public. Till now, that collection has been comprised of o1 and o1-mini. Like its predecessor, the mannequin is especially sturdy in science, math, and coding, in response to the corporate.

When o3-mini is chosen, it is going to use medium reasoning effort, which balances velocity and accuracy. Whereas the unique o1 mannequin nonetheless has broader common data than o3-mini, the brand new mannequin's main benefit is its quicker velocity and better efficiency in comparison with o1-mini.

Benchmark efficiency

When evaluating the efficiency of o3-mini to o1-mini, skilled testers discovered that o3-mini delivered extra correct, reasoned-through, and clearer responses than o1-mini. Based on the submit, they most popular o3-mini responses 56% of the time and noticed a 39% discount in main errors.

Past human choice evaluations, in a number of STEM benchmarks, together with the Competitors Math (AIME 2024), PhD-level Science Questions (GPQA Diamond), and Competitors Code (Codeforces), o3-mini with medium reasoning — which is what ChatGPT customers will get by default — outperformed o1-mini.

Additionally notable is that o3-mini, with excessive reasoning effort within the benchmarks, got here near o1 efficiency, generally even surpassing it, as seen within the AIME 2024 above and Software program Engineering (SWE-bench Verified) benchmarks. The o3-mini mannequin with medium reasoning effort matched o1's efficiency within the Codeforces benchmark.

Security

OpenAI assessed o3-mini's security via public launch via jailbreak and disallowed content material evaluations. The corporate discovered that the mannequin considerably surpasses GPT-4o on the evaluations. OpenAI posted the analysis outcomes under and in addition launched an o3-mini System Card, a 37-page PDF that features the detailed outcomes of the evaluations.

The way to entry

All subscribers to OpenAI's paid tiers, together with ChatGPT Plus, Workforce, and Professional, can entry OpenAI o3-mini beginning at present. Plus and Workforce customers now have 3 times the speed restrict, going from 50 messages per day with o1-mini to 150 messages per day. ChatGPT Enterprise entry is coming in per week.

Additionally: Copilot's highly effective new 'Suppose Deeper' characteristic is free for all customers — the way it works

The o3-mini mannequin will change o1-mini within the mannequin picker, as it could be helpful for a similar duties, besides that have will now be improved with decrease latency and better charge limits. As a paid consumer, on the time of writing, I didn’t but have entry to the o3-mini, and am as a substitute nonetheless seeing the o1-mini possibility.

In case you don't have a subscription, no worries: You possibly can see if o3-mini is definitely worth the hype out of your free account. All free ChatGPT customers need to do is click on on "Cause" within the message textbox or regenerate a response. OpenAI CEO Sam Altman confirmed free entry in a submit on X. Till now, all of the reasoning fashions have been saved behind a paywall; OpenAI didn’t specify any limitations across the new mannequin for Free customers.

o3-mini

Benchmark efficiency

Security

The way to entry

Synthetic Intelligence