The Darkish Facet of o3

OpenAI is Likely To Pull the Plug on ChatGPT

OpenAI’s o3 is among the many best-performing reasoning fashions out there for customers right this moment. Benchmark scores point out that the mannequin outperforms a number of competing fashions throughout varied features, together with coding, math, graduate-level science issues, and extra. A number of customers on social media have praised the mannequin’s efficiency.

Nevertheless, the mannequin’s most vital drawbacks are hallucinations and reward hacking, or specification gaming.

A Warning Signal for Future Reasoning Fashions

A current research revealed by Palisade Analysis, a non-profit organisation, reveals that OpenAI’s o3 mannequin is topic to ‘specification gaming’ — a course of the place an AI mannequin takes the target of a given drawback too actually, deviates from a suitable course of, and engages in malpractice to attain its goal. In such instances, a mannequin is set to attain its end result and can use unintended strategies.

The analysis arrange an AI mannequin to play chess in opposition to the Stockfish chess engine. This experiment discovered that AI fashions — OpenAI’s o1-preview, o3 and DeepSeekR1 typically observe that the chess engine is just too sturdy for them to win in opposition to, after which hack the sport atmosphere to win.

“Surprisingly, o1 and o3-mini don’t present this behaviour,” learn the report. “In distinction, o3 reveals excessive hacking propensity, hacking in 88% of runs.”

These hacks concerned complicated the engine, changing the board, and at instances, changing the engine itself.

“Such behaviours grow to be extra regarding as AI programs develop extra succesful. In complicated situations, AI programs would possibly pursue their targets in ways in which battle with human pursuits,” learn the report.

As AI programs improve their situational consciousness and develop strategic reasoning about their environment, such occurrences could grow to be extra frequent. This concern is especially problematic when equal focus is critical on each problem-solving strategies and the options themselves.

Supply: Palisade Analysis

Specification gaming is an notorious apply that has been noticed in AI programs all through time. Whereas the above research focuses on a sport of chess, the researchers shared a doc outlining extra such situations.

The Palisade Analysis report means that as extra reasoning and agentic fashions emerge, they might be extra susceptible to gaming the targets, and calls the research an ‘early warning signal’.

“First, we advise testing for specification gaming must be an ordinary a part of mannequin analysis. Second, rising mannequin capabilities could require proportionally stronger safeguards in opposition to unintended behaviours,” learn a report part.

2x Hallucinations than the o1 Mannequin

In addition to, a number of customers have discovered the o3 mannequin to hallucinate throughout a number of situations,

A number of customers throughout social media have expressed their frustrations in direction of the hallucinations current contained in the mannequin.

And OpenAI acknowledges this as nicely. Earlier, the corporate launched a ‘mannequin card’ for the newly launched o3 and o4-mini fashions, outlining the mannequin’s behaviour and shortcomings.

Benchmarks assessing the mannequin’s hallucinations reveal that the o3 mannequin had the next fee in comparison with its predecessor, the o1. Notably, within the PersonQA analysis—a dataset that includes questions and publicly accessible info about people that assesses the mannequin’s accuracy in answering—the o3 mannequin exhibited double the hallucination fee of the o1.

Supply: OpenAI

Within the mannequin card, OpenAI additionally outlined a few of the mannequin’s different unintended behaviours, similar to reward hacking, under-reporting its capabilities, deception, and so forth.

Final month, Transluce, one other unbiased non-profit analysis lab, outlined their findings on the pre-release model of the o3 mannequin and revealed that the mannequin ‘often fabricates actions’ that it by no means took, whereas additionally ‘elaborately’ justifying these actions when confronted with them.

Experiments revealed situations during which the mannequin claimed to run non-existent code by itself laptop computer, insisting that it did so.

Different conditions embody making up its personal time, ‘gaslighting’ the person about incorrectly copying a chunk of data, and pretending to analyse log information from an internet server.

That is truly wild. ChatGPT o3 is superb however the hallucinations are fairly uncontrolled.
I requested o3 to offer me some clips of outstanding AI figures speaking about Ethereum and it gave me this chart.
One drawback, none of those quotes are actual… pic.twitter.com/CMC6hfuEsJ

— Eric Conner (@econoar) Might 8, 2025

Along with normal points like hallucinations, Transcluce outlines components that come up from outcome-based Reinforcement Studying (RL) coaching—a mannequin that learns by means of trial and error. It’s guided by a reward system that gives rewards for proper solutions and penalties for incorrect ones.

The research indicated that if the reward operate solely rewards appropriate solutions, a mannequin lacks the motivation to confess it can’t clear up an issue, as this doesn’t depend as appropriate. When confronted with unsolvable or overly complicated issues, the mannequin should take a guess at a solution in case it’s correct.

These issues can also come up on account of chains of thought, during which the mannequin outlines its reasoning steps earlier than offering the response.

The research signifies that the mannequin’s inside chain of thought is obscured and indifferent from its conversational context, ensuing within the mannequin shedding observe of its prior reasoning.

Subsequently, when requested about earlier statements, it has to create plausible explanations because it can’t recall the precise foundation for its earlier responses.

“To place this one other manner, o-series fashions don’t even have sufficient data of their context to precisely report the actions they took in earlier turns. Which means the easy technique of ‘telling the reality’ could also be unavailable to those fashions when customers ask about earlier actions,” added the research.

The put up The Darkish Facet of o3 appeared first on Analytics India Journal.

A Warning Signal for Future Reasoning Fashions

2x Hallucinations than the o1 Mannequin

Latest stories

CMS Uses Machine Learning to Fully Reconstruct LHC Collisions

LANL: AI Accelerates Elucidation of Nuclear Forces with Explosive Neutron...

PNNL: Integrating AI into Biological Research

Rick Stevens on the Genesis Mission and the Future of...

Inside the DOE’s 26 AI Challenges for Genesis Mission

You might also like...

CMS Uses Machine Learning to Fully Reconstruct LHC Collisions

LANL: AI Accelerates Elucidation of Nuclear Forces with Explosive Neutron Star Data

PNNL: Integrating AI into Biological Research