AI Will Win IOI (Not IMO) Gold in 2025 

Adam D’Angelo, the CEO of Quora, recently asked his followers when, if ever, they expect AI to achieve a gold medal in the International Olympiad in Informatics (IOI).

In response, one-third of the respondents predicted that AI would reach this milestone by 2025. That estimate isn’t far off, to be honest.

Recently, DeepMind’s AlphaProof and AlphaGeometry 2 AI models worked together to tackle questions from the International Mathematical Olympiad (IMO). The DeepMind team scored 28 out of 42 points, enough for a silver medal but one point short of gold.
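
For context, AlphaProof works on problems written in the Lean formal language, and DeepMind has said this year’s questions were manually translated into formal statements for its systems. As a flavour of what such a formalisation looks like, here is a toy Lean statement of our own (a simple inequality, not an actual IMO problem):

```lean
-- Toy illustration of a formalised statement (ours, not an IMO problem):
-- the sum of two real squares is never negative.
import Mathlib

theorem sum_sq_nonneg (a b : ℝ) : 0 ≤ a ^ 2 + b ^ 2 := by
  positivity  -- a Mathlib tactic that discharges non-negativity goals
```

The appeal of this setup is that candidate proofs are machine-checkable: the system can search over many attempts and let the Lean checker reject the wrong ones.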

More Than Maths

And it’s not just about maths. DeepMind has a history of beating humans at games. AlphaGo mastered Go, and its successor AlphaZero went on to master chess and shogi as well, defeating world champions along the way.

“Even if I become the number one, there is an entity that cannot be defeated,” said Lee Se-dol, the South Korean Go champion, who lost to AlphaGo 4-1.

In the medical field, DeepMind has developed AlphaFold 3, which can accurately predict the structures that proteins fold into in a matter of days, solving a 50-year-old “grand challenge” that could pave the way for a better understanding of diseases and for drug discovery.

Developers Celebrate

Before this year’s competition, AlphaGeometry 2 could solve 83% of all historical IMO geometry problems from the past 25 years, compared to the 53% rate achieved by its predecessor. For IMO 2024, AlphaGeometry 2 solved Problem 4 within 19 seconds after receiving its formalisation.

The breakthrough is being celebrated in the developer community. OpenAI research scientist Mo Bavarian, an IMO silver medallist himself, said he could never have imagined a computer system achieving a similar feat within his lifetime. “And yet here we are,” he added.

Scott Wu, the founder of Cognition Labs, reflected on the achievement with amazement. “Olympiads were my whole life as a kid. Never thought they’d get solved by AI just ten years later,” he said.

Google’s senior product manager, Logan Kilpatrick, emphasised the significance of this accomplishment in the broader context of AI development. “Models that can solve really hard maths and physics problems are on the critical path to AGI, and today we took another step towards that,” he said.

AI Coming for Maths Experts

In recent times, AI models have made tremendous progress in the field of mathematics. Models like AlphaGeometry have demonstrated remarkable problem-solving abilities, rivalling human experts.

As per Nature, the AlphaGeometry software, created by AI experts at Google DeepMind, correctly solved 25 of the 30 Olympiad geometry problems posed to it.

“Astonishing and amazing” is how IMO president Gregor Dolinar described these outcomes.

AlphaGeometry was built by a group at Google DeepMind and New York University led by Trieu H Trinh. It combines symbolic AI, which co-developer Thang Luong characterises as accurate, with a neural network more akin to LLMs, which handles the rapid, imaginative side of problem-solving needed for maths problems at this level.
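
To make that division of labour concrete, here is a schematic toy sketch of the loop (our own reconstruction in Python, not DeepMind’s code): a symbolic engine forward-chains through its deduction rules, and whenever it gets stuck short of the goal, a proposer, the neural network’s role in AlphaGeometry, supplies an auxiliary construction and the engine tries again:

```python
# Toy neuro-symbolic loop in the spirit of AlphaGeometry (our own sketch).
# Facts and rules are plain strings; the real system works over geometric
# predicates and uses a trained language model as the proposer.

def symbolic_closure(facts, rules):
    """Forward-chain deduction rules until no new facts appear."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

def propose_construction(facts):
    """Stand-in for the neural proposer: suggest one auxiliary object."""
    return "midpoint_M" if "midpoint_M" not in facts else None

def solve(facts, rules, goal, max_constructions=3):
    facts = set(facts)
    for _ in range(max_constructions + 1):
        facts = symbolic_closure(facts, rules)
        if goal in facts:
            return True
        hint = propose_construction(facts)
        if hint is None:
            return False
        facts.add(hint)  # add the auxiliary construction and retry
    return False

# The goal is only reachable after constructing the midpoint M.
rules = [
    ({"AB_eq_AC", "midpoint_M"}, "AM_perp_BC"),
    ({"AM_perp_BC"}, "goal_proved"),
]
print(solve({"AB_eq_AC"}, rules, "goal_proved"))  # True
```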

Additionally, AlphaGeometry found a more general solution to a problem from the 2004 IMO than the one human specialists had settled on.

There’s More

Then there is NuminaMath 7B TIR, a collaboration between Numina and Hugging Face, which managed to solve 29 out of 50 problems in the AI Mathematical Olympiad (AIMO) progress prize.

NuminaMath was built on a stack of open-source libraries, notably TRL, PyTorch, vLLM, and DeepSpeed.
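
The “TIR” in the model’s name stands for tool-integrated reasoning: the model interleaves natural-language reasoning with short Python programs, the programs are executed, and their output is fed back into the context so the model can finish the answer. A minimal sketch of that loop, with a hard-coded stub (and made-up code delimiters) standing in for the real vLLM-served model:

```python
# Minimal tool-integrated reasoning (TIR) loop; the "model" is a stub.
import contextlib
import io

def run_python(code):
    """Execute model-written code and capture its stdout.
    A production system would sandbox this instead of calling exec()."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(code, {})
    return buf.getvalue().strip()

def stub_model(prompt):
    """Stand-in generator: emits code first, a final answer once it sees output."""
    if "Observation:" not in prompt:
        return "I need C(12, 2).\n<code>\nimport math\nprint(math.comb(12, 2))\n</code>"
    return "Final answer: 66"

prompt = "How many ways can 2 items be chosen from 12?"
reply = stub_model(prompt)
if "<code>" in reply:
    code = reply.split("<code>\n")[1].split("</code>")[0]
    observation = run_python(code)                      # -> "66"
    prompt += f"\n{reply}\nObservation: {observation}"  # feed result back
    reply = stub_model(prompt)
print(reply)  # Final answer: 66
```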

Last week saw another new model for maths reasoning, MathΣtral, from Mistral AI. It is tailored to tackle complex, multi-step logical reasoning challenges in STEM fields.

For instance, MathΣtral 7B achieves significant accuracy gains at inference time, scoring 68.37% on the MATH benchmark with majority voting and 74.59% with a strong reward model ranking 64 candidates.
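
Both numbers reflect inference-time strategies rather than a larger model: sample many candidate solutions at non-zero temperature, then keep either the most common final answer (majority voting) or the candidate a reward model scores highest (best-of-64). A minimal sketch, with toy stand-ins for the generator and the reward model:

```python
# Majority voting vs. reward-model reranking over n samples (toy stand-ins).
import random
from collections import Counter

random.seed(0)

def generate(question):
    """Stand-in for sampling one solution from the model at temperature > 0."""
    return random.choice(["42", "42", "41"])  # deliberately noisy answers

def reward_model(question, answer):
    """Stand-in for a learned scorer of (question, answer) pairs."""
    return 1.0 if answer == "42" else 0.0

def majority_vote(question, n=64):
    answers = [generate(question) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]

def best_of_n(question, n=64):
    answers = [generate(question) for _ in range(n)]
    return max(answers, key=lambda a: reward_model(question, a))

q = "What is 6 * 7?"
print(majority_vote(q), best_of_n(q))  # both recover "42"
```

Majority voting needs only the samples themselves; the reward-model variant buys extra accuracy at the cost of training and running a separate scorer.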

OpenAI is also working on a new AI technology under the code name ‘Strawberry’. This project aims to significantly enhance the reasoning capabilities of its AI models.

With enhanced problem-solving abilities, AI could solve complex mathematical problems, help with engineering calculations, and even participate in theoretical research. As per reports, Strawberry scored over 90% on MATH, a benchmark of championship-level maths problems.

The success of AlphaGeometry and NuminaMath in solving IMO geometry problems suggests that AI may soon be able to compete with the best human minds in mathematics.

Beyond Maths

In the medical field, researchers from Google and DeepMind have developed Med-Gemini, a new family of highly capable multimodal AI models specialised for medicine. The models outperformed human experts on tasks such as medical text summarisation and referral letter generation.

On the MedQA benchmark, which assesses medical question-answering abilities, Med-Gemini achieved an accuracy of 91.1%, surpassing the previous best by 4.6%. In multimodal tasks, the models outperformed GPT-4 by an average of 44.5%.

Built by Olympiad Champions

Most of the AI systems that exist today are being built by past Olympiad champions. In fact, Prafulla Dhariwal, who played a significant role in the making of GPT-4o, represented India at international science olympiads.

Scott Wu, the brains behind Devin, touted as the most capable autonomous coding agent, is known for cracking complex mathematical problems with ease. His brother Neal Wu, who is also building Cognition Labs, is a competitive programming legend in his own right.

Demis Hassabis, the co-founder of DeepMind, has won the Mind Sports Olympiad’s Pentamind world championship a record five times.

Still A Long Road for AI Models

According to venture capitalist and Meta board director Peter Thiel, it will take at least three to five years before AI systems can solve all the problems presented at the prestigious International Mathematical Olympiad.

Barring a few, most AI models have failed IMO tests. As Microsoft engineer Shital Shah put it in a post on X, answering the questions correctly requires a mathematical creativity that AI systems have long struggled with.

GPT-4, for instance, which has shown remarkable reasoning ability in other domains, scored 0% on IMO geometry questions, while specialised AIs struggle to match even average contestants.

When tested, GPT-4, GPT-4o, and Claude 3.5 Sonnet all failed to solve the first IMO question correctly. While pointing out incorrect cases helped Claude 3.5 Sonnet briefly, it ultimately continued on the wrong path.

Also, while AlphaProof and AlphaGeometry 2 scored perfect marks on four of the six questions, they were unable to even begin working towards an answer on the other two.

Moreover, DeepMind, unlike human competitors, was given no time limit. While students get nine hours to tackle the problems, the DeepMind systems took three days working round the clock to solve one question, despite blitzing another in seconds.

Eureka Labs founder Andrej Karpathy has also shown that models exhibit puzzling inconsistencies, failing at seemingly simple tasks, such as comparing decimal numbers, even while handling far harder ones.

AI’s journey towards excellence in mathematics is marked by both impressive breakthroughs and persistent challenges. As AI systems continue to evolve and improve, they bring us closer to the possibility of achieving gold medals in prestigious competitions like the IOI.
