The End of the Pre-Training Era Begins

We are entering a new frontier of intelligence — one where pre-training is dead. The AGI era has arrived.

“Pre-training as we know it will unquestionably end…because we have but one internet,” said OpenAI co-founder Ilya Sutskever at NeurIPS 2024, highlighting the finite nature of data and the looming challenge of data scarcity.

“You could even say that data is the fossil fuel of AI. It was created somehow, and now we use it, but we’ve achieved peak data,” he added, implying the end of the road for the Transformer-based pre-training recipe that birthed much of today’s GenAI boom.

Sutskever’s Scaling Dilemma

For context, all foundation models have relied on scaling up pre-training for improvements. However, recent debates point to diminishing returns from ever-larger training runs. Industry leaders like Microsoft CEO Satya Nadella and Google CEO Sundar Pichai have echoed this sentiment, noting that progress with existing architectures is getting harder.
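For illustration (not a formula from the talk), the widely cited Chinchilla-style scaling law of Hoffmann et al. (2022) captures why a finite internet matters: pre-training loss is modelled as power-law terms in parameter count N and training tokens D, so once the web caps D, the data term stops shrinking no matter how much compute is poured into N.

L(N, D) \approx E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}

Here E is the irreducible loss, and A, B, \alpha and \beta are empirically fitted constants.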

Previously, in an interview with Reuters, Sutskever said that the 2010s were the age of scaling, and that we are now back in the age of wonder and discovery. He also said that scaling the right thing matters more than ever.

OpenAI founding member Andrej Karpathy flagged LLMs’ lack of “thought process data,” spurring calls for synthetic data to mimic human reasoning.

Have we scraped all of the Internet yet?

The open-source community at large believes there is still room for experimentation. Qwen’s Binyuan Hui argued, “Synthetic data and post-training depend on the base model’s quality.”

For the community, pre-training remains vital until open models can match the capabilities of closed-source models like those from OpenAI.

Hui added that even Qwen2.5’s 18 trillion training tokens fail to cover niche and fast-evolving information. Qwen3 will need more data, with data cleaning and access to high-quality sources still major challenges.

His central argument was that the lack of critical details about the advanced pre-trained models Sutskever referenced, such as token counts, parameter sizes, and performance metrics, makes it impossible to assess clearly whether pre-training has truly reached its limits.

Microsoft’s Phi-4, a 14-billion-parameter model, excels at complex reasoning. Small models like this, trained heavily on synthetic data, hint at a promising future for local AI applications.

Can chain-of-thought (CoT) scale to AGI?

François Chollet, who created Keras and the well-known ARC-AGI benchmark, noted: “Bigger models are not all you need. You need better ideas. Now, the better ideas are finally coming into play.”

Interestingly, reasoning models, alongside agents, look like the next leap forward. ARC Prize co-founder Mike Knoop described the o1 series as a game-changer, soaring from 18% to 32% on their benchmark in just months, unlike the GPT series’ five-year climb from 0% to 5%.

Now that o1 already performs at a PhD level on some benchmarks, we wonder how much further models can go from here. OpenAI is set to release the next iteration, o3, tonight.

Google introduced its reasoning model yesterday, joining Qwen and DeepSeek, which have already launched their thinking models. Meanwhile, Meta released a report hinting at the arrival of reasoning models next year, with xAI’s Grok and Anthropic also anticipated to follow suit.

2025 is the year of Agentic AI

According to Sutskever, the path forward will likely focus on three pillars: agents, synthetic data, and inference-time compute. “The thing about superintelligence is that it will be different qualitatively from what we have,” he said.
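As a minimal sketch of what inference-time compute can mean in practice (our illustration, not something from the talk), best-of-N sampling spends extra compute at answer time by drawing several candidates and keeping the highest-scoring one. The functions generate_candidate and score_candidate below are hypothetical stand-ins for a model’s sampler and a verifier or reward model.

import random

def generate_candidate(prompt: str, seed: int) -> str:
    # Hypothetical stand-in for a sampled LLM completion.
    rng = random.Random(seed)
    return f"candidate {rng.randint(0, 9)} for: {prompt}"

def score_candidate(candidate: str) -> float:
    # Hypothetical stand-in for a verifier or reward model.
    return random.random()

def best_of_n(prompt: str, n: int = 8) -> str:
    # More inference-time compute (a larger n) means more chances to find
    # a high-scoring answer, with no change to the underlying model.
    candidates = [generate_candidate(prompt, seed) for seed in range(n)]
    return max(candidates, key=score_candidate)

print(best_of_n("What is 17 * 24?"))

Scaling n trades latency and cost for answer quality, which is broadly the kind of knob reasoning-style systems turn at inference time.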

He envisions systems evolving from marginally agentic to truly autonomous, capable of reasoning and decision-making in dynamic and unpredictable ways. At the Axios AI Summit, Anthropic CPO Mike Krieger compared how users will adopt AI agents as they evolve to how drivers have adapted to Tesla’s self-driving mode.

“The sad part about that talk is what he didn’t say. Ten years ago, Ilya would have told us what he thinks we should do. Yesterday he just alluded to ideas from others. That’s what happens when you run a company and are more interested in secrecy than benefiting science,” said Dumitru Erhan, research director at Google DeepMind, reacting to the talk.

Nonetheless, Sutskever has inspired a broader perspective, encouraging people to think beyond the limits of current possibilities.

In his talk, he discussed how biology offers examples of scaling, such as the relationship between body size and brain size in mammals. Hominids depart from that typical mammalian scaling trend, which he held up as inspiration for finding unconventional things to scale in AI systems.
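As a rough way to formalise that analogy (our illustration, not a formula from the talk), brain–body allometry is typically written as a power law, so on a log-log plot each group of species falls on a straight line whose slope is the exponent:

M_{\text{brain}} \approx k \cdot M_{\text{body}}^{\alpha} \quad\Longleftrightarrow\quad \log M_{\text{brain}} \approx \log k + \alpha \log M_{\text{body}}

A different exponent \alpha for hominids is exactly the kind of break from an established scaling curve that the talk suggests AI should look for.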

This points to a larger push towards building new architectures for AI.

“Think of it as the iPhone, which kept getting bigger and more useful from a hardware standpoint, but plateaued, and the focus shifted to applications,” said John Rush on X.
