OpenAI chief Sam Altman thinks otherwise. “There is no wall,” he said.
“Scaling the right thing matters more now than ever,” said former OpenAI co-founder and Safe Superintelligence (SSI) founder Ilya Sutskever in an interview with Reuters. He is reportedly working on an alternative approach to scaling LLMs, with the eventual goal of building safe superintelligence. Sutskever believes that the 2010s were the age of scaling; now, the field is back in the age of wonder and discovery.
“Some people can work really long hours and just go down the same path faster. It’s not so much our style. But if you do something different, then it becomes possible for you to do something special,” he said.
Based on his academic and research interests, Sutskever is most likely advancing AGI by scaling transformer architectures with a focus on reinforcement learning and self-supervised methods. These approaches let models learn from vast data with minimal human guidance and increase their adaptability to complex tasks.
OpenAI, more or less, is treading a similar path. To tackle the scaling challenge, the company plans to scale test-time compute and utilise high-quality synthetic data generated by previous models.
OpenAI reportedly uses Strawberry (o1) to generate synthetic data for GPT-5. This sets up a “recursive improvement cycle”, where each GPT version (say, GPT-5 or GPT-6) will be trained on higher-quality synthetic data created by the previous model.
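To make the idea concrete, here is a minimal, hypothetical sketch of such a cycle. The model class, the `quality_score` filter, and every name here are illustrative stand-ins, not OpenAI’s actual pipeline:

```python
from dataclasses import dataclass

@dataclass
class DummyReasoningModel:
    """Stand-in for a previous-generation reasoning model (e.g. o1)."""
    name: str

    def generate(self, prompt: str) -> str:
        # A real pipeline would call the model's API here.
        return f"[{self.name} reasoning trace + answer for: {prompt}]"

def quality_score(prompt: str, answer: str) -> float:
    # Placeholder filter; real pipelines use verifiers or reward models.
    return 1.0 if len(answer) > len(prompt) else 0.0

def build_synthetic_dataset(model, seed_prompts, threshold=0.8):
    """Generate and filter synthetic (prompt, completion) training pairs."""
    dataset = []
    for prompt in seed_prompts:
        answer = model.generate(prompt)
        # Filtering is what keeps the cycle from amplifying the
        # previous model's mistakes.
        if quality_score(prompt, answer) >= threshold:
            dataset.append({"prompt": prompt, "completion": answer})
    return dataset

data = build_synthetic_dataset(DummyReasoningModel("o1"), ["Prove that 17 is prime."])
print(data)  # pairs that would feed the next model's training run
```

The filtering step is the crux of the design: without it, each generation would inherit and compound the errors of the last.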
Another former OpenAI co-founder and founder of Eureka Labs, Andrej Karpathy, also highlighted that LLMs lack thought-process data, noting that current training data is mostly fragmented information. He believes that enough high-quality thought-process data could help in achieving AGI.
“The big one, I think, is the present lack of ‘cognitive self-knowledge’, which requires more sophisticated approaches in model post-training instead of the naive ‘imitate human labelers and make it big’ solutions that have mostly gotten us this far,” said Karpathy, who coined the term ‘jagged intelligence’.
All of these developments come on the back of reports indicating that traditional scaling may be reaching its limits, with Gemini 2.0 and Anthropic’s Opus 3.5 rumoured to underperform despite scaling efforts. The emphasis is shifting to quality synthetic data and scaling test-time compute.
‘I Told You So’
Meta’s chief AI scientist, Yann LeCun, couldn’t resist joining in to criticise OpenAI’s new approach. “I don’t want to say ‘I told you so’, but I told you so!” he said, adding that Meta has been working on ‘the next thing’ for a while now at FAIR.
Meta Bets on Autonomous Machine Intelligence
Earlier this year, Meta threw its hat in the ring in the pursuit of AGI by merging two major AI research efforts, FAIR and the GenAI team.
Under the guidance of LeCun, the company is developing a ‘world model’ with reasoning capabilities akin to those of humans and animals, which LeCun dubs AMI (autonomous machine intelligence), a nod to ‘ami’, the French word for ‘friend’.
Earlier this year, Meta released a new AI model called Video Joint Embedding Predictive Architecture (V-JEPA). It enhances machines’ understanding of the world by analysing interactions between objects in videos. Last month, the company introduced several advanced models, including Segment Anything Model (SAM) 2.1, Meta Spirit LM, Layer Skip, SALSA, and Meta Lingua.
Interestingly, Layer Skip optimises LLM inference by exiting early at intermediate layers and using the remaining layers to verify and correct those early predictions. This end-to-end solution accelerates LLM generation on new data without the need for specialised hardware or software.
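A toy sketch of the early-exit idea (not Meta’s actual implementation; the weights, dimensions, and confidence threshold are all illustrative) might look like this:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy early exit: a shared output head reads off a prediction after
# every layer, and the forward pass stops as soon as that prediction
# looks confident enough. Layer Skip additionally verifies the draft
# token with the remaining layers (omitted in this toy).

NUM_LAYERS, HIDDEN, VOCAB = 12, 64, 100
layers = [rng.normal(0, 0.1, (HIDDEN, HIDDEN)) for _ in range(NUM_LAYERS)]
lm_head = rng.normal(0, 0.1, (HIDDEN, VOCAB))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def early_exit_predict(h, confidence):
    for i, w in enumerate(layers):
        h = np.tanh(h @ w)            # stand-in for a transformer block
        probs = softmax(h @ lm_head)  # shared LM head at every layer
        if probs.max() >= confidence:
            return int(probs.argmax()), i + 1  # token, layers executed
    return int(probs.argmax()), NUM_LAYERS

# Random weights, so this threshold is purely illustrative; real
# models calibrate it during training.
token, used = early_exit_predict(rng.normal(size=HIDDEN), confidence=0.05)
print(f"predicted token {token} after {used}/{NUM_LAYERS} layers")
```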
Besides this, Meta plans to launch Llama 4 early next year. Meta said it leverages self-supervised learning (SSL) during training to help Llama learn broad representations of data across domains, giving the model flexible general knowledge.
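For readers unfamiliar with SSL, the core trick is that the labels come for free from the raw text itself, as this minimal illustration shows:

```python
# Self-supervised pre-training needs no human annotation: each target
# is simply the next token in the raw text.

text = "the cat sat on the mat"
tokens = text.split()

# Each training pair is (context, next token) carved from the text.
pairs = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]
for context, target in pairs:
    print(f"predict {target!r} given {context}")
```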
RLHF (reinforcement learning from human feedback), which currently powers GPT-4o and a majority of other models, focuses on refining behaviour for specific tasks, ensuring that the model not only understands data but also aligns with practical applications. But now OpenAI and others seem to be walking the path of Meta’s deep-learning school of thought.
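At the heart of RLHF is a reward model trained on human preference pairs. A minimal sketch of that preference (Bradley-Terry) loss, with toy scores standing in for a real reward model, looks like this:

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def preference_loss(r_chosen: float, r_rejected: float) -> float:
    """-log(sigmoid(r_chosen - r_rejected)): lower when the reward
    model ranks the human-preferred answer higher."""
    return -math.log(sigmoid(r_chosen - r_rejected))

# Toy reward-model scores for two candidate answers.
print(preference_loss(r_chosen=2.0, r_rejected=-1.0))  # small loss: correct ranking
print(preference_loss(r_chosen=-1.0, r_rejected=2.0))  # large loss: wrong ranking
```

The policy model is then optimised (typically with PPO) against this learned reward, which is how human judgments steer behaviour.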
Meta also recently launched ‘Self-Taught Evaluator’, which can assess the performance of other models. It employs the chain-of-thought technique, breaking down complex problems into smaller, logical steps to improve accuracy in fields like science, coding, and mathematics.
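Here is a hedged sketch of the general LLM-as-judge pattern this builds on; the prompt, the `call_model` stub, and the parsing are illustrative assumptions, not Meta’s published recipe:

```python
# An LLM judge is prompted to reason step by step before declaring
# which of two responses is better; only the final verdict is kept.

JUDGE_PROMPT = """You are evaluating two answers to the same question.
Question: {question}
Answer A: {answer_a}
Answer B: {answer_b}
Think step by step: check each answer for factual and logical errors,
then finish with a single line 'VERDICT: A' or 'VERDICT: B'."""

def call_model(prompt: str) -> str:
    # Stand-in; a real system would call an LLM API here.
    return "Step 1: ... Step 2: ...\nVERDICT: A"

def judge(question: str, answer_a: str, answer_b: str) -> str:
    reasoning = call_model(JUDGE_PROMPT.format(
        question=question, answer_a=answer_a, answer_b=answer_b))
    # Only the final verdict line becomes the evaluation label.
    return reasoning.strip().splitlines()[-1].removeprefix("VERDICT: ")

print(judge("What is 2+2?", "4", "5"))  # -> "A"
```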
LeCun was right all along when he said auto-regressive LLMs are hitting a performance ceiling. “I’ve always said that LLMs were useful but were an off-ramp on the road towards human-level AI. I’ve said that reaching human-level AI will require new architectures and new paradigms,” he recently clarified to Gary Marcus.
Meanwhile, OpenAI Clone…
Anthropic chief Dario Amodei, in a recent interview, discussed the various approaches to scaling, including the use of synthetic data coupled with reinforcement learning.
However, he expressed scepticism about this method. “We’ll overcome the data limitation, or there may be other sources of data available, but we could also observe that even if there’s no problem with data, as we start to scale models up, they just stop getting better,” he said.
He also spoke about OpenAI’s o1 approach, saying, “The other direction, of course, is these reasoning models that do the chain of thought and stop to think and reflect on their own thinking.”
Surprisingly, taking a leaf out of OpenAI’s book, Anthropic recently added a new prompt improver to the Anthropic Console. It takes an existing prompt, and Claude automatically refines it using prompt-engineering techniques like chain-of-thought reasoning.
Amodei believes the solution to a potential scaling plateau may lie in finding a new architecture.
“There have been problems in the past with, say, the numerical stability of models, where it looked like things were levelling off, but, you know, when we found the right unblocker, they didn’t end up doing so,” said Amodei, adding that there might be a new optimisation technique or a new technique to unblock things.
“I’ve seen no evidence of that so far, but if things were to slow down, that could perhaps be one reason,” he added. It appears that Anthropic currently plans to scale its compute. Amodei estimates that around $1 billion per AI company will be spent on compute this year, around $10 billion in 2025, and $100 billion in 2026.
The question remains when Anthropic will have its o1 moment. Amodei revealed that the company will soon release Claude 3.5 Opus and is also progressing on Claude 4.
Anthropic recently published a blog titled ‘Mapping the Mind of a Large Language Model’, which explains that LLMs can make analogies, recognise patterns, and even exhibit reasoning abilities by showing how features can be activated to manipulate responses.
The researchers employed a technique called ‘dictionary learning’, borrowed from classical machine learning, which isolates patterns of neuron activations (called features) that recur across different contexts.
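Anthropic’s production work uses sparse autoencoders at enormous scale, but the underlying dictionary-learning idea can be illustrated with a toy greedy matching-pursuit coder (all sizes and data here are synthetic):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dictionary learning on synthetic "neuron activations": express
# each activation vector as a sparse combination of dictionary atoms
# (features) that recur across contexts.

N_NEURONS, N_FEATURES, N_SAMPLES = 32, 64, 500
dictionary = rng.normal(size=(N_NEURONS, N_FEATURES))
dictionary /= np.linalg.norm(dictionary, axis=0)   # unit-norm atoms

activations = rng.normal(size=(N_SAMPLES, N_NEURONS))

def sparse_code(x, dictionary, k=4):
    """Explain x with at most k dictionary atoms (greedy matching pursuit)."""
    residual, coeffs = x.copy(), np.zeros(dictionary.shape[1])
    for _ in range(k):
        scores = dictionary.T @ residual      # correlation with each atom
        j = int(np.abs(scores).argmax())      # pick the best-matching feature
        coeffs[j] += scores[j]
        residual -= scores[j] * dictionary[:, j]
    return coeffs

codes = np.array([sparse_code(x, dictionary) for x in activations])
print("avg active features per sample:", (codes != 0).sum(axis=1).mean())
```

The sparse coefficients are the ‘features’: each one fires only in a few contexts, which is what makes them interpretable and, as the blog shows, manipulable.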
Scaling Beyond Scaling
Google DeepMind chief Demis Hassabis, in an interview earlier this year, explained that the research lab is focused on more than just scaling. “Half our efforts have to do with inventing the next architectures and the next algorithms that will be needed, knowing that larger and larger scaled models are coming down the line,” he said.
He added that Google’s upcoming models, including Gemini 2, will be multimodal. “As we start ingesting things like video and audiovisual data, as well as text data, the system starts correlating those things together,” said Hassabis. Unlike Altman, Hassabis expects AGI to arrive within the next decade.
However, a recent report indicates that this approach isn’t working as hoped for Google: despite increased computing power and extensive training data from online text and images, Gemini didn’t deliver the performance gains its leaders anticipated.
Hassabis also explained that these systems will begin to understand the physics of the real world better. “One could imagine the active version of that as a very realistic simulation or game environment where you’re starting to learn about what your actions do in the world and how that affects the world itself,” he added.
Citing the example of AlphaGo and AlphaZero, he said these use RL agents that learn by interacting with an environment. The agent makes decisions, receives feedback (usually in the form of rewards or penalties), and adjusts its actions based on that feedback.
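That loop is easy to see in a toy example. Here, tabular Q-learning on a one-dimensional walk stands in for the deep RL behind AlphaGo and AlphaZero:

```python
import random

random.seed(0)

# Minimal agent-environment loop: the agent acts, receives a reward,
# and adjusts future decisions based on that feedback.

GOAL, N_STATES, ACTIONS = 5, 6, (-1, +1)
q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

for episode in range(200):
    state = 0
    while state != GOAL:
        # Epsilon-greedy: mostly exploit the current policy, sometimes explore.
        if random.random() < 0.1:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])
        next_state = min(max(state + action, 0), N_STATES - 1)
        reward = 1.0 if next_state == GOAL else -0.01  # environment feedback
        # Q-learning update: feedback adjusts the value of the chosen action.
        best_next = max(q[(next_state, a)] for a in ACTIONS)
        q[(state, action)] += 0.5 * (reward + 0.9 * best_next - q[(state, action)])
        state = next_state

# After training, the learned policy moves right (+1) from every state.
print([max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)])
```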
In August, Google DeepMind published a paper titled ‘Scaling LLM Test-Time Compute Optimally Can Be More Effective than Scaling Model Parameters’, which echoes OpenAI’s o1 strategy. The paper found that applying a compute-optimal scaling approach can improve test-time compute efficiency by 2-4x.
It also showed that, in a FLOPs-matched comparison between additional test-time compute and additional pre-training compute, simple methods like revisions and search can significantly improve performance on certain prompts, outperforming the gains from scaling pre-training.
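One of the simplest such test-time strategies is best-of-N sampling against a verifier. The following sketch uses stand-in `sample_answer` and `verifier_score` functions rather than the paper’s trained models:

```python
import random

random.seed(0)

def sample_answer(prompt: str) -> str:
    # Stand-in for a model: each sample is a noisy guess at 12 * 12.
    return str(random.randint(130, 150))

def verifier_score(prompt: str, answer: str) -> float:
    # Stand-in verifier: scores answers by closeness to the truth.
    return -abs(int(answer) - 144)

def best_of_n(prompt: str, n: int) -> str:
    """More samples = more test-time compute = better expected answer."""
    candidates = [sample_answer(prompt) for _ in range(n)]
    return max(candidates, key=lambda a: verifier_score(prompt, a))

for n in (1, 4, 32):
    print(n, best_of_n("What is 12 * 12?", n))
```

Spending compute at inference this way trades off directly against spending it on a bigger pre-training run, which is exactly the comparison the paper formalises.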
Meanwhile, DeepMind is also betting on the neuro-symbolic approach. Its models AlphaProof and AlphaGeometry recently achieved silver-medal-level performance at the International Mathematical Olympiad. Many believe neuro-symbolic AI could help prevent the generative AI bubble from bursting.
In Conclusion
The path to AGI is fascinating, and scaling alone won’t lead the way. From OpenAI’s compute-heavy methods to Meta’s human-like reasoning and DeepMind’s neuro-symbolic models, each step takes us closer to a future where these models truly understand, and maybe even surpass, our intelligence.