Large language models (LLMs), relying on vast amounts of data and computational power, have been at the forefront of AI progress lately. However, research suggests that the AI industry’s widespread belief that more data and compute equal more progress may be misguided.
While companies have been aggressively collecting data to train LLMs, the law of diminishing returns indicates that pursuing model gains through scale alone may become economically infeasible.
Databricks CTO Matei Zaharia, in a recent interview with AIM, echoed similar sentiments, stating that, “Whenever you double your training costs, you increase quality by only around 1% or something like that.”
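Zaharia's remark can be illustrated with a toy power-law scaling curve. This is a hypothetical sketch, not Databricks' data: it assumes loss falls as loss(C) = a · C^(−b) with made-up constants, under which every doubling of compute shrinks the loss by the same small factor.

```python
# Toy illustration of diminishing returns under a hypothetical
# power-law scaling curve: loss(C) = a * C**(-b).
# The constants a and b below are illustrative, not measured values.

a = 10.0   # hypothetical scale constant
b = 0.05   # hypothetical scaling exponent

def loss(compute):
    """Model loss as a power law in training compute."""
    return a * compute ** (-b)

# Each step doubles training compute; the loss barely moves.
for c in [1e21, 2e21, 4e21]:
    print(f"compute={c:.0e}  loss={loss(c):.4f}")

# The fractional improvement per doubling is constant: 1 - 2**(-b).
improvement = 1 - 2 ** (-b)
print(f"per-doubling improvement: {improvement:.1%}")
```

With this exponent, each doubling of compute buys only a few percent of loss reduction, which is the qualitative pattern Zaharia describes.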
“Beyond the scaling law itself, there’s also the fact that we kind of put in all the data on the internet in these models already,” he added.
Zaharia further explained, “Even if you repeat it and add more data, maybe you can modify it and create variants of it. However, it’s not clear if it would be that much more informative.”
He acknowledged that while researchers will continue to work on making the process more efficient and scaling higher, “it is possible that we’re kind of maxing out the general consumer thing”.
Diverging Opinions on Scaling Limits
Microsoft CTO Kevin Scott, on the other hand, believes that AI models can continue to become more powerful with increased computing scale. “We are nowhere near the point of diminishing marginal returns on how powerful we can make AI models as we increase the scale of compute,” he said.
However, this view is not universally shared.
Zaharia, along with AI researchers Gary Marcus and Yann LeCun, has expressed doubts about the sustainability of this approach, questioning how quickly existing infrastructure can be scaled to support ever-larger AI models.
This concern is not merely theoretical: Data Center Physical Infrastructure (DCPI) revenue growth slowed in the first quarter of 2024.
The slowdown is attributed to design shifts required to support accelerated computing for AI workloads, shifts that take time to materialise.
However, echoing its close partner Microsoft, OpenAI is also bullish. CEO Sam Altman has previously stated, “We can say with a high degree of scientific certainty that GPT-5 is going to be a lot smarter than GPT-4. GPT-6 is going to be a lot smarter than GPT-5. And we are not near the top of this curve.”
Energy Constraints and Data Center Challenges
Meanwhile, Meta CEO Mark Zuckerberg highlighted energy constraints as the most significant factor limiting AI growth, with data centres consuming vast amounts of energy.
Estimates suggest that by 2030, data centres’ power consumption will reach 848 terawatt-hours (TWh), nearly double the current 460 TWh. For perspective, India, with a population of over a billion people, consumed a total of 1,443 TWh of electricity in 2021.
Zuckerberg also discussed the challenges of planning around exponential growth in AI, stating, “When you have an exponential curve, how long does it keep going for?”
He believes it is likely that the current exponential growth in AI will continue, making it worthwhile for companies to invest tens or even hundreds of billions of dollars in building the necessary infrastructure.
However, he also acknowledges that no one in the industry can say with certainty that this growth rate will be maintained indefinitely.
Potential Solutions and Future Directions
Despite the challenges, there’s still room for improvement to sustain this generative AI wave. Zaharia believes that there is still untapped potential in domain-specific AI applications.
He emphasised that “most enterprise use cases are building a multi-step thing”, which he referred to as “compound AI systems”. Engineering these systems is a complex task, and “there’s a lot of research to be done, like how to best design it”.
Similarly, Meta’s AI chief Yann LeCun has proposed an “objective-driven AI” architecture, arguing that scaling auto-regressive LLMs is yielding diminishing returns.
“As I’ve said repeatedly, a new architecture will emerge for the next qualitative jump in capabilities,” he stated.
Along similar lines, Databricks is focusing on helping customers achieve the best possible quality in their own domains for their GenAI applications.
It aims to do this by building compound AI systems that combine multiple components: calls to different models, retrieval of relevant data, use of external APIs and databases, and breaking problems into smaller steps.
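The structure of such a compound AI system can be sketched in a few lines. This is an illustrative toy, not Databricks' implementation: the retrieval step is naive keyword matching standing in for a vector database, and the model call is a stub where a real system would query a hosted LLM.

```python
# A minimal sketch of a "compound AI system": instead of one monolithic
# model call, the task is broken into retrieval, generation, and
# verification steps. All function names here are illustrative.

def retrieve(query, documents):
    """Naive keyword retrieval, standing in for a vector database."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(d.lower().split())), d) for d in documents]
    scored.sort(reverse=True)
    return [d for score, d in scored if score > 0][:2]

def call_model(prompt):
    """Stub for an LLM call; a real system would query a hosted model."""
    return f"ANSWER based on: {prompt[:60]}..."

def verify(answer, context):
    """Simple grounding check: reject answers with no supporting context."""
    return bool(context)

def compound_pipeline(query, documents):
    context = retrieve(query, documents)            # step 1: fetch data
    prompt = f"Context: {' | '.join(context)}\nQuestion: {query}"
    answer = call_model(prompt)                     # step 2: generate
    if not verify(answer, context):                 # step 3: check output
        return "Insufficient context to answer."
    return answer

docs = ["Databricks builds data and AI tools",
        "The Eiffel Tower is in Paris"]
print(compound_pipeline("What does Databricks build?", docs))
```

Each stage can be swapped out independently — a different model, a different retriever, an extra verification step — which is what makes engineering these systems both flexible and, as Zaharia notes, an open research problem.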
At the same time, Databricks is also focusing on open-source models.
As the AI industry navigates the law of diminishing returns, collaboration between data centre operators, utilities, and policymakers will be crucial to ensure a reliable and sustainable power supply while accommodating the growing needs of AI.
The future of AI progress lies in finding innovative solutions and architectures that can overcome the limitations of the current approach.