Did Microsoft Spill the Secrets of OpenAI?

Microsoft suggests that OpenAI’s o1 Mini and GPT-4o Mini consist of 100 billion and 8 billion parameters, respectively.

A new research paper from Microsoft estimates the parameter counts of some of the most powerful AI models available today – figures that are otherwise kept secret. Microsoft suggests that Claude 3.5 Sonnet consists of 175 billion parameters, while the o1 Preview has 300 billion parameters.

The tech company also suggests that OpenAI’s small models, the o1 Mini and the GPT-4o Mini, consist of 100 billion and 8 billion parameters, respectively.

This has got people excited. According to a quality index from Artificial Analysis, GPT-4o Mini ranks higher than the larger GPT-4o and Claude 3.5 Haiku, and is comparable to the latest Llama 3.3 70B.

An 8B parameter model has the potential to be embedded into portable devices for local use. In a post on X, Yuchen Jin, CTO of Hyperbolic Labs, asked OpenAI chief Sam Altman, “Would you consider open-sourcing GPT-4o-mini? It could run on our local devices.”
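
For a rough sense of why an 8B-parameter model could plausibly run on local devices, here is a back-of-the-envelope sketch of the memory needed just to hold the weights at common precisions. The 8B figure is Microsoft’s estimate; the precisions are illustrative assumptions, and real-world memory use would also include activations and the KV cache.

```python
# Back-of-the-envelope memory needed to store the weights of an ~8B-parameter
# model at common precisions. Illustrative only: real memory use also includes
# activations, the KV cache, and framework overhead. The 8B count is Microsoft's
# estimate for GPT-4o Mini, not a disclosed figure.
PARAMS = 8e9

bytes_per_param = {"fp16/bf16": 2, "int8": 1, "int4": 0.5}

for precision, nbytes in bytes_per_param.items():
    gib = PARAMS * nbytes / 1024**3
    print(f"{precision:>9}: ~{gib:.1f} GiB of weights")
```

At 4-bit quantisation, the weights alone would come to roughly 4 GiB, which is why running such a model on a laptop or high-end phone is a realistic ask.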

However, some speculate that the GPT-4o mini, like the GPT-4o, is a ‘mixture of experts (MoE) model’ that contains several smaller, specialised models (experts) and routes different parts of a problem to different experts.

Oscar Le, CEO of SnapEdit, one of the most popular AI photo editing apps, said, “My bet is 4o-mini is an MoE with a total of around 40B params (parameters) and probably 8B active.”

“I saw that it holds significantly more knowledge (when asking about facts) than an 8B model while being quite fast. Besides, GPT-4o is MoE, so they likely use the same architecture for mini,” he added in a post on X.
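
Le’s distinction between total and active parameters is the defining property of MoE models. A minimal sketch of the arithmetic, using a hypothetical configuration that matches his guess – the expert counts and sizes below are made-up illustrations, not disclosed figures for GPT-4o Mini:

```python
# Hypothetical MoE budget matching Oscar Le's speculation: ~40B total, ~8B active
# per token. All numbers here are illustrative assumptions, not disclosed figures.
shared_params = 3e9        # embeddings, attention, etc. used for every token
params_per_expert = 4.6e9  # one feed-forward expert
num_experts = 8            # experts stored in the model
experts_per_token = 1      # experts actually routed to for each token

total = shared_params + num_experts * params_per_expert
active = shared_params + experts_per_token * params_per_expert

print(f"total parameters: ~{total / 1e9:.0f}B")   # what must be stored
print(f"active per token: ~{active / 1e9:.0f}B")  # what is computed per token
```

This is why a model could “hold significantly more knowledge” than an 8B dense model while remaining fast: the full 40B of weights store knowledge, but only about 8B of them are used on any given token.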

Microsoft experimented with the above models in its research to develop a benchmark for medical error detection and correction in clinical notes. However, the figures it cites are not exact parameter counts.

“The exact numbers of parameters of several LLMs have not been publicly disclosed yet. Most numbers of parameters are estimates reported to provide more context for understanding the models’ performance,” Microsoft said in the research.

OpenAI, Anthropic, and Google have not released a detailed technical report outlining the architectural details and techniques used to build their latest models. This is likely due to concerns about revealing proprietary technology. For context, GPT-4, released in 2023, was the last model from OpenAI to carry a technical report.

However, companies like Microsoft, and Chinese AI giants Alibaba’s Qwen and DeepSeek, have released detailed technical documentation for their models. Microsoft’s recently released Phi-4, for instance, came with a report laying out the model’s details.

In an interview with AIM, Harkirat Behl, one of the creators of Microsoft’s Phi-4 models, said that the company is taking a different approach from OpenAI’s or Google’s. “We have actually even given all the secret recipes [of the model] and techniques which are very complicated, and nobody in the world has implemented these techniques.”

“In the paper, we have released all those details. That’s how much we love open source here at Microsoft,” Behl added.

‘Bigger Models are Not All You Need’

Over the last few years, the parameter count of frontier AI models has been trending downward, and the latest revelation substantiates this trend. Last year, EpochAI published parameter estimates for multiple frontier models, such as GPT-4o and Claude 3.5 Sonnet.

EpochAI estimated that GPT-4o has around 200 billion parameters and that Claude 3.5 Sonnet has around 400 billion, a stark contrast to Microsoft’s estimate of 175 billion parameters. Either way, the numbers suggest that AI labs are no longer prioritising raw parameter count.

Between GPT-1 and GPT-3, parameter counts grew roughly 1,000-fold, and then another 10-fold from 175 billion to a reported 1.8 trillion between GPT-3 and GPT-4. That trend, however, is now reversing.
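
Those multipliers are easy to check from the commonly cited counts. GPT-1’s roughly 117 million parameters is a published figure; the 1.8 trillion figure for GPT-4 is an unconfirmed estimate, used here only to reproduce the arithmetic above.

```python
# Growth factors between GPT generations, using the figures cited above.
# GPT-1's ~117M count is published; GPT-4's 1.8T is an unconfirmed estimate.
gpt1, gpt3, gpt4 = 117e6, 175e9, 1.8e12

print(f"GPT-1 -> GPT-3: ~{gpt3 / gpt1:,.0f}x")  # ~1,500x, i.e. on the order of 1,000x
print(f"GPT-3 -> GPT-4: ~{gpt4 / gpt3:.0f}x")   # ~10x
```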

“Let alone reaching the 10 trillion parameter mark, current frontier models such as the original GPT-4o and Claude 3.5 Sonnet are probably an order of magnitude smaller than GPT-4,” said Ege Erdil, a researcher at EpochAI, in December last year.


Initially, increasing parameter counts reliably improved model performance. Over time, however, adding more compute and parameters has yielded diminishing returns, and the limited supply of fresh training data has contributed to the plateau.

“A model with more parameters is not necessarily better. It’s generally more expensive to run and requires more RAM than a single GPU card can have,” said Yann LeCun in a post on X.

Owing to this, engineers have explored more efficient techniques at the architecture level to scale models. One such technique is the mixture of experts (MoE), which GPT-4o and GPT-4o Mini reportedly use.

“[MoE is a] neural net consisting of multiple specialised modules, only one of which is run on any particular prompt. So the effective number of parameters used at any one time is smaller than the total number,” LeCun further said.
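
A minimal sketch of the idea LeCun describes, using top-1 routing over a handful of toy experts. This is a toy NumPy illustration of sparse routing, not the architecture of any OpenAI model.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, num_experts = 16, 4

# Each "expert" here is a single weight matrix; real experts are full MLP blocks.
experts = [rng.standard_normal((d_model, d_model)) for _ in range(num_experts)]
router = rng.standard_normal((d_model, num_experts))  # scores each token against each expert

def moe_layer(x):
    """Route each token to its single best-scoring expert (top-1 routing)."""
    scores = x @ router                  # (num_tokens, num_experts)
    chosen = scores.argmax(axis=-1)      # one expert index per token
    out = np.empty_like(x)
    for e in range(num_experts):
        mask = chosen == e
        if mask.any():
            out[mask] = x[mask] @ experts[e]  # only the chosen expert's weights run
    return out

tokens = rng.standard_normal((5, d_model))
print(moe_layer(tokens).shape)  # (5, 16): every token is processed, but by one expert each
```

Only the selected expert’s weights are multiplied for each token, so the compute per token tracks the active parameter count rather than the total stored in the model.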

As 2024 came to an end, the ecosystem witnessed models built with innovative techniques outperforming frontier models. Released in December, Microsoft’s Phi-4 was trained on small, curated, high-quality datasets and outperforms many leading models, including GPT-4o, on several benchmarks.

Just a fortnight ago, DeepSeek released an open-source MoE model, the V3. Not only does it outperform GPT-4o in most tests, but it was also trained for just $5.576 million. By comparison, GPT-4 was reportedly trained for $40 million, and Gemini Ultra for $30 million.

So, in 2025, we are likely to see more optimisation and scaling techniques that boost models to higher levels at significantly lower costs.

“Model size is stagnating or even decreasing, while researchers are now looking at the right problems – either test-time training or neurosymbolic approaches like test-time search, program synthesis, and symbolic tool use,” François Chollet, creator of Keras and the ARC AGI benchmark, wrote on X.

“Bigger models are not all you need. You need better ideas. Now the better ideas are finally coming into play,” he added.
