When OpenAI announced its ‘12 days of Shipmas,’ excitement ran high amid the promise of groundbreaking technologies. However, even though the company wrapped up on Friday on a high note, quite literally, with the release of the o3 models, the event still turned out to be a story of unmet expectations.
On day one, OpenAI released the full version of the o1 model, which had already been available in preview. It also introduced o1 pro mode, which uses more processing power and comes with a $200-a-month subscription, but opinions were divided because it was not meant for everyone.
The trend continued from there. The company repackaged old features with minor updates or offered tools that competitors had provided long before.
“It seems like they had three important releases and asked ChatGPT, ‘How can we sprinkle nine other minor features and product improvements into this so we can call it 12 days of OpenAI?’” George Pickett, a San Francisco-based software engineer, remarked on X, echoing the sentiment of many users.
Old Wine in a New Bottle
Yes, OpenAI released Sora, its text-to-video generation model, during the 12 days of Shipmas. But it merely delivered on an earlier promise, nothing more.
Then came the updates to ChatGPT Search. Again, there was nothing notable here beyond the feature being made free for all users. Projects in ChatGPT, another feature, mirrors one that has been available on Claude for quite some time.
Towards the end, OpenAI announced that ChatGPT would be available on WhatsApp. This is Meta’s territory, where Meta’s own AI bots are already available for free to WhatsApp users.
On the 11th day, OpenAI merely expanded a feature that can read content from external apps and said it planned to launch an agent in 2025. Still, this does nothing to rival Anthropic’s Computer Use or Microsoft’s Copilot Vision.
There was a silver lining, however. The demos featured OpenAI’s engineers and researchers using the new features to accomplish fun, festive-themed tasks.
Many would question dedicating an entire live stream just to announce ChatGPT’s integration with the iPhone.
Still, demonstrating how to use a feature, down to the basics, could prove pivotal in helping OpenAI onboard newer users.
Yet the announcements were not groundbreaking enough to draw significant praise, at least from the AI community.
A Fortnight is Too Long to Whip Up a Meal
Amid all this fanfare and festive drama, Google, OpenAI’s biggest competitor, was quick to respond. For better or worse, it took the opposite approach to OpenAI: no fortnight-long events.
When it released Gemini 2.0, Project Mariner, and an improved Project Astra, the company announced them all in a single long blog post, with the demo videos tucked inside.
But that was enough to have a huge impact. The conversation began to shift, and pundits wondered if Google was ‘crushing’ OpenAI. Take Google’s latest video model, Veo 2.
Google’s internal testing showed that Veo 2 surpassed rivals such as Kling, Meta’s Movie Gen, and OpenAI’s Sora in quality and prompt adherence.
Jonas Adler, a Google DeepMind researcher, said, “OpenAI has always had a good counter to anything we ship, magically always on the same day. But I’m not very impressed with Santa mode as a counter to Gemini 2.0; it doesn’t quite have the same gravitas.”
OpenAI used a 20-minute demo to announce its flagship model, while Sundar Pichai, CEO of Google, needed only 180 characters. He simply took to X and announced that Gemini Advanced subscribers could try out the Gemini-exp-1206 model. Google even released Gemini 2.0 Flash Thinking, a reasoning model.
Meanwhile, Anthropic, another AI company in the race, discovered that one of its models, Claude 3 Opus, could pretend to follow new rules while secretly sticking to its old ones.
The competition showed that one doesn’t need bells and whistles to announce new developments.
‘I don’t believe o3 is AGI’
On days with less exciting announcements, OpenAI slipped subtle references to AGI (artificial general intelligence) into its demo screens.
For example, when OpenAI announced the ChatGPT integration for the iPhone, a calendar event titled “Super Secret AGI” was spotted. Midway through the 12 days, Greg Brockman, president of OpenAI, posted on X that “Agi is in the air.”
In a way, the company set expectations that it would officially announce AGI at the end of the 12 days. However, on D-day, when everyone expected OpenAI to deliver, it announced the o3 series of models.
Indeed, o3 is a monumental feat. Given how impressed several researchers already are with o1, its successor is likely to be a great model once it is released.
Not everyone was swept along. Elvis (@omarsar0) posted on X on December 21, 2024: “The hype around o3 is out of control. It’s not AGI, it’s not the singularity, and you definitely don’t have to change your worldview. In fact, the public doesn’t even have access to the models so how can anyone claim any of the above. I appreciate how the OpenAI researchers…”
While OpenAI did not explicitly call o3 ‘AGI,’ it did test the o3 models on the ARC-AGI benchmark.
François Chollet, creator of Keras and a former Google researcher, built the ARC-AGI benchmark and said that it is “the only AI benchmark that measures progress towards general intelligence.”
The creators of the benchmark also said, “If found, a solution to ARC-AGI would be more impactful than the discovery of the transformer. The solution would open up a new branch of technology.”
The o3 model scored almost 90% on the benchmark, exceeding human performance. However, Chollet was not convinced. On X, he stated, “I don’t believe this is AGI -- there are still easy ARC-AGI-1 tasks that o3 can’t solve.”
He also said there is evidence that o3 will struggle on the next iteration of the benchmark, ARC-AGI-2.
Beyond the debate over whether solving ARC-AGI is the real deal, o3 has not exactly cleared the harder challenge either.
The stricter ARC-AGI test requires solving private problems that do not appear in any dataset the model could have been exposed to.
However, o3’s high score was achieved on a ‘semi-private’ problem set.
That said, ARC-AGI does not currently allow models like o3 to be tested on the private evaluation set, in order to prevent data leakage.
For what it’s worth, all this has worked in OpenAI’s favour, and the internet is going gaga over the model. o3 was also tested on FrontierMath, a benchmark on which leading models had previously solved only 2% of the problems; o3 managed to score 25%.
The fact that these results came from internal testing also raised concerns. “Not one person outside of OpenAI has evaluated o3’s robustness across different types of problems,” said Gary Marcus, a cognitive scientist and vocal commentator on AI.
So, What’s Next?
OpenAI has yet to officially announce ‘AGI,’ possibly because of what it would mean for Microsoft: if the company declares AGI, Microsoft will lose access to OpenAI’s models.
Reports indicate that OpenAI is considering eliminating that clause. As o3 is refined ahead of a formal launch, will OpenAI declare AGI?
After a disappointing 12 days of OpenAI, the final announcement notwithstanding, the internet has already started speculating.
Sam Altman, the hype master, was at it again without missing a beat.
Delivering a speech at the 2024 FinRegLab AI Symposium, Altman said, “By the end of 2025, I expect we will have systems that can do truly astonishing cognitive tasks, like where you’ll use it and be like, that thing is smarter than me at a lot of hard problems.”
On December 21, 2024, Altman (@sama) also posted on X: “AGI-1”
However, he noted that the term ‘AGI’ has become less useful and may no longer carry much meaning. Nonetheless, OpenAI will continue to work through its five levels of AI as intended. The ultimate level involves organisations of AI that can carry out all the functions of a company autonomously, without human participation.
The post Sam Altman Turns a Hype Master appeared first on Analytics India Magazine.