In March, OpenAI launched GPT-4 to great fanfare, but a dark cloud loomed on the horizon. Scientists and AI enthusiasts alike panned the company for not releasing any specifics about the model, such as its parameter count or architecture. Now, a top AI researcher has speculated on the inner workings of GPT-4, suggesting why OpenAI chose to hide this information, and the answer is disappointing.
OpenAI CEO Sam Altman famously said of GPT-4 that “people are begging to be disappointed, and they will be”, speaking about the potential size of the model. Indeed, rumours abounded ahead of the model’s launch that it would have trillions of parameters and be the best thing the world had ever seen. The reality, however, is far plainer. In the process of making GPT-4 better than GPT-3.5, OpenAI might have bitten off more than it can chew.
8 GPTs in a trenchcoat
George Hotz, the world-renowned hacker and software engineer, recently appeared on a podcast to speculate about GPT-4’s architecture. Hotz stated that the model might be a set of eight distinct models, each with 220 billion parameters. This speculation was later confirmed by Soumith Chintala, the co-founder of PyTorch.
While this puts GPT-4’s total parameter count at 1.76 trillion, the notable part is that not all of these models work at the same time. Instead, they are deployed in a mixture-of-experts (MoE) architecture.
This architecture splits the model into distinct components, known as expert models. Each expert is fine-tuned for a specific purpose or field, and can provide better responses within that field. All of the expert models then work together, with the complete model drawing on their collective intelligence.
This approach has many benefits. One is more accurate responses, since each expert is fine-tuned on its own subject matter. The MoE architecture also lends itself to easy updates, as maintainers can improve the model in a modular fashion rather than retraining a monolithic one.
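To make the idea concrete, here is a minimal Python sketch of a mixture-of-experts forward pass. The expert count matches the rumoured eight, but the layer sizes, the gating function and the top-2 routing rule are illustrative assumptions, not confirmed GPT-4 details.

```python
import numpy as np

# Minimal mixture-of-experts sketch. Sizes and the top-2 routing
# rule are illustrative assumptions, not confirmed GPT-4 details.
rng = np.random.default_rng(0)
NUM_EXPERTS, DIM, TOP_K = 8, 16, 2

# Each "expert" is a tiny feed-forward layer with its own weights.
experts = [rng.standard_normal((DIM, DIM)) for _ in range(NUM_EXPERTS)]
# The gating network scores how relevant each expert is to an input.
gate_weights = rng.standard_normal((DIM, NUM_EXPERTS))

def moe_forward(x: np.ndarray) -> np.ndarray:
    """Route the input to the top-k experts and mix their outputs."""
    logits = x @ gate_weights            # one relevance score per expert
    top_k = np.argsort(logits)[-TOP_K:]  # indices of the k best experts
    probs = np.exp(logits[top_k])
    probs /= probs.sum()                 # normalise the selected scores
    # Only the selected experts run; the rest stay idle for this input.
    return sum(p * (x @ experts[i]) for p, i in zip(probs, top_k))

output = moe_forward(rng.standard_normal(DIM))
print(output.shape)  # (16,)
```

The key point of the design is that only the selected experts run for any given input, which is how an MoE model can hold 1.76 trillion parameters without paying the compute cost of all of them on every query.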
Hotz also speculated that the model may rely on iterative inference to produce better outputs. In this process, the model’s output, or inference result, is refined over multiple passes.
This method might also allow GPT-4 to take inputs from each of its expert models, which could reduce the model’s hallucinations. Hotz stated that this process might be run 16 times, which would vastly increase the model’s operating cost.
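A rough sketch of what such a loop could look like appears below. The generate() stub and the 16-pass count are hypothetical, drawn only from Hotz’s speculation, and do not reflect confirmed OpenAI internals.

```python
# Hypothetical sketch of iterative inference: each pass feeds the
# previous draft back in so the model can refine it. The generate()
# stub and the pass count come from Hotz's speculation, not from
# any confirmed OpenAI detail.
NUM_PASSES = 16  # Hotz's speculated iteration count

def generate(prompt: str, draft: str | None = None) -> str:
    """Stand-in for a model call; a real system would query the LLM,
    conditioning on the prompt and, when given, the previous draft."""
    return f"refined({draft})" if draft else f"draft({prompt})"

def iterative_inference(prompt: str, passes: int = NUM_PASSES) -> str:
    draft = generate(prompt)          # first pass produces a draft
    for _ in range(passes - 1):
        draft = generate(prompt, draft)  # each pass refines the last
    return draft

print(iterative_inference("What is MoE?", passes=3))
# refined(refined(draft(What is MoE?)))
```

Every extra pass is another full model call, which is why running the loop 16 times would multiply inference costs so sharply.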
This approach has been likened to the old trope of three children in a trench coat masquerading as an adult. Many have described GPT-4 as eight GPT-3s in a trench coat, trying to pull the wool over the world’s eyes.
Cutting corners
While GPT-4 aced benchmarks that GPT-3 struggled with, the MoE architecture seems to have become a pain point for OpenAI. In a now-deleted interview, Altman admitted to the scaling issues OpenAI is facing, especially GPU shortages.
Running inference 16 times on an MoE model is sure to increase cloud costs on a similar scale. Multiplied across ChatGPT’s millions of users, it is no surprise that even Azure’s supercomputer fell short. This appears to be one of OpenAI’s biggest problems right now, with Sam Altman stating that a cheaper and faster GPT-4 is currently the company’s top priority.
This has also resulted in a reported degradation in the quality of ChatGPT’s output. All over the Internet, users have reported that the quality of even ChatGPT Plus’ responses has gone down. A ChatGPT release note seems to confirm this, stating, “We’ve updated performance of the ChatGPT model on our free plan in order to serve more users.” In the same note, OpenAI also informed users that Plus subscribers would be defaulted to the “Turbo” variant of the model, which has been optimised for inference speed.
API users, on the other hand, seem to have avoided this problem altogether. Reddit users have noticed that other products built on the OpenAI API give better answers to their queries than even ChatGPT Plus. This might be because API users are far fewer in number than ChatGPT users, leading OpenAI to cut costs on ChatGPT while leaving the API untouched.
In a mad rush to get GPT-4 to market, it seems that OpenAI has cut corners. While the purported MoE model is a good step towards making the GPT series more performant, the scaling issues it faces show that the company might just have bitten off more than it can chew.