Why Time is Ripe for the ‘Real’ GPT-4

We had previously asked the question, “What happened to multimodal GPT-4?” Six months later, it appears that Google’s Gemini has compelled OpenAI to strongly consider expediting the release of GPT-4 with multimodal capabilities. According to reports, Google could release Gemini any time now, and OpenAI has to buckle up.

OpenAI is currently in the process of integrating GPT-4 with multimodal capabilities, much like what Google is planning with Gemini. This integrated model is expected to be named GPT-Vision, as per a recent report. The timing appears to be quite opportune, as both Gemini and GPT-Vision are expected to enter the scene and potentially compete against each other this fall.

Although Sam Altman had earlier made it clear that one shouldn’t expect GPT-5 or GPT-4.5 in the near future, The Information reports that OpenAI might follow up GPT-Vision with an even more powerful multimodal model, codenamed Gobi. Unlike GPT-4, Gobi is being designed as multimodal from the start.

It remains to be seen whether OpenAI is making the right decision by clashing with Gemini. Many are eagerly anticipating that OpenAI may introduce a multimodal GPT-4 at its first-ever developer conference. OpenAI DevDay is set to take place on November 6 in San Francisco.

Fingers crossed for multimodal GPT-4! https://t.co/6PWadNnmsj

— monarchwadia (@monarchwadia) September 10, 2023

Is GPT-Vision better than Gemini?

OpenAI’s decision to withhold the multimodal capabilities may not stem from an inability to develop them. The ChatGPT creator has collaborated with a startup called Be My Eyes, which is developing an app that describes images to blind users, helping them interpret their surroundings so that they can interact with the world more independently.

During this collaboration, OpenAI recognised that adding multimodal capabilities to GPT-4 at this stage might be premature, as the integration of images could raise privacy concerns. Moreover, there is a risk of misinterpreting facial features such as gender or emotional state, which could result in harmful or inappropriate responses.

Meanwhile, OpenAI has its bases covered. A few months ago, reports emerged that OpenAI is working on DALL-E 3. Early samples leaked by YouTuber MattVidPro indicate that this model performs much better than other image generators, including Midjourney, which is usually seen as the best for producing realistic images.

Interestingly, in a recent interview, Google chief Sundar Pichai, when asked what edge Gemini has over ChatGPT, replied, “Today you have separate text models and image-generation models and so on. With Gemini, these will converge.” This suggests that, at the very least, we can expect Gemini to generate both text and images from user prompts.

If OpenAI combines the capabilities of DALL-E 3 and ChatGPT Plus, it will be well positioned to go up against Gemini.

To gain an edge over GPT-4, Gemini is being trained on YouTube videos and would be the first multimodal model trained on video rather than just text (or, in GPT-4’s case, text plus images). Moreover, Demis Hassabis recently claimed that engineers at DeepMind are using techniques from AlphaGo for Gemini.

On the other hand, Google’s Bard hasn’t been able to make a strong impression and falls short of ChatGPT when it comes to generating text. Thus, placing hope on Gemini to turn Google’s fortunes around is a huge bet.

OpenAI can afford to risk it

OpenAI’s process of shipping products is different from Google’s. Google, being an old and reputable player in the market with 4.3 billion customers worldwide, thinks twice, or even more times, before launching any product. It has to make sure that its products are fully polished, with no loose ends.

OpenAI, on the other hand, has shipped products in the past that were not fully finished, in the hope that user feedback would help it make the necessary changes.

Consider the example of GPT-4. When OpenAI initially introduced it, they mentioned it would be multimodal. However, this didn’t turn out to be the case. Moreover, OpenAI openly acknowledged the limitations of GPT-4, stating that it still isn’t entirely dependable, often generating inaccurate information and making reasoning errors.

Pichai expressed similar views during a recent interview when he noted that ChatGPT’s launch ahead of LaMDA signalled to Google that LLM technology was well suited for the market.

He stated, “credit to OpenAI for the launch of ChatGPT, which showed a product-market fit and that people are ready to understand and play with the technology”.

It would be safe to say that with both Google and OpenAI striving to take the lead in the multimodal race, this fall has surely become more interesting.

The post Why Time is Ripe for the ‘Real’ GPT-4 appeared first on Analytics India Magazine.

