Google Unveils Gemini 2.0 Flash Thinking, Challenges OpenAI o1

Looks like Google just played the AGI reverse card on OpenAI with the release of Gemini 2.0 Flash Thinking. The new model comes with advanced reasoning capabilities and showcases its thoughts as it works. Logan Kilpatrick, Google’s product lead for Gemini, said that it “unlocks stronger reasoning capabilities and shows its thoughts.”

He said that the model can “solve complex problems with Flash speeds” while displaying its internal planning process, allowing for greater transparency in AI problem-solving.

The experimental model is still in its early stages, but Kilpatrick provided an example of its potential, showcasing how it can tackle a challenging puzzle involving both visual and textual clues.

Developers can try the model out today in Google AI Studio and the Gemini API. “This is just the first step in our reasoning journey, excited to see what you all think!” said Kilpatrick.
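Since the model is available through the Gemini API, a quick way to experiment is via the `generateContent` REST endpoint. The sketch below is an assumption-laden illustration, not official sample code: the model identifier `gemini-2.0-flash-thinking-exp` and the exact response shape should be verified against the current AI Studio documentation.

```python
# Hedged sketch: calling an experimental Gemini model over the REST API.
# The model id below is an assumption; check Google's docs for the
# current identifier. Requires GEMINI_API_KEY in the environment to
# actually send a request.
import json
import os
import urllib.request

MODEL = "gemini-2.0-flash-thinking-exp"  # assumed experimental model id


def build_request(prompt: str) -> dict:
    """Build a generateContent-style request body for a text prompt."""
    return {"contents": [{"parts": [{"text": prompt}]}]}


def ask(prompt: str) -> str:
    """Send the prompt to the Gemini API and return the first reply text."""
    api_key = os.environ.get("GEMINI_API_KEY")
    if not api_key:
        raise RuntimeError("Set GEMINI_API_KEY to call the API")
    url = (
        "https://generativelanguage.googleapis.com/v1beta/"
        f"models/{MODEL}:generateContent?key={api_key}"
    )
    body = json.dumps(build_request(prompt)).encode()
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return data["candidates"][0]["content"]["parts"][0]["text"]


if __name__ == "__main__":
    # Print the request payload only; no network call without a key.
    print(json.dumps(build_request("Why is the sky blue?"), indent=2))
```

Separating `build_request` from the network call keeps the payload construction easy to inspect and test offline.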

“Google giving away a reasoning model for free to users in AIStudio is simply to show their power. They’re back,” commented a user on X.

“Our most thoughtful model yet:),” posted Google chief Sundar Pichai on X.
“We’ve been *thinking* about how to improve model reasoning and explainability,” said Noam Shazeer, VP of engineering and Gemini co-lead at Google.

Breaking news from Chatbot Arena⚡🤔@GoogleDeepMind's Gemini-2.0-Flash-Thinking debuts as #1 across ALL categories!
The leap from Gemini-2.0-Flash:
– Overall: #3 → #1
– Overall (Style Control): #4 → #1
– Math: #2 → #1
– Creative Writing: #2 → #1
– Hard Prompts: #1 → #1…

— lmarena.ai (formerly lmsys.org) (@lmarena_ai) December 19, 2024

Gemini 2.0 Flash Thinking, which builds on Google’s Gemini series, is set to compete with OpenAI’s o1 model, known for its impressive reasoning capabilities at a level similar to PhD students in physics, chemistry, and biology.

Google recently launched Gemini 2.0 Flash, which supports multimodal inputs, including images, video, and audio, as well as multimodal outputs such as natively generated images combined with text and steerable text-to-speech (TTS) multilingual audio. It can also natively call tools like Google Search, execute code, and integrate third-party, user-defined functions.
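The third-party function integration mentioned above works by declaring functions in the request so the model can decide to call them. The following is a minimal sketch under assumptions: the `get_weather` function, its fields, and the request layout are illustrative, and the exact tool schema should be checked against Google's function-calling documentation.

```python
# Hedged sketch of a user-defined function declaration for Gemini-style
# tool calling. The function name and parameter schema are illustrative,
# not taken from any official sample.
import json


def weather_tool_declaration() -> dict:
    """Declare a hypothetical get_weather function the model may call."""
    return {
        "function_declarations": [
            {
                "name": "get_weather",
                "description": "Look up current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "city": {
                            "type": "string",
                            "description": "City name",
                        }
                    },
                    "required": ["city"],
                },
            }
        ]
    }


def request_with_tools(prompt: str) -> dict:
    """Build a generateContent-style body that exposes the tool."""
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "tools": [weather_tool_declaration()],
    }


if __name__ == "__main__":
    print(json.dumps(request_with_tools("Weather in Delhi?"), indent=2))
```

In practice the model responds with a function-call message containing arguments, which the application executes before returning the result in a follow-up turn.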

This development comes against the backdrop of OpenAI releasing the full version of the o1 model as part of its 12 days of shipmas. It also released o1 in the API, upgraded with function calling, structured outputs, reasoning-effort controls, developer messages, and vision inputs.

Several benchmarks suggest that o1 is the most powerful AI model yet, even outperforming Claude 3.5 Sonnet in coding tasks.

O1 BLOWS EVERYONE ELSE AWAY – IT IS A BEAST IN REASONING AND IS ALSO THE BEST IN CODING!!
The new o1 model from 12/17 is #1 on Livebench AI and scores 91.58 in Reasoning!!
Finally, OpenAI also BEATS Sonnet in Coding. 🤯
Pretty much the only drawback to o1 is that it takes too…

— Bindu Reddy (@bindureddy) December 19, 2024

Google Won’t Let OpenAI Win

Google now seems to be ahead in the AGI race, while OpenAI is playing catch-up. On the 11th day of ‘12 Days of OpenAI,’ the startup announced an update to the ChatGPT desktop application for Mac. The announcement came from John Nastos and Justin Rushing of OpenAI’s ChatGPT desktop team, with the former even playing the saxophone.

Nastos described the native app as ‘lightweight’ and easy to use without disrupting ongoing tasks. A standout feature of the app is its seamless integration with various applications on the user’s computer, making it easier to interact with multiple tools directly from ChatGPT.

“Our desktop app can now work with apps like Xcode, Warp, Notion, Apple and ~30 more. ChatGPT can see, understand, and automate your work in other apps—a step along the path to a more agentic ChatGPT,” said OpenAI chief product officer, Kevin Weil.

“We all copy and paste things into ChatGPT all the time,” Rushing said. “This feature makes that way smoother by automatically pulling context from the apps you’re working with, so you can focus on asking your question, and we’ll handle the rest.”

The app’s utility extends to coding tasks. Nastos demonstrated its ability to integrate with IDEs like Xcode, showcasing how ChatGPT can assist with live coding challenges.

One of the app’s standout features is voice interaction, enabling users to communicate directly with ChatGPT through an advanced voice mode for faster and more natural conversations.

With only one day remaining in OpenAI Shipmas, everyone is eagerly anticipating what OpenAI will unveil next to wrap up 12 days of nonstop shipping. However, so far, Google has countered every move made by OpenAI.

While OpenAI has been making announcements during its ‘12 Days of OpenAI,’ Google has introduced its own series of innovations, including the quantum chip Willow, Gemini 2.0, the 3D world model Genie 2, the Veo 2 video generation model, Project Astra as a universal agent, Project Mariner, Google Deep Research, and Android XR for AR/VR development.

On the other hand, OpenAI has unveiled several significant updates, including the improved OpenAI o1 reasoning model, a new $200-per-month ChatGPT Pro subscription, and Sora, its text-to-video AI generator.

Other notable releases include ChatGPT Search for all users, a new Projects feature for organizing chats, Canvas for collaborative writing and coding, and real-time video capabilities for ChatGPT.
Moreover, OpenAI launched a range of new capabilities, such as Advanced Voice Mode with a Santa Claus voice option, a 1-800 number for calling ChatGPT from a landline, and ChatGPT’s integration with Apple Intelligence.

The post Google Unveils Gemini 2.0 Flash Thinking, Challenges OpenAI o1 appeared first on Analytics India Magazine.
