Google took a sly dig at OpenAI, making it quite evident that it is making AI helpful for everyone, not just for him or her. At Google I/O, the company made a slew of announcements, including major updates to its Gemini AI models, a new multimodal AI assistant called Project Astra, advanced image and video generation models in Imagen 3 and Veo, and AI-powered features across its products.
This was one of Google’s longest keynotes: it ran over by an hour and mentioned AI ‘120 times’, as Sundar Pichai himself revealed, saving the rest of us the trouble of counting.
“We want everyone to benefit from what Gemini can do, so we’ve worked quickly to share these advances with all of you,” Pichai said, adding in his walk-out video that “it wasn’t all just for him, or for her. It was for everyone.”
Gemini 1.5 Flash vs GPT-4o
The star of the show was Gemini 1.5 Flash, an alternative to OpenAI’s GPT-4o, which was announced just yesterday.
Gemini 1.5 Flash is a new lightweight and efficient AI model optimised for high-volume, low-latency tasks. “We listened to developer feedback and created something faster and more cost-effective,” said Demis Hassabis, CEO of Google DeepMind.
Gemini 1.5 Flash offers similar power to Gemini 1.5 Pro while being up to 10 times faster. Both models feature an impressive 1 million token context window, the longest among large-scale foundation models.
“Gemini 1.5 Flash is designed for quick tasks where speed and efficiency matter most, while 1.5 Pro excels at complex tasks needing the highest quality responses,” explained Josh Woodward, senior director of product management at Google Labs.
Developers have been using the expanded context window in innovative ways, from building searchable video databases to analysing lengthy research papers.
Interestingly, many developers were quick to point out that Gemini 1.5 Flash costs roughly 7% of what GPT-4o does, and a tenth of Gemini 1.5 Pro.
Google $GOOGL just announced the pricing of its newest Generative AI models
Here's how the pricing compares to OpenAI's newest GPT-4o model announced yesterday, via The Verge
Gemini Flash – $0.35 for 1 Million tokens
Gemini 1.5 Pro – $3.50 per 1M tokens…— Evan (@StockMKTNewz) May 14, 2024
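The arithmetic behind the "7% of GPT-4o" claim is easy to check from the rates quoted above. A minimal sketch, assuming a flat per-token rate (actual Gemini and OpenAI billing distinguishes input from output tokens and tiers by prompt length), with GPT-4o's $5 per 1M input tokens back-derived from the article's own 7% figure:

```python
# Flat per-1M-token rates: Flash and Pro from the tweet above; the
# GPT-4o figure is the input-token rate implied by the 7% comparison.
PRICE_PER_MILLION = {
    "gemini-1.5-flash": 0.35,
    "gemini-1.5-pro": 3.50,
    "gpt-4o": 5.00,
}

def cost_usd(model: str, tokens: int) -> float:
    """Cost of processing `tokens` tokens at the quoted flat rate."""
    return PRICE_PER_MILLION[model] * tokens / 1_000_000

# Filling Gemini's 1M-token context window once:
flash = cost_usd("gemini-1.5-flash", 1_000_000)  # $0.35
pro = cost_usd("gemini-1.5-pro", 1_000_000)      # $3.50
gpt4o = cost_usd("gpt-4o", 1_000_000)            # $5.00
print(f"Flash is {flash / gpt4o:.0%} of GPT-4o")   # prints "Flash is 7% of GPT-4o"
print(f"Flash is 1/{pro / flash:.0f} the cost of Pro")  # prints "Flash is 1/10 the cost of Pro"
```

At these rates, even a maximal 1M-token prompt to Flash costs well under a dollar, which is the point the developers were making.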
Project Astra, Google’s Quick Response to OpenAI’s GPT-4o
Google also gave a sneak peek at Project Astra, its ambitious initiative to build a “universal AI agent for everyday life”. Astra is a real-time, multimodal AI assistant that can understand the world through a phone’s camera or smart glasses.
In a demo, Astra identified objects, analysed code, and even located a pair of misplaced smart glasses, wink wink. Is Google also coming after Meta’s AI glasses?
“Imagine agents that can see and hear what we do, better understand the context we’re in, and respond quickly in conversation,” said Hassabis.
“Sir Demis Hassabis just showed a super low latency demo of Google’s multimodal AI assistant on your phone AND augmented reality glasses,” read a comment from a user, speculating that Google has been working on this for a while.
Project Astra leverages Gemini’s multimodal capabilities to process video frames continuously, combine video and speech into an event timeline, and cache it for quick recall. The agents also feature enhanced intonation for more natural-sounding responses. “It’s amazing to see how far AI has come, especially in spatial understanding, video processing, and memory,” noted Oriol Vinyals of Google DeepMind. Some of these agent capabilities will roll out to the Gemini app later this year.
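The frame-plus-speech timeline described above can be pictured as a rolling cache that merges both streams by timestamp and evicts old entries. This is a hypothetical sketch, not Google's API: `EventTimeline`, `add`, and `recall` are illustrative names, and real recall would use a model rather than substring matching:

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class Event:
    timestamp: float  # seconds since the session started
    kind: str         # "frame" or "speech"
    payload: str      # e.g. a caption of the frame, or transcribed speech

class EventTimeline:
    """Rolling cache merging video frames and speech into one timeline,
    so an agent can quickly recall recent context (illustrative only)."""

    def __init__(self, max_events: int = 1000):
        # deque with maxlen silently evicts the oldest events
        self.events: deque[Event] = deque(maxlen=max_events)

    def add(self, timestamp: float, kind: str, payload: str) -> None:
        self.events.append(Event(timestamp, kind, payload))

    def recall(self, query: str) -> list[Event]:
        """Naive recall: return cached events whose payload mentions the query."""
        return [e for e in self.events if query.lower() in e.payload.lower()]

timeline = EventTimeline()
timeline.add(1.0, "frame", "desk with a red apple and glasses")
timeline.add(2.5, "speech", "where did I leave my glasses?")
hits = timeline.recall("glasses")  # both the frame and the utterance match
```

The bounded cache is the key design choice here: continuous video is unbounded, so an agent that wants "quick recall" has to decide what to keep, which is presumably why the demo's glasses-finding trick stood out.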
Google Launches OpenAI Sora Alternative, Calls it Veo
Google also introduced Veo, a new text-to-video model that aims to compete with OpenAI’s Sora.
Veo builds upon techniques from prior video models to improve consistency, quality, and resolution, producing high-quality 1080p videos that exceed one minute in length.
While some were impressed with Veo’s capabilities, others argue that it may not match Sora in latency or ability.
Besides its obsession with cats playing guitar, Google also unveiled Imagen 3, its most advanced text-to-image model yet. Imagen 3 generates stunningly photorealistic images with incredible detail and lighting.
“It understands prompts the way people write, creates more photorealistic images and is our best model for rendering text,” Google tweeted. The model excels at versatility, prompt understanding, and image quality thanks to richer training data captions.
Imagen 3 is available now in private preview through ImageFX and coming soon to Vertex AI.
Insane Compute With Trillium TPUs
To support these AI advancements, Google unveiled its 6th generation Tensor Processing Units called Trillium, delivering a 4.7x performance boost. Trillium will be available to Google Cloud customers in late 2024, alongside custom ARM-based CPUs and NVIDIA GPUs.
The company also highlighted its leadership in efficient liquid cooling for data centres, with a deployed capacity approaching 1 gigawatt. For context, India’s total data centre capacity is slated to reach 1 gigawatt this year.
Combined with an extensive global fiber network, Google’s infrastructure investments aim to advance AI innovation and deliver state-of-the-art capabilities.
“This progress is only possible because of our incredible developer community. You’re making it real through the experiences you build every day,” said Google CEO Sundar Pichai. With these announcements, Google aims to make AI helpful for everyone, pushing the boundaries of what’s possible with artificial intelligence.
From lightweight models like Gemini 1.5 Flash to ambitious projects like Astra, and from photorealistic image generation to multilingual inclusivity, Google I/O 2024 showcased the company’s commitment to advancing AI responsibly. By putting powerful tools in the hands of developers, enabling seamless integration into products, and investing in cutting-edge infrastructure, Google is poised to bring the benefits of AI to people worldwide.
As Pichai noted, “We see this as how we will make the most progress against our mission: organising the world’s information across every input, making it accessible via any output, and combining the world’s information with the information in your world in a way that’s truly useful for you.” The future of AI is unfolding rapidly, and Google is at the forefront, striving to make it helpful for everyone.
Gemma 2: Advancing Open-Source AI
Google announced Gemma 2, the next generation of its open-source AI models. Gemma 2 boasts a new 27B parameter model that “outperforms some models that are more than twice its size”. Optimised for NVIDIA GPUs and Google TPUs, it enables developers to customise and deploy state-of-the-art AI efficiently.
PaliGemma, Google’s first vision-language model, also debuted for image captioning and visual question-answering tasks. These models incorporate comprehensive safety measures and transparent evaluations to foster responsible AI development.
Gemma’s unique tokenisation capabilities have also enabled Navarasa—a model fine-tuned for 15 Indic languages.
“Our biggest dream is to build a model to include everyone from all corners of India,” said the Navarasa team.
Navarasa enables people to converse in their native Indian languages and receive responses in kind. This aligns with Google’s mission to organise the world’s information and make it universally accessible, and making AI helpful for everyone.
Perplexity’s Existence Hangs in Balance
Across its products, Google is integrating AI-powered features to enhance user experiences. New AI Overviews in Google Search provide instant answers to complex questions by gathering relevant information.
Multi-step reasoning allows Search to break down bigger questions, find the highest quality information, and synthesise it into a helpful overview.
In a statement that seemed to take a jab at Perplexity, Liz Reid, VP of Engineering for Search at Google, confidently asserted, “Thanks to our real-time info and ranking expertise, it reasons, using the highest quality information out there.”
Work is transformed like never before: Gmail is getting AI-generated summaries, contextual smart replies, and the ability to seamlessly extract data from attachments into spreadsheets.
Gemini in Google Workspace will enable users to automate workflows like expense tracking and data analysis.
Looking ahead, Google is also exploring customisable virtual teammates: AI assistants that can be configured for specific roles and objectives, which it calls Gems.
The post Google Makes AI Helpful for Everyone, Not Just for Him or Her appeared first on Analytics India Magazine.