Google has had a busy year, with the tech giant starting to resemble a startup again as co-founder Sergey Brin re-engaged deeply with its AI efforts. Brin highlighted the thrill of being part of the AI revolution.
“I just don’t want to miss out on this… as a computer scientist, I’ve never seen anything as exciting as all of the AI progress that’s happened in the last few years,” he said.
Google is embracing a bolder approach to innovation, shipping products like there’s no tomorrow. Brin emphasised the importance of risk-taking in deploying AI tools despite imperfections: “You need to be willing to have some embarrassments and take some risks. This is something magical we’re giving to the world.”
By balancing experimentation and impactful AI applications, Google is redefining how it delivers value in areas ranging from coding to large-scale model integration, demonstrating a nimble and visionary approach reminiscent of its early days.
Echoing similar sentiments, Logan Kilpatrick, senior product manager at Google, told AIM that he feels great about returning to his AI roots. He shared his experience of working at Google alongside Matt Velloso, Google’s vice president of product for AI/ML development, whom he admires most, as well as Jeff Dean, chief scientist at Google DeepMind and Google Research, and Demis Hassabis, CEO of Google DeepMind.
The ‘Gemini Era’ Begins
This year marked the dawn of the “Gemini era” for Google, with its newly launched Gemini AI models revolutionising generative AI. These multimodal models brought significant enhancements to text, image, and reasoning capabilities, elevating Google’s search and enterprise tools. The release of Gemini positioned 2024 as a milestone in Google’s ongoing AI evolution.
The Gemini series featured various iterations, including Gemini Ultra and Gemini 1.5 Pro, advanced LLMs designed for diverse applications across Google services. These models strengthened natural language processing capabilities and drove innovations in user interaction.
To further extend creative possibilities, Google introduced Veo, a new video generation model capable of producing high-definition videos based on natural language prompts. It enhances creative storytelling by accurately capturing a prompt’s tone and rendering intricate details, even in extended prompts.
The model can interpret cinematic terms such as “timelapse” and “aerial shots of a landscape,” seamlessly integrating them into its output.
“Never bet against @Google! They just dropped a Sora competitor. 1080p, over a minute long vids, and impeccable quality,” posted Andrew Gao (@itsandrewgao) on X on May 14, 2024.
Google’s commitment to responsible AI practices was evident in the expansion of SynthID, its watermarking tool for AI-generated text and media. SynthID, which creates imperceptible digital watermarks, is now accessible across Google Cloud’s Vertex AI and supports Google’s Imagen and Veo models. This tool enables users to verify if the content is AI-generated using the ‘About this image’ feature in Search or Chrome.
To promote transparency, Google DeepMind collaborated with Hugging Face to open source its research on ‘Scalable watermarking for identifying large language model outputs’. This initiative aims to address questions of authenticity and traceability in the digital content space.
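For developers curious to try it, the open-sourced watermarking logic ships with the Hugging Face Transformers library (v4.46 and above). Below is a minimal, illustrative sketch; the model checkpoint and watermark keys are placeholders, and real deployments must keep their keys private:

```python
# Minimal sketch of SynthID text watermarking via Hugging Face Transformers.
# The checkpoint and keys below are illustrative placeholders.
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    SynthIDTextWatermarkingConfig,
)

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b-it")
model = AutoModelForCausalLM.from_pretrained("google/gemma-2-2b-it")

# Private keys seed the pseudo-random function that subtly biases token
# choices during sampling; keep them secret so only you can detect them.
watermarking_config = SynthIDTextWatermarkingConfig(
    keys=[654, 400, 836, 123, 340, 443, 597, 160, 57, 29],
    ngram_len=5,  # window of recent tokens used when scoring candidates
)

inputs = tokenizer("Write a short note about TPUs.", return_tensors="pt")
outputs = model.generate(
    **inputs,
    watermarking_config=watermarking_config,
    do_sample=True,  # watermarking biases the sampling distribution
    max_new_tokens=64,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because the watermark lives in the statistics of token choices rather than in visible characters, the output reads like normal text; a paired detector model, trained against the same keys, later scores whether a passage carries the watermark.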
In addition, Google unveiled Imagen 3, an advanced text-to-image generation model that produces high-quality images and exhibits improved accuracy in rendering text within images.
However, concerns regarding deepfake technology continue to stir debate.
Tools like Grok 2, which allow the creation of uncensored deepfakes featuring public figures such as Kamala Harris and Donald Trump, have sparked significant ethical concerns.
In contrast, Google’s strict limitations with models like Imagen 3 reflect its cautious approach, as highlighted by Gmail creator Paul Buchheit. He remarked that while Google possesses the resources and early leadership in AI, it faces regulatory scrutiny and has been the target of substantial lawsuits.
“They had a version of DALL-E called Imagen, and it was prohibited from making the human form,” said Buchheit, who added that the company has struggled to dominate the AI landscape despite having all the necessary resources and an early start in AI.
Google also released a paper detailing the capabilities of its new model, GameNGen. The research was authored by Dani Valevski (researcher at Google Research), Yaniv Leviathan (engineer at Google Research), Moab Arar (PhD candidate at Tel Aviv University), and Shlomi Fruchter (engineer at Google DeepMind).
Powered entirely by a neural model, it is the first game engine to enable high-quality, real-time interaction with a complex environment over long trajectories. “GameNGen can interactively simulate the classic game DOOM at over 20 frames per second on a single TPU. Next frame prediction achieves a PSNR of 29.4, comparable to lossy JPEG compression,” said Google.
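For context, PSNR (peak signal-to-noise ratio) measures how closely a predicted frame matches the real one, in decibels; higher is better, and around 29–30 dB is roughly the fidelity of lossy JPEG compression. Here is a minimal NumPy sketch of the metric; the frames below are synthetic placeholders, not GameNGen outputs:

```python
import numpy as np

def psnr(reference: np.ndarray, prediction: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio (dB) between two same-shaped images."""
    mse = np.mean((reference.astype(np.float64) - prediction.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return 10.0 * np.log10(max_val**2 / mse)

# Synthetic example: a frame plus small per-pixel noise scores a high PSNR;
# GameNGen's reported 29.4 dB is for real next-frame prediction on DOOM.
frame = np.random.randint(0, 256, (240, 320, 3), dtype=np.uint8)
noisy = np.clip(frame.astype(int) + np.random.randint(-6, 7, frame.shape), 0, 255)
print(f"PSNR: {psnr(frame, noisy):.1f} dB")
```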
In August, Google concluded its Made By Google event with a new update on Project Astra, which had also gained the limelight at Google I/O 2024. The company shared its plans to build advanced agents that can see, talk, and respond.
ChatGPT Moment for Google
One of the releases that took the internet by storm, quickly becoming a hot favourite this year, was NotebookLM. Many are even calling it Google’s own ChatGPT moment.
“It’s possible that NotebookLM podcast episode generation is touching on a whole new territory of highly compelling LLM product formats. Feels reminiscent of ChatGPT,” said the founder of Eureka Labs, Andrej Karpathy.
Raiza Martin, the creator of NotebookLM at Google, spoke about how ChatGPT was huge for her, and said the comparison now feels a bit overwhelming. Meanwhile, Google has been doubling down, adding more features and launching new updates to NotebookLM.
But the question remains: Will this be the end of podcasters?
Perhaps not yet. The product lead at NotebookLM believes that people are still using the tool for personal purposes rather than for a larger audience, and that it would not be wise to restrict its definition to just podcasts.
“From what I have seen, a lot of the things that people are making with NotebookLM are not the same things that we would have a real podcast about,” said Martin, emphasising that she would still rather listen to her favourite podcaster, Lenny, and his views on a particular subject than to an AI-generated voice.
Meanwhile, Google also made notable advancements in AI hardware by announcing the sixth-generation tensor processing unit known as Trillium. This new TPU offers a 4.7-fold increase in peak compute performance per chip compared to its predecessor. It is also 67% more energy efficient than the fifth-generation TPU, addressing latency and cost concerns.
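Taken together, those two headline figures also imply something about power: if peak compute per chip rises 4.7-fold while performance per watt improves 67%, per-chip power draw grows roughly 2.8-fold. A quick back-of-envelope check, based on our reading of the figures rather than an official Google specification:

```python
# Back-of-envelope reading of Trillium's headline numbers vs. the fifth-gen TPU.
# Assumes "67% more energy efficient" means 1.67x performance per watt.
compute_gain = 4.7        # peak compute per chip vs. previous generation
perf_per_watt_gain = 1.67

implied_power_growth = compute_gain / perf_per_watt_gain
print(f"Implied per-chip power draw: ~{implied_power_growth:.1f}x the previous TPU")
```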
Additionally, Google launched Learn About, a feature that redefined search by offering deeper, contextual learning opportunities directly within results, showcasing the company’s commitment to enhancing user interactions and information accessibility.
What’s Next?
Google I/O 2025, which is expected to take place in mid-May, is set to push AI boundaries further, building on the ‘Gemini era’ as Sundar Pichai, the CEO of Alphabet and Google, described at I/O 2024.
Expect Gemini 2.0, offering extended context, deeper multimodal capabilities, and personalised interactivity, delivering on Pichai’s vision of making AI “helpful for everyone.”
Early signs of AGI may also emerge. Google DeepMind is likely to share the AGI roadmap emphasised by CEO Demis Hassabis, which could materialise as Gemini-powered robotics under Project Astra, showcasing agents capable of reasoning, planning, and action.
“We’re still in the early days, but systems that reason across modalities are our next frontier,” said Hassabis.
Android, as outlined by Dave Burke, the VP of engineering, is also evolving with Gemini Nano, enabling on-device, privacy-first AI. “Gemini is unlocking new experiences that understand the world the way we do,” said Burke, hinting at advancements like AI-powered smart glasses.
Douglas Eck, at Google I/O 2024, remarked on Veo’s cinematic potential—“enabling cinematic quality at unprecedented levels”—suggesting further democratisation of high-quality media creation, likely through YouTube Studio and Photos integrations.
For developers, Gemma 2 models promised scalable AI, as highlighted by Josh Woodward: “Gemma’s quality-to-size ratio makes it ideal for developers pushing the boundaries.” Tools like Google AI Studio 2.0 are expected to simplify multimodal integrations with enhanced APIs.
Lastly, responsible AI took centre stage at Google I/O 2024, with SynthID advancements across text, audio, and video. “We’re developing tools to safeguard against misinformation and ensure AI benefits everyone,” said James Manyika, the SVP of technology and society at Google.
I/O 2025 will likely redefine AI’s role across industries, aligning innovation with responsibility, as Pichai affirmed: “We’re just getting started.”