AI-generated video has been advancing quickly, with major tech developers racing to build and commercialize their own models. We're now seeing the rise of tools that can generate strikingly photorealistic video from a single prompt in natural language. For the most part, however, AI-generated video has had a glaring shortcoming: it's silent.
No longer. At its annual I/O developer conference on Tuesday, Google announced the release of Veo 3, the latest iteration of its video-generating AI model, which also comes with the ability to generate synchronized audio.
Imagine you prompt the system to generate a video set inside a busy subway car, for example. Veo 3 can produce the video, along with AI-generated ambient background noise to add to the sense of realism. You can even prompt it to generate audio of human voices, according to Google.
The model also reportedly focuses on simulating real-world physics and lip-syncing, making it a potentially valuable tool for filmmakers and advancing Google's broader mission of bringing usable AI to creative industries. It's available now for Gemini Ultra subscribers in the US. It can also be accessed through Flow, Google's new AI-powered filmmaking tool, which was also unveiled at I/O this week.
A major technical challenge
Veo 3 represents one of the first models from a major tech developer that can synchronize AI-generated video and audio. Meta's Movie Gen, launched in October, is another. Some other tools, like Runway's Gen-3 Alpha, come with features that add AI-generated audio to video in a post-production process, but the concurrent generation of the two requires the compute and resources of a major player like Google.
Building AI models capable of generating synchronized video and audio has been a thorny technical challenge and an active area of research across the AI industry. AI-generated video and AI-generated audio are each distinct technical challenges, and fusing them introduces a whole new dimension of complexity. Here's a demo of Veo 3.
For one thing, video is a sequence of still frames, whereas audio is a continuous wave. Syncing the two therefore requires models that can operate across these two modalities, accounting for the vastly different timescales on which they operate.
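To get a rough sense of that gap in timescales, here is a minimal sketch in Python. It is purely illustrative and says nothing about how Veo 3 works internally: the function name `samples_for_frame` is hypothetical, and the defaults of 24 frames per second and 48,000 audio samples per second are common industry values assumed here, not figures from Google.

```python
from fractions import Fraction


def samples_for_frame(frame_index, fps=24, sample_rate=48_000):
    """Return the half-open range [start, end) of audio samples
    that play while one video frame is on screen.

    fps and sample_rate are assumed, commonly used values.
    """
    # Exact ratio avoids floating-point drift over long clips.
    samples_per_frame = Fraction(sample_rate, fps)
    start = int(frame_index * samples_per_frame)
    end = int((frame_index + 1) * samples_per_frame)
    return start, end


# Under these assumptions, each frame spans 2,000 audio samples.
first_start, first_end = samples_for_frame(0)
print(first_end - first_start)
```

In other words, under these assumptions a single still image has to line up with thousands of individual audio values, and that alignment has to hold for every frame in the clip.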
An AI model fusing video with sound must also be able to dynamically account for variables like material, distance, and speed. A car driving at 100 miles per hour sounds a lot different than one traveling at 10 miles per hour; a horse walking on cobblestones sounds different than one that's walking on grass.