Google Veo 3 is a formidable video generation model recently unveiled by Google, sparking widespread excitement on the internet. Its capabilities have left many stunned, with some even calling it scary good. The model features audio synthesis and cinematic tools, setting a new benchmark in AI-powered video generation.
While the tech world celebrated Google’s Veo 3 launch, ByteDance quietly released something that may be even better. TikTok’s parent company recently published the research paper for Seedance 1.0, a bilingual video generation model that now tops independent leaderboards for both text-to-video and image-to-video generation.
ByteDance didn’t launch with an event or demo. Instead, its technical benchmarks put the company in the spotlight without any serious marketing effort. The model is built to support high-resolution, multi-shot generation while maintaining fast inference and tight instruction adherence.
How Seedance 1.0 Crushed Veo 3

The company introduced the technology in the research paper, stating, “We decouple spatial and temporal layers with an interleaved multimodal positional encoding. This allows our model to jointly learn both text-to-video and image-to-video in a single model, and natively support multi-shot video generation.”
This approach enables the AI model to support complex scene transitions and multi-shot storytelling with consistent subject representation.
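To make the idea of decoupled spatial and temporal layers more concrete, here is a minimal sketch of how a factorized video model could group tokens: a spatial layer lets tokens within the same frame attend to each other, while a temporal layer lets the same location attend across frames. This illustrates the general technique only. The function names and the tiny grid size are hypothetical, and the paper's actual layers, positional encoding, and handling of text tokens are not reproduced here.

```python
# Hypothetical illustration: which tokens are grouped together in a
# spatial layer versus a temporal layer of a factorized video model.
# None of these names come from the Seedance 1.0 paper.

def token_index(t, y, x, height, width):
    """Flatten a (frame, row, column) position into a single token index."""
    return (t * height + y) * width + x

def spatial_groups(frames, height, width):
    """One group per frame: tokens in the same frame attend to each other."""
    return [
        [token_index(t, y, x, height, width)
         for y in range(height) for x in range(width)]
        for t in range(frames)
    ]

def temporal_groups(frames, height, width):
    """One group per location: the same location attends across all frames."""
    return [
        [token_index(t, y, x, height, width) for t in range(frames)]
        for y in range(height) for x in range(width)
    ]

if __name__ == "__main__":
    frames, height, width = 3, 2, 2  # toy sizes chosen for readability
    print("spatial :", spatial_groups(frames, height, width))
    print("temporal:", temporal_groups(frames, height, width))
```

With three frames of 2x2 tokens, the spatial grouping yields three groups of four tokens and the temporal grouping yields four groups of three. Each layer therefore works on much shorter sequences than attention over all twelve tokens at once.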
A large part of the model’s performance stems from ByteDance’s data pipeline. The team curated a large-scale, multi-source dataset with detailed bilingual captions and dense annotation of motion and static features. Caption accuracy was prioritised to improve prompt adherence during generation. This was paired with a novel reinforcement learning setup using three reward models focused on foundational alignment, motion quality, and aesthetics.
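As a rough illustration of how scores from three reward models might be used together, the sketch below averages per-dimension scores and picks the best-scoring candidate. This is an assumption made purely for illustration: the article does not say how ByteDance combines the rewards, and the weights, score values, and function names here are all hypothetical.

```python
# Hypothetical sketch: combining three reward scores into one number.
# The equal weights are an assumption, not taken from the paper.
REWARD_WEIGHTS = {"alignment": 1.0, "motion": 1.0, "aesthetics": 1.0}

def combined_reward(scores, weights=REWARD_WEIGHTS):
    """Weighted average of per-dimension reward scores."""
    missing = set(weights) - set(scores)
    if missing:
        raise KeyError(f"missing reward scores: {sorted(missing)}")
    total_weight = sum(weights.values())
    return sum(weights[name] * scores[name] for name in weights) / total_weight

def pick_best(candidates, weights=REWARD_WEIGHTS):
    """Return the candidate video whose combined reward is highest."""
    return max(candidates, key=lambda c: combined_reward(c["scores"], weights))

if __name__ == "__main__":
    # Made-up scores for two candidate generations of the same prompt.
    candidates = [
        {"id": "a", "scores": {"alignment": 0.9, "motion": 0.6, "aesthetics": 0.3}},
        {"id": "b", "scores": {"alignment": 0.7, "motion": 0.8, "aesthetics": 0.9}},
    ]
    print(pick_best(candidates)["id"])
```

A single scalar like this is convenient for ranking, but it hides trade-offs between dimensions, which is one reason a real system may keep the reward signals separate.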
In evaluation, Seedance 1.0 outperformed Veo 3 across multiple dimensions. On the SeedVideoBench benchmark, designed in collaboration with film directors, the model demonstrated higher scores in prompt-following and motion realism.

Notably, in image-to-video tasks, Seedance retained more visual consistency from the input frame, while Veo 3 showed occasional changes in lighting and texture, the research paper claimed.

Inference performance is another notable aspect. In terms of speed, Seedance 1.0 leaves the rest behind. The company claims that it generates a five-second video at 1080p in just 41.4 seconds on a single NVIDIA-L20, an inference time that’s an order of magnitude faster than competitors like Sora, Runway Gen-4 and, of course, Veo 3.
ByteDance also mentioned that it slashed costs and latency in a way that could push video generation towards real-time use cases.
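To put the quoted figure in perspective, a quick back-of-the-envelope calculation shows how far the claimed speed still is from real time. It uses only the two numbers reported above (41.4 seconds to generate a five-second clip); the function name is illustrative.

```python
def compute_seconds_per_video_second(generation_seconds, clip_seconds):
    """How many seconds of compute are needed per second of finished video."""
    return generation_seconds / clip_seconds

# Figures quoted in the article: a 5-second 1080p clip in 41.4 seconds.
ratio = compute_seconds_per_video_second(41.4, 5.0)
print(f"{ratio:.2f} seconds of compute per second of video")
# prints "8.28 seconds of compute per second of video"
```

A ratio of 1.0 would mean generation keeps pace with playback, so on the claimed numbers the model is roughly eight times slower than real time on that single GPU.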
Moreover, the AI model managed to top the leaderboard on Artificial Analysis for both text-to-video and image-to-video generation tasks.

Reevaluating Veo 3 for Comparison
Veo 3 remains a technically formidable system. It introduced audio-aware video synthesis and offered users control over camera movement and shot composition through its Flow tool. Early user reactions highlighted the novelty of its synchronised dialogue and dynamic environments, placing it at the forefront of audio-visual generation.
However, in direct comparisons, Veo 3 seems to fall short in visual alignment and frame consistency. The Seedance 1.0 research paper noted that Veo’s image-to-video results sometimes altered subject appearance or scene lighting, impacting its overall effectiveness. While Veo succeeded in expanding the modality of generative video, its performance in traditional benchmarks lagged behind.
In contrast, Seedance 1.0 focuses on visual coherence and motion plausibility, with structured reinforcement learning and curated fine-tuning data playing key roles. Its strengths lie in reliability and controllability, especially for multi-shot or long-duration sequences, scenarios critical for professional or semi-automated content creation.
Scheduled for a June 2025 integration across platforms like Doubao and Jimeng, Seedance 1.0 is poised to become a key productivity tool. Its aim is to significantly improve professional workflows and everyday creative tasks.
While Veo 3 gained attention for being the first to combine realistic video with ambient sound and dialogue, Seedance 1.0 achieved better visual fidelity, motion stability, and narrative coherence, but lacks audio capabilities.
The post ByteDance’s AI Video Model Quietly Beat Google’s Veo 3 at Its Own Game appeared first on Analytics India Magazine.