ByteDance Releases Seedream 3.0 to Rival GPT-4o and Imagen 3

ByteDance, the company behind TikTok, has unveiled its latest image generation foundation model, Seedream 3.0, claiming that it outperforms OpenAI’s GPT-4o in image creation capabilities.

Seedream 3.0 is a bilingual (Chinese-English) model that aims to address limitations found in its predecessor, Seedream 2.0.

The release comes right after the ‘Ghiblification’ of images with the help of GPT-4o.

The model uses a dataset expanded by roughly 100%, built with a dynamic sampling mechanism. The pre-training phase incorporates mixed-resolution training, cross-modality RoPE, representation alignment loss, and resolution-aware timestep sampling for improved scalability and visual-language alignment.

Post-training optimisation uses diverse aesthetic captions and a VLM-based reward model to improve the quality of the final output.

The technical report mentions, “By using consistent noise expectation and importance-aware timestep sampling, we achieve a 4 to 8 times speedup while maintaining image quality.”

The model can generate images at up to 2K resolution, enabling it to deliver high-quality results.

Source: Artificial Analysis Image Arena Leaderboard

The report states that the model was compared with OpenAI’s GPT-4o, Imagen 3 and Midjourney, among others. Although it initially topped the charts according to ByteDance’s claims, it appears to be on par with GPT-4o and ahead of Imagen 3. This is evident from the latest Artificial Analysis benchmarks at the time of publication.

ByteDance highlights the distinct strengths of the model. In dense text rendering, Seedream 3.0 excels at handling complex Chinese text generation with superior typesetting and aesthetic composition, whereas GPT-4o, while strong with small English characters and LaTeX, shows limitations with Chinese fonts.

In image editing tasks, ByteDance’s SeedEdit, derived from Seedream, demonstrates better ID preservation and prompt following compared to GPT-4o and Gemini-2.0, although it faces challenges with more complex editing scenarios.

ByteDance claims that images generated by GPT-4o tend to exhibit a dark yellowish hue and significant noise, potentially impacting their usability. At the same time, Seedream models have consistently demonstrated strong performance in terms of colour, texture, clarity, and overall aesthetic appeal.

The post ByteDance Releases Seedream 3.0 to Rival GPT-4o and Imagen 3 appeared first on Analytics India Magazine.