Alibaba on Wednesday added voice and video chat capabilities to Qwen Chat and released Qwen2.5-Omni-7B, the new open-source model that makes them possible. The model was released under the Apache 2.0 licence.
The company highlighted in a blog post that Qwen2.5-Omni is the new flagship end-to-end multimodal model in the Qwen series. It stated that the model is designed for multimodal perception and seamlessly processes text, images, audio, and video, delivering real-time streaming responses via text and speech synthesis.
The key features of the model include a ‘Thinker-Talker’ architecture, which enables it to provide real-time responses. The Thinker is a Transformer decoder that acts like the brain, while the Talker, designed as a dual-track autoregressive Transformer decoder, operates like the human mouth.
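The brain-and-mouth split described above can be pictured as a two-stage streaming pipeline. The following is only a toy sketch under heavy assumptions, not Alibaba's implementation: the function names `thinker`, `talker`, and `respond`, the one-number "hidden state", and the fake audio chunks are all hypothetical stand-ins for real Transformer decoders.

```python
from typing import Iterator, List, Tuple


def thinker(prompt: str) -> Iterator[Tuple[str, List[float]]]:
    """Toy 'Thinker': yields one text token at a time plus a stand-in hidden state."""
    for word in prompt.split():
        token = word.upper()
        hidden = [float(len(word))]  # placeholder, not a real model representation
        yield token, hidden


def talker(stream: Iterator[Tuple[str, List[float]]]) -> Iterator[Tuple[str, str]]:
    """Toy 'Talker': emits a fake speech chunk for each Thinker output as it arrives."""
    for token, hidden in stream:
        speech_chunk = f"<audio:{token}:{int(hidden[0])}>"
        yield token, speech_chunk


def respond(prompt: str) -> Tuple[List[str], List[str]]:
    """Run both stages and collect the text and speech streams."""
    text: List[str] = []
    speech: List[str] = []
    for token, chunk in talker(thinker(prompt)):
        text.append(token)
        speech.append(chunk)
    return text, speech


if __name__ == "__main__":
    print(respond("hello qwen"))
```

Because both stages are generators, the toy Talker starts producing output before the toy Thinker has finished, which is the general idea behind streaming text and speech together.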
Alibaba’s Qwen2.5-Omni model has shown strong performance across various tasks, including speech recognition, translation, audio and video understanding, and speech generation, outperforming comparable models at tasks that require multiple modalities.
It was benchmarked against comparable single-modality and closed-source models such as Qwen2.5-VL-7B, Qwen2-Audio, and Gemini-1.5-pro, achieving state-of-the-art performance.
Voice Chat + Video Chat! Just in Qwen Chat (https://t.co/FmQ0B9tiE7)! Now you can chat with Qwen just like making a phone call or making a video call! Check the demo in https://t.co/42iDe4j1Hs
What's more, we opensource the model behind all this, Qwen2.5-Omni-7B, under the… pic.twitter.com/LHQOQrl9Ha - Qwen (@Alibaba_Qwen), March 26, 2025
The paper and code for the new model can be found on GitHub, while the model itself is available on Hugging Face along with a demo.
Last month, Alibaba also launched QwQ-Max-Preview, a new AI reasoning model within the Qwen family that specialises in mathematics and coding tasks and features a “thinking” capability in the Qwen Chat application.
The model, which outperformed OpenAI’s models on the LiveCodeBench leaderboard, is expected to have smaller variants open-sourced for local device deployment, as well as a dedicated mobile app.
There may be even more coming, considering Alibaba’s commitment to investing over $52 billion in AI over the next three years.
The post Alibaba Releases Qwen2.5 Omni, Adds Voice and Video Modes to Qwen Chat appeared first on Analytics India Magazine.