Alibaba Introduces Open-Source Model For Digital Human Video Generation

Model supports both social media formats and traditional film production.


Alibaba has unveiled Wan2.2-S2V (Speech-to-Video), its latest open-source model designed for digital human video creation. This innovative tool converts portrait photos into film-quality avatars capable of speaking, singing, and performing.

Part of Alibaba’s Wan2.2 video generation series, the new model can produce high-quality animated videos from a single image and an audio clip.

Versatile Animation Capabilities

Wan2.2-S2V offers flexible character animation, enabling video creation across multiple framing options including portrait, bust, and full-body perspectives. It can dynamically generate character actions and environmental elements based on prompts, giving professional creators precise control for storytelling and design needs.

Powered by advanced audio-driven animation technology, the model delivers lifelike performances ranging from natural dialogue to musical sequences. It also supports multi-character scenes and diverse avatar styles, including cartoons, animals, and stylised figures.

Flexible Output For Creators

To meet varied professional needs, the technology offers output resolutions of 480p and 720p, producing visuals suitable for both social media content and professional presentations.

Innovative Technologies

Wan2.2-S2V goes beyond traditional talking-head animations by combining text-guided global motion control with audio-driven local movements, resulting in expressive performances even in complex scenarios.

A key breakthrough is its frame processing technique, which compresses historical frames into a single latent representation. This reduces computational load and enables stable long-video generation – a major step in extended animation production.
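The article does not disclose how this compression works internally, but the general idea of collapsing a growing frame history into one fixed-size conditioning latent can be sketched as follows. This is a minimal illustration only: the mean-pooling operator, the tensor shapes, and the `compress_history` helper are all assumptions for demonstration, not Wan2.2-S2V's actual mechanism.

```python
import numpy as np

def compress_history(frame_latents: np.ndarray) -> np.ndarray:
    """Collapse a buffer of per-frame latents (T, C, H, W) into a single
    latent (C, H, W). Mean-pooling over time is a stand-in here; the real
    model's compression operator is not described in the article."""
    return frame_latents.mean(axis=0)

# Rolling long-video generation: each new chunk is conditioned on one
# compressed latent instead of the full frame history, so memory and
# compute stay constant no matter how long the video grows.
rng = np.random.default_rng(0)
history = rng.standard_normal((16, 4, 32, 32))  # 16 past frames, hypothetical sizes
context = compress_history(history)
assert context.shape == (4, 32, 32)  # one latent regardless of history length
```

The payoff is that conditioning cost no longer scales with video length, which is what makes stable long-form generation tractable.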

The model’s strength is further enhanced by its large-scale audio-visual training dataset, tailored for film and television scenarios. Using multi-resolution training, Wan2.2-S2V supports flexible video outputs across short-form vertical formats and traditional widescreen productions.

Expanding The Open-Source Ecosystem

Wan2.2-S2V is available for download on Hugging Face, GitHub, and Alibaba Cloud’s open-source community, ModelScope. Alibaba previously open-sourced Wan2.1 models in February 2025 and Wan2.2 models in July. To date, the Wan series has recorded over 6.9 million downloads on Hugging Face and ModelScope.

