Alibaba has unveiled Wan2.2-S2V (Speech-to-Video), its latest open-source model designed for digital human video creation. This innovative tool converts portrait photos into film-quality avatars capable of speaking, singing, and performing.
Part of Alibaba’s Wan2.2 video generation series, the new model can produce high-quality animated videos from a single image and an audio clip.
Versatile Animation Capabilities
Wan2.2-S2V offers flexible character animation, supporting multiple framing options, including portrait, bust, and full-body perspectives. It can dynamically generate character actions and environmental elements from text prompts, giving professional creators precise control over storytelling and design.
Powered by advanced audio-driven animation technology, the model delivers lifelike performances ranging from natural dialogue to musical sequences. It also supports multi-character scenes and diverse avatar styles, including cartoons, animals, and stylised figures.
Flexible Output For Creators
To meet varied professional needs, the model outputs video at 480P and 720P, resolutions suited to both social media content and professional presentations.
Innovative Technologies
Wan2.2-S2V goes beyond traditional talking-head animations by combining text-guided global motion control with audio-driven local movements, resulting in expressive performances even in complex scenarios.
A key advance is its frame processing technique, which compresses historical frames (those already generated) into a single compact latent representation. This reduces computational load and enables stable long-video generation, a persistent difficulty in extended animation production.
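To illustrate why a single compressed history representation keeps costs in check, here is a minimal Python sketch. It is not Wan2.2-S2V's actual method, which the announcement does not detail. The `HistoryMemory` class, the running-mean compression, and the size helpers are all hypothetical stand-ins chosen for simplicity. The point is only that a fixed-size summary stays the same size however long the video grows, whereas keeping every past frame grows linearly.

```python
from typing import List

Latent = List[float]


class HistoryMemory:
    """Hypothetical fixed-size summary of all previously generated frames.

    Assumption: a running mean stands in for whatever learned compression
    the real model uses; only the constant-size property is the point.
    """

    def __init__(self, dim: int) -> None:
        self.dim = dim
        self.count = 0
        self.summary: Latent = [0.0] * dim

    def update(self, frame_latent: Latent) -> None:
        """Fold one new frame latent into the summary without storing it."""
        if len(frame_latent) != self.dim:
            raise ValueError("frame latent has the wrong dimension")
        self.count += 1
        # Incremental mean: summary += (x - summary) / n
        self.summary = [
            s + (x - s) / self.count
            for s, x in zip(self.summary, frame_latent)
        ]


def naive_context_size(num_frames: int, dim: int) -> int:
    """Values kept when every past frame latent is retained."""
    return num_frames * dim


def compressed_context_size(num_frames: int, dim: int) -> int:
    """Values kept when history is folded into one latent."""
    return dim if num_frames > 0 else 0


if __name__ == "__main__":
    memory = HistoryMemory(dim=2)
    for latent in ([1.0, 2.0], [3.0, 4.0]):
        memory.update(latent)
    print(memory.summary)
    print(naive_context_size(1000, 2), compressed_context_size(1000, 2))
```

In this toy setup, the context the generator would need to read stays at `dim` values whether 10 or 1,000 frames have been produced, which is the property that makes long generation tractable.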
The model is trained on a large-scale audio-visual dataset tailored to film and television scenarios. Multi-resolution training lets Wan2.2-S2V produce video in a range of formats, from short-form vertical content to traditional widescreen productions.
Expanding The Open-Source Ecosystem
Wan2.2-S2V is available for download on Hugging Face, GitHub, and Alibaba Cloud’s open-source community, ModelScope. Alibaba previously open-sourced the Wan2.1 models in February 2025 and the Wan2.2 models in July 2025. To date, the Wan series has recorded over 6.9 million downloads across Hugging Face and ModelScope.