Wan2.1: Open-source video generation models
Diffusion transformer models for text-to-video and image-to-video generation.
Wan2.1 is a collection of diffusion-based video generation models developed for multiple video synthesis tasks. The architecture includes a custom video VAE component (Wan-VAE) for encoding and decoding video frames while preserving temporal information, paired with transformer-based diffusion models of varying scales. The smallest variant (T2V-1.3B) requires approximately 8GB of VRAM and can generate 480p video on consumer hardware, while larger variants support higher resolutions and more complex generation tasks. The models are integrated with standard frameworks like Hugging Face Diffusers and ComfyUI for inference.
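For reference, here is a minimal text-to-video sketch using the Hugging Face Diffusers integration mentioned above. It assumes a recent Diffusers release that ships the WanPipeline and AutoencoderKLWan classes and uses the Wan-AI/Wan2.1-T2V-1.3B-Diffusers checkpoint name from the Hugging Face Hub; the prompt, resolution, frame count, and guidance scale are illustrative values, not prescribed settings.

```python
import torch
from diffusers import AutoencoderKLWan, WanPipeline
from diffusers.utils import export_to_video

# Assumed checkpoint name under the Wan-AI organization on the Hugging Face Hub.
model_id = "Wan-AI/Wan2.1-T2V-1.3B-Diffusers"

# Load the Wan-VAE in float32 for decoding stability; the diffusion
# transformer runs in bfloat16 to keep memory use low.
vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
pipe = WanPipeline.from_pretrained(model_id, vae=vae, torch_dtype=torch.bfloat16)
pipe.to("cuda")

# Generate a short 480p clip (81 frames at 16 fps is roughly five seconds).
frames = pipe(
    prompt="A tabby cat walks through tall grass at sunset, cinematic lighting",
    height=480,
    width=832,
    num_frames=81,
    guidance_scale=5.0,
).frames[0]

export_to_video(frames, "t2v_output.mp4", fps=16)
```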

Multi-task capability
Supports text-to-video, image-to-video, video editing, text-to-image, and video-to-audio generation within a single model family, rather than requiring separate specialized models for each task.
Consumer GPU compatibility
The 1.3B-parameter variant operates within an 8GB VRAM budget, enabling deployment on standard consumer graphics cards without specialized hardware or quantization techniques (see the memory-saving sketch after this feature list).
Multilingual text generation
Includes capability to generate both Chinese and English text within video frames, addressing a gap in existing open-source video models at the time of release.
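As an illustration of the consumer-GPU point above, the sketch below shows how a memory-constrained setup might look with the same assumed Diffusers WanPipeline. It relies on standard Diffusers memory helpers (model CPU offload and VAE tiling); whether tiling is exposed for the Wan VAE depends on the installed Diffusers version, and actual VRAM use varies with resolution and frame count.

```python
import torch
from diffusers import AutoencoderKLWan, WanPipeline
from diffusers.utils import export_to_video

model_id = "Wan-AI/Wan2.1-T2V-1.3B-Diffusers"  # assumed Hub checkpoint name

vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
pipe = WanPipeline.from_pretrained(model_id, vae=vae, torch_dtype=torch.bfloat16)

# Keep only the active submodule on the GPU; the rest stays in system RAM.
# This trades generation speed for a much smaller peak VRAM footprint.
pipe.enable_model_cpu_offload()

# Decode the video in tiles rather than all frames at once (if the installed
# Diffusers version supports tiling for this VAE).
pipe.vae.enable_tiling()

frames = pipe(
    prompt="A paper boat drifting down a rain-soaked street, shallow depth of field",
    height=480,
    width=832,
    num_frames=81,
).frames[0]

export_to_video(frames, "t2v_offload.mp4", fps=16)
```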
Related Repositories
Discover similar tools and frameworks used by developers
xformers
Memory-efficient PyTorch components for transformer architectures.
OpenHands
LLM agent framework automating development in sandboxed containers.
stable-diffusion-webui
Feature-rich web UI for Stable Diffusion that enables AI image generation, editing, and enhancement through an intuitive browser interface.
NAFNet
Efficient PyTorch architecture for image restoration tasks.
Wan2.2
Open-source diffusion framework for multi-modal video generation.