WanGP supports multiple video generation models, each optimized for different use cases and hardware configurations.
Most models can be combined with Lora accelerators (see the Lora guide) to speed up video generation by 2x to 3x with little quality loss.
Please note that the term Text2Video refers to the underlying Wan architecture. Because that architecture has improved greatly over time, many derived Text2Video models can now also generate videos from images.
- Size: 1.3 billion parameters
- VRAM: 6GB minimum
- Speed: Fast generation
- Quality: Good quality for the size
- Best for: Quick iterations, lower-end hardware
- Command:
python wgp.py --t2v-1-3B
- Size: 14 billion parameters
- VRAM: 12GB+ recommended
- Speed: Slower but higher quality
- Quality: Excellent detail and coherence
- Best for: Final production videos
- Command:
python wgp.py --t2v-14B
- Type: ControlNet for advanced video control
- VRAM: 6GB minimum
- Features: Motion transfer, object injection, inpainting
- Best for: Advanced video manipulation
- Command:
python wgp.py --vace-1.3B
- Type: Large ControlNet model
- VRAM: 12GB+ recommended
- Features: All Vace features with higher quality
- Best for: Professional video editing workflows
- Resolution: Claims 1080p capability
- VRAM: 20GB+ required
- Speed: Very slow generation
- Features: Intended to generate cinema-like video, specialized for a 2.1:1 aspect ratio
- Status: Experimental, feedback welcome
- Size: 14 billion parameters
- VRAM: 12GB+ recommended
- Speed: Slower but higher quality
- Quality: Excellent detail and coherence
- Best for: Lora compatibility (most available Loras work with this model)
- Command:
python wgp.py --i2v-14B
- Type: Start/end frame specialist
- Resolution: Optimized for 720p
- Official: Wan team supported
- Use case: Image-to-video with specific endpoints
- Type: Multi-person talking head animation
- Input: Voice track + image
- Works on: People
- Use case: Lip-sync and voice-driven animation for up to two people
- Type: Talking head animation
- Input: Voice track + image
- Works on: People and objects
- Use case: Lip-sync and voice-driven animation
- Type: Person/object transfer
- Resolution: Works well at 720p
- Requirements: 30+ steps for good results
- Best for: Transferring subjects between videos
- Type: Viewpoint change
- Requirements: 81+ frame input videos, 15+ denoising steps
- Use case: View same scene from different angles
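The two requirements above can be checked before starting a long generation. This is a minimal sketch, not part of WanGP: the function name `check_viewpoint_inputs` and its parameters are hypothetical, and only the thresholds (81 frames, 15 steps) come from the list above.

```python
# Thresholds taken from the requirements listed above.
MIN_FRAMES = 81  # "81+ frame input videos"
MIN_STEPS = 15   # "15+ denoising steps"


def check_viewpoint_inputs(num_frames: int, num_steps: int) -> list[str]:
    """Hypothetical helper: return a list of problems (empty if inputs look fine)."""
    problems = []
    if num_frames < MIN_FRAMES:
        problems.append(
            f"input video has {num_frames} frames, need at least {MIN_FRAMES}"
        )
    if num_steps < MIN_STEPS:
        problems.append(
            f"{num_steps} denoising steps requested, need at least {MIN_STEPS}"
        )
    return problems


if __name__ == "__main__":
    for issue in check_viewpoint_inputs(num_frames=60, num_steps=10):
        print("Warning:", issue)
```

An empty list means both requirements are met; the check says nothing about output quality.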
- Type: Diffusion Forcing model
- Specialty: "Infinite length" videos
- Features: High quality continuous generation
- Size: 1.3 billion parameters
- VRAM: 6GB minimum
- Quality: Good for the size, accessible on lower-end hardware
- Best for: Entry-level image animation
- Command:
python wgp.py --i2v-1-3B
- Size: 14 billion parameters
- VRAM: 12GB+ recommended
- Quality: Better end image support
- Limitation: Existing Loras don't work as well
- Quality: Among the best open source t2v models
- VRAM: 12GB+ recommended
- Speed: Slower generation but excellent results
- Features: Superior text adherence and video quality, up to 10s of video
- Best for: High-quality text-to-video generation
- Specialty: Identity preservation
- Use case: Injecting specific people into videos
- Quality: Excellent for character consistency
- Best for: Character-focused video generation
- Specialty: Generates up to 15s of high-quality speech- or song-driven video
- Use case: Injecting specific people into videos
- Quality: Excellent for character consistency
- Best for: Character-focused video generation, Video synchronized with voice
- Specialty: Long video generation
- Resolution: Fast 720p generation
- VRAM: Optimized by WanGP (4x reduction in requirements)
- Best for: Longer duration videos
- Speed: Generate in less than one minute
- Quality: Very high quality despite speed
- Best for: Rapid prototyping and quick results
- Wan 2.1 T2V 1.3B
- Wan Fun InP 1.3B
- Wan Vace 1.3B
- Wan 2.1 T2V 14B
- Wan Fun InP 14B
- Hunyuan Video (with optimizations)
- LTX Video 13B
- All models supported
- Longer videos possible
- Higher resolutions
- Multiple simultaneous Loras
- MoviiGen (experimental 1080p)
- Very long videos
- Maximum quality settings
- LTX Video 13B Distilled - Fastest, high quality
- Wan 2.1 T2V 1.3B - Fast, good quality
- CausVid Lora - 4-12 steps, very fast
- Hunyuan Video - Overall best t2v quality
- Wan 2.1 T2V 14B - Excellent Wan quality
- Wan Vace 14B - Best for controlled generation
- Wan Vace 14B/1.3B - Motion transfer, object injection
- Phantom - Person/object transfer
- FantasySpeaking - Voice-driven animation
- LTX Video 13B - Specialized for length
- Sky Reels v2 - Infinite length videos
- Wan Vace + Sliding Windows - Up to 1 minute
- Wan Fun InP 1.3B - Image-to-video
- Wan 2.1 T2V 1.3B - Text-to-video
- Wan Vace 1.3B - Advanced control
- CausVid Lora (4-12 steps) - Fastest
- LTX Video Distilled - Very fast
- Wan 1.3B models - Fast
- Wan 14B models - Medium
- Hunyuan Video - Slower
- MoviiGen - Slowest
- Hunyuan Video - Highest overall
- Wan 14B models - Excellent
- LTX Video models - Very good
- Wan 1.3B models - Good
- CausVid - Good (varies with steps)
- Wan 1.3B models - Most efficient
- LTX Video (with WanGP optimizations)
- Wan 14B models
- Hunyuan Video
- MoviiGen - Least efficient
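The speed and quality rankings above can be combined to answer "what is the fastest option that still meets a given quality level?". The sketch below is illustrative only: the function `fastest_with_quality` is hypothetical, MoviiGen is left out because it has no quality rating above, and mapping "LTX Video Distilled" to the "Very good" rating given for LTX Video models is an assumption.

```python
# Fastest first, as ranked above (MoviiGen omitted: no quality rating listed).
SPEED_ORDER = [
    "CausVid Lora",
    "LTX Video Distilled",
    "Wan 1.3B models",
    "Wan 14B models",
    "Hunyuan Video",
]

# Quality labels from the ranking above.
# Assumption: the distilled LTX model shares the "Very good" LTX rating.
QUALITY = {
    "CausVid Lora": "Good",  # listed as "Good (varies with steps)"
    "LTX Video Distilled": "Very good",
    "Wan 1.3B models": "Good",
    "Wan 14B models": "Excellent",
    "Hunyuan Video": "Highest overall",
}

# Lowest to highest quality label.
QUALITY_SCALE = ["Good", "Very good", "Excellent", "Highest overall"]


def fastest_with_quality(minimum: str) -> str:
    """Return the fastest entry whose quality label is at least `minimum`."""
    floor = QUALITY_SCALE.index(minimum)  # raises ValueError on unknown labels
    for model in SPEED_ORDER:
        if QUALITY_SCALE.index(QUALITY[model]) >= floor:
            return model
    raise ValueError(f"no model reaches quality level {minimum!r}")


if __name__ == "__main__":
    for level in QUALITY_SCALE:
        print(f"{level}: {fastest_with_quality(level)}")
```

The rankings are coarse, so treat the result as a starting point rather than a benchmark.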
WanGP allows switching between models without restarting:
- Use the dropdown menu in the web interface
- Models are loaded on-demand
- Previous model is unloaded to save VRAM
- Settings are preserved when possible
Start with Wan 2.1 T2V 1.3B to learn the interface and test your hardware.
Use Hunyuan Video or Wan 14B models for final output quality.
Use CausVid Lora or LTX Distilled for rapid iteration and testing.
- VACE for advanced control
- FantasySpeaking for talking heads
- LTX Video for long sequences
Always start with the largest model your VRAM can handle, then optimize settings for speed vs quality based on your needs.
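As a rough illustration of picking the largest model that fits, the sketch below maps available VRAM to the launch commands listed in this guide. The helper `launch_commands_for` is hypothetical and not part of WanGP. The flags and VRAM figures are copied from the model entries above, and treating "6GB minimum" and "12GB+ recommended" as hard thresholds is a simplifying assumption.

```python
# Launch flags and VRAM figures (in GB) copied from the model entries in this guide.
# Assumption: "minimum" and "recommended" figures are both treated as thresholds.
STATED_VRAM_GB = {
    "--t2v-1-3B": 6,
    "--vace-1.3B": 6,
    "--i2v-1-3B": 6,
    "--t2v-14B": 12,
    "--i2v-14B": 12,
}


def launch_commands_for(vram_gb: float) -> list[str]:
    """Return launch commands that fit in `vram_gb`, largest models first."""
    fitting = [
        (need, flag) for flag, need in STATED_VRAM_GB.items() if need <= vram_gb
    ]
    # Stable sort: larger requirement first, original order kept within a tier.
    fitting.sort(key=lambda pair: -pair[0])
    return [f"python wgp.py {flag}" for _, flag in fitting]


if __name__ == "__main__":
    for command in launch_commands_for(12):
        print(command)
```

With 12 GB this lists the 14B commands first and the 1.3B commands after them. With 8 GB only the 1.3B commands remain.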