Mochi 1: Open-source video generation model
A 10B-parameter diffusion model for text-to-video generation, built on an Asymmetric Diffusion Transformer.
Mochi 1 is an open-source diffusion model that generates videos from text descriptions. It is built on a novel Asymmetric Diffusion Transformer (AsymmDiT) architecture with 10 billion parameters, which processes text and visual tokens in separate streams of different capacity. A companion AsymmVAE compresses videos to a 128x smaller size, with 8x8 spatial and 6x temporal compression. The model generates 480p videos and supports LoRA fine-tuning on custom datasets.
AsymmDiT Architecture
Text and visual tokens are processed in asymmetric streams, with the visual stream holding 4x as many parameters as the text stream. This design reduces memory requirements while concentrating computational capacity on visual reasoning.
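As a rough illustration of how a 4x parameter gap between two streams can arise, the sketch below counts the weights of a square, bias-free linear projection at two hidden widths. The widths (3072 and 1536) and the helper name `linear_weight_count` are assumptions chosen for illustration, not values stated above. The point is only that doubling the hidden width quadruples the weight count of each square projection.

```python
def linear_weight_count(d_in: int, d_out: int) -> int:
    """Number of weights in a bias-free linear layer mapping d_in -> d_out."""
    return d_in * d_out


# Hypothetical hidden widths, chosen for illustration only.
VISUAL_DIM = 3072
TEXT_DIM = 1536

visual_weights = linear_weight_count(VISUAL_DIM, VISUAL_DIM)
text_weights = linear_weight_count(TEXT_DIM, TEXT_DIM)
ratio = visual_weights / text_weights

print(f"visual/text weight ratio per square projection: {ratio:.1f}")
```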
Efficient Video Compression
The AsymmVAE compresses videos to a 128x smaller size using an asymmetric encoder-decoder structure. It applies 8x8 spatial and 6x temporal compression, producing a 12-channel latent space.
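The compression factors above fix the shape of the latent tensor for a given clip. The following is a minimal sketch of that arithmetic, assuming the temporal axis is compressed causally (the first frame gets its own latent frame and each further group of 6 frames adds one more). That causal convention and the function name `latent_shape` are assumptions, not details confirmed by the text.

```python
SPATIAL_FACTOR = 8    # 8x8 spatial compression
TEMPORAL_FACTOR = 6   # 6x temporal compression
LATENT_CHANNELS = 12  # 12-channel latent space


def latent_shape(height: int, width: int, num_frames: int) -> tuple:
    """Return (channels, frames, height, width) of the latent for one clip.

    Assumes causal temporal compression: 1 + (num_frames - 1) / 6 latent frames.
    """
    if height % SPATIAL_FACTOR or width % SPATIAL_FACTOR:
        raise ValueError("height and width must be multiples of 8")
    if (num_frames - 1) % TEMPORAL_FACTOR:
        raise ValueError("num_frames - 1 must be a multiple of 6")
    latent_frames = (num_frames - 1) // TEMPORAL_FACTOR + 1
    return (
        LATENT_CHANNELS,
        latent_frames,
        height // SPATIAL_FACTOR,
        width // SPATIAL_FACTOR,
    )


print(latent_shape(480, 848, 31))  # (12, 6, 60, 106)
```

Under these assumptions, a 31-frame 480x848 clip maps to 6 latent frames of 60x106 with 12 channels each.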
LoRA Fine-tuning Support
A built-in trainer creates LoRA fine-tunes from custom video datasets. Fine-tuning runs on a single H100 or A100 80GB GPU, and the output is saved in safetensors format.
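LoRA in general trains two small low-rank matrices, A (rank x d_in) and B (d_out x rank), in place of a full d_out x d_in weight matrix. The sketch below compares the two trainable parameter counts for a single layer. It illustrates the general technique and is not the built-in trainer's code; the layer size and rank are hypothetical values.

```python
def full_param_count(d_in: int, d_out: int) -> int:
    """Trainable weights when the whole d_out x d_in matrix is updated."""
    return d_in * d_out


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable weights for a LoRA update: A is rank x d_in, B is d_out x rank."""
    return rank * (d_in + d_out)


# Hypothetical layer size and rank, for illustration only.
D_IN = 3072
D_OUT = 3072
RANK = 16

full = full_param_count(D_IN, D_OUT)
lora = lora_param_count(D_IN, D_OUT, RANK)

print(f"full: {full:,}  lora: {lora:,}  fraction: {lora / full:.4f}")
```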
from genmo.mochi_preview.pipelines import (
    DecoderModelFactory,
    DitModelFactory,
    MochiSingleGPUPipeline,
    T5ModelFactory,
    linear_quadratic_schedule,
)

# Build a single-GPU pipeline from the text encoder, DiT, and decoder weights.
pipeline = MochiSingleGPUPipeline(
    text_encoder_factory=T5ModelFactory(),
    dit_factory=DitModelFactory(
        model_path="weights/dit.safetensors", model_dtype="bf16"
    ),
    decoder_factory=DecoderModelFactory(
        model_path="weights/decoder.safetensors",
    ),
    cpu_offload=True,
    decode_type="tiled_spatial",
)

# Generate a 31-frame, 480x848 video using 64 inference steps.
video = pipeline(
    height=480,
    width=848,
    num_frames=31,
    num_inference_steps=64,
    sigma_schedule=linear_quadratic_schedule(64, 0.025),
    cfg_schedule=[6.0] * 64,  # one guidance value per inference step
    batch_cfg=False,
    prompt="your favorite prompt here ...",
    negative_prompt="",
    seed=12345,
)