Mochi 1: Open-source video generation model
10B-parameter diffusion model for text-to-video generation, built on an Asymmetric Diffusion Transformer.
Learn more about Mochi 1
Mochi 1 is an open-source video generation model that creates videos from text descriptions using diffusion. It is built on a novel Asymmetric Diffusion Transformer (AsymmDiT) architecture with 10 billion parameters, which processes text and visual tokens through separate streams of different capacities. It also includes an AsymmVAE component that compresses videos by a factor of 128, with 8x8 spatial and 6x temporal compression. The model generates 480p videos and supports LoRA fine-tuning for customization on specific datasets.
AsymmDiT Architecture
Uses asymmetric processing streams in which the visual pathway has 4x more parameters than the text pathway. This design reduces memory requirements while focusing computational capacity on visual reasoning.
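One way to see where a roughly 4x gap could come from: the weight count of a dense layer grows with the square of its width, so a stream twice as wide carries about four times the weights per layer. The sketch below counts weights for a generic two-layer transformer MLP block. The widths 3072 and 1536 and the expansion factor of 4 are illustrative assumptions, not values stated on this page, and the function is not part of the Mochi codebase.

```python
def mlp_block_weights(hidden_dim: int, expansion: int = 4) -> int:
    """Weights in a generic two-layer MLP block, ignoring biases."""
    inner = hidden_dim * expansion
    # Up-projection (hidden -> inner) plus down-projection (inner -> hidden).
    return hidden_dim * inner + inner * hidden_dim


# Hypothetical widths: the visual stream is assumed to be twice as wide.
visual = mlp_block_weights(3072)
text = mlp_block_weights(1536)
ratio = visual / text
print(ratio)  # 4.0
```

Doubling the width quadruples the per-layer weights, which matches the stated 4x ratio under these assumed widths.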
Efficient Video Compression
Includes an AsymmVAE that compresses videos by a factor of 128 using an asymmetric encoder-decoder structure. Achieves 8x8 spatial and 6x temporal compression into a 12-channel latent space.
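To make the compression factors concrete, the sketch below works out the latent tensor shape for a given clip. It assumes that height and width are divided by 8, that the first frame is encoded on its own so that the latent frame count is (num_frames - 1) // 6 + 1, and that the latent has 12 channels. The temporal rounding rule and the function name are assumptions for illustration, not a documented API. The headline 128x figure depends on how raw and latent sizes are counted, which this page does not spell out, so the sketch reports shapes only.

```python
def latent_shape(num_frames: int, height: int, width: int,
                 spatial: int = 8, temporal: int = 6, channels: int = 12):
    """Return the assumed (channels, frames, height, width) of the latent."""
    if height % spatial or width % spatial:
        raise ValueError("height and width must be multiples of the spatial factor")
    if num_frames < 1:
        raise ValueError("num_frames must be at least 1")
    # Assumption: the first frame gets its own latent frame, and every
    # following group of `temporal` frames shares one latent frame.
    latent_frames = (num_frames - 1) // temporal + 1
    return (channels, latent_frames, height // spatial, width // spatial)


shape = latent_shape(num_frames=31, height=480, width=848)
print(shape)  # (12, 6, 60, 106)
```

Under these assumptions, a 31-frame 480x848 clip maps to 6 latent frames of 60x106 positions, each position holding 12 channels.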
LoRA Fine-tuning Support
Provides a built-in trainer for creating LoRA fine-tunes on custom video datasets. Fine-tuning can run on a single H100 or A100 80GB GPU, and the output is saved in safetensors format.
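For context on why a LoRA fine-tune fits on a single GPU: LoRA freezes a weight matrix W of shape (d_out, d_in) and trains only two small factors, B of shape (d_out, r) and A of shape (r, d_in), whose product is added to W. The sketch below compares trainable weight counts. The layer width of 3072 and the rank of 16 are hypothetical values chosen for illustration, and nothing here reflects the built-in trainer's actual configuration.

```python
def full_weights(d_in: int, d_out: int) -> int:
    """Weights in a dense layer, ignoring bias."""
    return d_in * d_out


def lora_trainable_weights(d_in: int, d_out: int, rank: int) -> int:
    """Weights in the two LoRA factors: A (rank x d_in) and B (d_out x rank)."""
    return rank * d_in + d_out * rank


# Hypothetical layer width and rank, for illustration only.
d, r = 3072, 16
fraction = lora_trainable_weights(d, d, r) / full_weights(d, d)
print(f"{fraction:.2%}")  # 1.04%
```

With these example numbers, the adapter trains about 1% of the weights of the layer it modifies.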
from genmo.mochi_preview.pipelines import (
    DecoderModelFactory,
    DitModelFactory,
    MochiSingleGPUPipeline,
    T5ModelFactory,
    linear_quadratic_schedule,
)

# Assemble the pipeline from the text encoder, the DiT, and the VAE decoder.
pipeline = MochiSingleGPUPipeline(
    text_encoder_factory=T5ModelFactory(),
    dit_factory=DitModelFactory(
        model_path="weights/dit.safetensors", model_dtype="bf16"
    ),
    decoder_factory=DecoderModelFactory(
        model_path="weights/decoder.safetensors",
    ),
    cpu_offload=True,
    decode_type="tiled_spatial",
)

# Generate a 31-frame, 848x480 clip with 64 sampling steps.
video = pipeline(
    height=480,
    width=848,
    num_frames=31,
    num_inference_steps=64,
    sigma_schedule=linear_quadratic_schedule(64, 0.025),
    cfg_schedule=[6.0] * 64,  # one guidance scale per sampling step
    batch_cfg=False,
    prompt="your favorite prompt here ...",
    negative_prompt="",
    seed=12345,
)
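In the call above, the step count 64 appears three times: as num_inference_steps, as the first argument to linear_quadratic_schedule, and as the length of cfg_schedule. A small helper can keep those values in sync. The helper name sampling_kwargs is hypothetical and not part of the genmo package; it builds only the arguments that can be derived without the library installed.

```python
def sampling_kwargs(steps: int, cfg_scale: float) -> dict:
    """Build step-dependent pipeline arguments from a single step count."""
    if steps <= 0:
        raise ValueError("steps must be positive")
    return {
        "num_inference_steps": steps,
        # Constant guidance: one scale value per sampling step.
        "cfg_schedule": [cfg_scale] * steps,
    }


kwargs = sampling_kwargs(steps=64, cfg_scale=6.0)
print(len(kwargs["cfg_schedule"]))  # 64
```

The result could then be unpacked into the pipeline call alongside sigma_schedule=linear_quadratic_schedule(64, 0.025) and the remaining arguments, so that changing the step count needs an edit in only one or two places.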
Related Repositories
Discover similar tools and frameworks used by developers
Open WebUI
Extensible multi-LLM chat platform with RAG pipeline.
DeepFace
Python library wrapping multiple face recognition deep learning models.
Open Notebook
Open source implementation of Google's NotebookLM that runs locally with document processing and podcast generation.
Chat SDK
Open-source Next.js template for AI chatbots with auth, persistence, and multi-provider support.
LangChain
Modular framework for chaining LLMs with external data.