
Mochi 1: Open-source video generation model

10B parameter diffusion model for text-to-video generation using Asymmetric Diffusion Transformer.

Live rankings (10:20 AM, steady): #278 overall · #85 in AI & ML
Stars: 3.6K (+13 over 7 days) · Forks: 473 (+5 over 7 days)

Learn more about Mochi 1

Mochi 1 is an open-source video generation model that creates videos from text descriptions using diffusion-based machine learning. The system uses a novel Asymmetric Diffusion Transformer (AsymmDiT) architecture with 10 billion parameters, processing text and visual tokens in separate streams with different capacities. It includes an AsymmVAE component that compresses videos by 128x, with 8x8 spatial and 6x temporal compression. The model generates 480p videos and supports LoRA fine-tuning for customization on specific datasets.

Key features of Mochi 1

1. AsymmDiT Architecture

Uses asymmetric processing streams in which the visual pathway has roughly 4x more parameters than the text pathway. This design reduces memory requirements while concentrating computational capacity on visual reasoning.
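As a rough illustration of how a 2x-wider stream yields a 4x parameter split (the hidden sizes below are hypothetical, not Mochi's actual configuration), since MLP parameter counts grow with the square of the hidden dimension:

```python
# Hypothetical sketch of asymmetric stream sizing (dims are illustrative,
# not Mochi's real configuration).
def mlp_params(hidden: int, expansion: int = 4) -> int:
    """Parameter count of a 2-layer MLP: hidden -> hidden*expansion -> hidden."""
    return hidden * (hidden * expansion) * 2  # two weight matrices, biases ignored

visual_hidden = 3072  # wide visual stream (assumed)
text_hidden = 1536    # narrower text stream (assumed)

ratio = mlp_params(visual_hidden) / mlp_params(text_hidden)
print(f"visual/text parameter ratio: {ratio:.0f}x")  # doubling width gives 4x params
```

Because parameters scale quadratically with width, halving the text stream's hidden size cuts its parameter budget to a quarter of the visual stream's.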

2. Efficient Video Compression

Includes AsymmVAE, which compresses videos by 128x using an asymmetric encoder-decoder structure: 8x8 spatial and 6x temporal compression into a 12-channel latent space.
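These compression factors determine the shape of the latent tensor the diffusion model operates on. A minimal sketch, assuming simple floor division (the real causal VAE handles frame boundaries differently):

```python
def latent_shape(frames: int, height: int, width: int,
                 t_comp: int = 6, s_comp: int = 8, latent_ch: int = 12):
    """Approximate latent tensor shape for 8x8 spatial / 6x temporal
    compression into 12 latent channels (floor division is a simplification)."""
    return (latent_ch, frames // t_comp, height // s_comp, width // s_comp)

# A 31-frame 480x848 clip (the demo settings below):
print(latent_shape(31, 480, 848))  # -> (12, 5, 60, 106)
```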

3. LoRA Fine-tuning Support

Provides a built-in trainer for creating LoRA fine-tunes on custom video datasets. Fine-tuning runs on a single H100 or A100 80GB GPU and outputs weights in safetensors format.
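The LoRA idea itself is compact: keep the pretrained weight W frozen and train a low-rank pair (A, B), applying W + (alpha/r)·B·A at inference. A generic NumPy sketch of the technique (not Mochi's trainer API):

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in, r, alpha = 64, 64, 8, 16  # rank r << d; alpha scales the update

W = rng.normal(size=(d_out, d_in))      # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, r))                # trainable up-projection, zero-initialized

def lora_forward(x: np.ndarray) -> np.ndarray:
    """Apply the LoRA-adapted weight: W x + (alpha / r) * B (A x)."""
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=(d_in,))
# With B zero-initialized, the adapter starts as an exact no-op:
assert np.allclose(lora_forward(x), W @ x)
# The trainable adapter is far smaller than the frozen weight:
print(A.size + B.size, "trainable vs", W.size, "frozen parameters")
```

Zero-initializing B is what makes training stable: the fine-tune starts from the base model's behavior and only gradually deviates.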


# Build a single-GPU inference pipeline from the pretrained weights.
from genmo.mochi_preview.pipelines import (
    DecoderModelFactory,
    DitModelFactory,
    MochiSingleGPUPipeline,
    T5ModelFactory,
    linear_quadratic_schedule,
)

pipeline = MochiSingleGPUPipeline(
    text_encoder_factory=T5ModelFactory(),
    dit_factory=DitModelFactory(
        model_path="weights/dit.safetensors", model_dtype="bf16"
    ),
    decoder_factory=DecoderModelFactory(
        model_path="weights/decoder.safetensors",
    ),
    cpu_offload=True,             # offload idle components to CPU to fit on one GPU
    decode_type="tiled_spatial",  # decode the latent in spatial tiles to save memory
)

# Generate a 480p, 31-frame video with 64 denoising steps.
video = pipeline(
    height=480,
    width=848,
    num_frames=31,
    num_inference_steps=64,
    sigma_schedule=linear_quadratic_schedule(64, 0.025),
    cfg_schedule=[6.0] * 64,  # classifier-free guidance scale per step
    batch_cfg=False,
    prompt="your favorite prompt here ...",
    negative_prompt="",
    seed=12345,
)
