Stable Diffusion: Latent text-to-image diffusion models
Text-to-image diffusion in compressed latent space.
Learn more about stablediffusion
Stable Diffusion is a latent diffusion model that generates images from text prompts by running the diffusion process in a compressed latent space rather than in pixel space. A U-Net denoiser is conditioned on text embeddings from a CLIP or OpenCLIP encoder, so the prompt guides each denoising step. Several model variants are available, including base models trained at 512x512 and 768x768 resolution and fine-tuned versions for inpainting, upscaling, and depth-conditional synthesis. The models support both inference and fine-tuning, and optional libraries such as xformers provide memory-efficient attention on GPUs.
Latent space operation
The model performs diffusion on a compressed latent representation instead of full-resolution pixels, so each denoising step processes far fewer values. This reduces compute and memory requirements while maintaining image quality, and it is what makes generation practical on consumer hardware.
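To make the savings concrete, the sketch below compares the number of values in an RGB image with the number in its latent. It assumes the commonly cited Stable Diffusion autoencoder settings (spatial downsampling factor 8, 4 latent channels), which are not stated on this page; the function names are illustrative only.

```python
def latent_shape(height, width, downsample=8, latent_channels=4):
    """Return (channels, height, width) of the latent for an image.

    The defaults (factor 8, 4 channels) are assumptions about the
    autoencoder, not values taken from this page.
    """
    if height % downsample or width % downsample:
        raise ValueError("height and width must be divisible by the downsampling factor")
    return (latent_channels, height // downsample, width // downsample)


def compression_ratio(height, width, downsample=8, latent_channels=4):
    """Ratio of RGB pixel values to latent values for one image."""
    pixel_values = 3 * height * width
    c, h, w = latent_shape(height, width, downsample, latent_channels)
    return pixel_values / (c * h * w)


for size in (512, 768):
    print(size, latent_shape(size, size), compression_ratio(size, size))
```

Under these assumptions a 512x512 image maps to a 4x64x64 latent, so the denoiser sees 48 times fewer values per step than it would in pixel space.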
Multiple specialized variants
Beyond base text-to-image models, the repository includes fine-tuned variants for specific tasks such as image inpainting, x4 upscaling, depth-guided synthesis, and image variation through unCLIP conditioning on CLIP embeddings.
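As a rough illustration of how these variants might be selected from the diffusers library, the sketch below maps each task to a pipeline class and a Hugging Face Hub model ID. The model IDs and the `load_variant` helper are assumptions for illustration; checkpoints can be renamed or moved, so verify them before use.

```python
import importlib

# Task -> (diffusers pipeline class name, Hub model ID).
# The model IDs are assumptions; confirm they exist before relying on them.
VARIANTS = {
    "inpainting": ("StableDiffusionInpaintPipeline",
                   "stabilityai/stable-diffusion-2-inpainting"),
    "upscaling": ("StableDiffusionUpscalePipeline",
                  "stabilityai/stable-diffusion-x4-upscaler"),
    "depth": ("StableDiffusionDepth2ImgPipeline",
              "stabilityai/stable-diffusion-2-depth"),
    "variation": ("StableUnCLIPImg2ImgPipeline",
                  "stabilityai/stable-diffusion-2-1-unclip"),
}


def load_variant(task, **kwargs):
    """Hypothetical helper: load the pipeline for a task.

    diffusers is imported lazily so the mapping can be inspected
    without the library installed or any weights downloaded.
    """
    if task not in VARIANTS:
        raise KeyError(f"unknown task {task!r}; expected one of {sorted(VARIANTS)}")
    class_name, model_id = VARIANTS[task]
    diffusers = importlib.import_module("diffusers")
    pipeline_cls = getattr(diffusers, class_name)
    return pipeline_cls.from_pretrained(model_id, **kwargs)
```

Calling `load_variant("inpainting", torch_dtype=torch.float16)` would download the weights on first use. Each pipeline takes different inputs (for example, an image and a mask for inpainting), so the call that follows depends on the task.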
Flexible text encoder options
The text encoder differs between model versions; version 2.0 and later use OpenCLIP-ViT/H. The models also support two training objectives: standard noise prediction, where the network predicts the noise added to the latent, and v-prediction, where it predicts a combination of the noise and the clean latent.
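The relationship between the two objectives can be sketched numerically. The example assumes the usual variance-preserving convention, x_t = alpha * x0 + sigma * eps with alpha^2 + sigma^2 = 1, and the standard v-prediction target v = alpha * eps - sigma * x0. The helper names and sample values are illustrative and not taken from the repository.

```python
import math


def add_noise(x0, eps, alpha, sigma):
    """Forward process: x_t = alpha * x0 + sigma * eps."""
    return [alpha * a + sigma * e for a, e in zip(x0, eps)]


def v_target(x0, eps, alpha, sigma):
    """v-prediction target: v = alpha * eps - sigma * x0."""
    return [alpha * e - sigma * a for a, e in zip(x0, eps)]


def eps_from_v(x_t, v, alpha, sigma):
    """Recover the noise from v (requires alpha^2 + sigma^2 = 1)."""
    return [sigma * x + alpha * vi for x, vi in zip(x_t, v)]


def x0_from_v(x_t, v, alpha, sigma):
    """Recover the clean sample from v (requires alpha^2 + sigma^2 = 1)."""
    return [alpha * x - sigma * vi for x, vi in zip(x_t, v)]


# Toy values: any angle gives a valid (alpha, sigma) pair on the unit circle.
alpha, sigma = math.cos(0.3), math.sin(0.3)
x0 = [0.5, -1.0, 0.25]    # stand-in for a clean latent
eps = [0.1, 0.7, -0.4]    # stand-in for Gaussian noise

x_t = add_noise(x0, eps, alpha, sigma)
v = v_target(x0, eps, alpha, sigma)

# Both quantities are recoverable from (x_t, v), so the two objectives
# carry the same information and differ only in what the network outputs.
recovered_eps = eps_from_v(x_t, v, alpha, sigma)
recovered_x0 = x0_from_v(x_t, v, alpha, sigma)
```

Because the noise and the clean latent can each be recovered from the other parameterization, a sampler only needs to know which one a given checkpoint was trained with.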
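from diffusers import StableDiffusionPipeline
import torch

# Load the pretrained pipeline in half precision to reduce GPU memory use
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16
)
# Move the model to the GPU (requires a CUDA-capable device)
pipe = pipe.to("cuda")

# Generate one image from a text prompt and save it to disk
image = pipe("a serene mountain landscape at sunset").images[0]
image.save("output.png")
Top in AI & ML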
Related Repositories
Discover similar tools and frameworks used by developers
llama
PyTorch inference for Meta's Llama language models.
crewAI
Python framework for autonomous multi-agent AI collaboration.
open_clip
PyTorch library for contrastive language-image pretraining.
mlx
Lazy-evaluated NumPy-like arrays optimized for Apple silicon.
DeepSeek Coder
Code language models (1B-33B parameters) supporting completion and infilling across 80+ languages.