Stable Diffusion: Latent text-to-image diffusion models
Text-to-image diffusion in compressed latent space.
Stable Diffusion is a latent diffusion model that generates images from text prompts by running the diffusion process in a compressed latent space rather than in pixel space. The denoiser is a U-Net conditioned on text embeddings from a CLIP or OpenCLIP encoder. Multiple model variants are available, including versions trained at 512x512 and 768x768 resolutions and fine-tuned versions for inpainting, upscaling, and depth-conditional synthesis. The models support both inference and fine-tuning workflows, and optional libraries such as xformers provide memory-efficient attention on GPUs.
Latent space operation
The model performs diffusion in a compressed latent representation rather than pixel space, reducing computational requirements while maintaining image quality. This architectural choice enables efficient generation on consumer hardware.
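To make the saving concrete, the sketch below compares the number of values in pixel space with the number in latent space. It assumes the autoencoder downsamples each side by a factor of 8 and uses 4 latent channels, which matches the released v1 and v2 checkpoints but is not stated in the text above; the function names are illustrative only.

```python
# Assumed constants (not given in the text above): the autoencoder
# downsamples each spatial side by 8 and produces 4 latent channels.
DOWNSAMPLE_FACTOR = 8
LATENT_CHANNELS = 4


def latent_shape(height, width):
    """Return (channels, height, width) of the latent for an RGB image."""
    if height % DOWNSAMPLE_FACTOR or width % DOWNSAMPLE_FACTOR:
        raise ValueError("image sides must be multiples of 8")
    return (
        LATENT_CHANNELS,
        height // DOWNSAMPLE_FACTOR,
        width // DOWNSAMPLE_FACTOR,
    )


def compression_ratio(height, width):
    """Ratio of RGB pixel values to latent values."""
    channels, lat_h, lat_w = latent_shape(height, width)
    return (3 * height * width) / (channels * lat_h * lat_w)


print(latent_shape(512, 512))       # (4, 64, 64)
print(compression_ratio(512, 512))  # 48.0
```

Under these assumptions the U-Net processes roughly 48 times fewer values per denoising step than a pixel-space model at the same output resolution.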
Multiple specialized variants
Beyond base text-to-image models, the repository includes fine-tuned variants for specific tasks such as image inpainting, x4 upscaling, depth-guided synthesis, and image variation through unCLIP conditioning on CLIP embeddings.
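As a rough guide, the variants above can be loaded through task-specific pipeline classes in the diffusers library. The pairing of class names and Hugging Face model IDs below is an assumption for illustration and should be checked against each model card; `load_variant` is a hypothetical helper, not part of any library.

```python
# Task -> (diffusers pipeline class name, Hugging Face model ID).
# These pairings are assumptions for illustration; verify them against
# the model cards before relying on them.
VARIANTS = {
    "inpainting": (
        "StableDiffusionInpaintPipeline",
        "stabilityai/stable-diffusion-2-inpainting",
    ),
    "upscaling": (
        "StableDiffusionUpscalePipeline",
        "stabilityai/stable-diffusion-x4-upscaler",
    ),
    "depth": (
        "StableDiffusionDepth2ImgPipeline",
        "stabilityai/stable-diffusion-2-depth",
    ),
    "variation": (
        "StableUnCLIPImg2ImgPipeline",
        "stabilityai/stable-diffusion-2-1-unclip",
    ),
}


def load_variant(task, **kwargs):
    """Hypothetical helper: load the pipeline registered for a task.

    Weights are downloaded on first use; extra keyword arguments are
    passed through to from_pretrained.
    """
    import diffusers  # imported lazily so the table can be read without it

    class_name, model_id = VARIANTS[task]
    pipeline_cls = getattr(diffusers, class_name)
    return pipeline_cls.from_pretrained(model_id, **kwargs)
```

For example, `load_variant("inpainting", torch_dtype=torch.float16)` would return an inpainting pipeline in half precision, assuming torch and diffusers are installed.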
Flexible text encoder options
Different model versions use different text encoders: version 1 models use a CLIP encoder, while version 2.0 and later use OpenCLIP-ViT/H. Checkpoints are trained with either the standard noise-prediction objective or the v-prediction objective, so the sampler must be configured to match the checkpoint.
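The difference between the two objectives can be sketched with scalars. The example assumes a variance-preserving schedule (alpha squared plus sigma squared equals 1) and the usual v-prediction definition, v = alpha * eps - sigma * x0; the schedule values and function names are illustrative, not taken from the repository.

```python
import math


def noisy_sample(x0, eps, alpha, sigma):
    """Forward process: mix the clean value x0 with noise eps."""
    return alpha * x0 + sigma * eps


def v_target(x0, eps, alpha, sigma):
    """Training target under v-prediction (noise-prediction targets eps)."""
    return alpha * eps - sigma * x0


def recover_from_v(x_t, v, alpha, sigma):
    """Recover x0 and eps from a v value, assuming alpha**2 + sigma**2 == 1."""
    x0 = alpha * x_t - sigma * v
    eps = sigma * x_t + alpha * v
    return x0, eps


# Illustrative schedule point satisfying alpha**2 + sigma**2 == 1.
alpha, sigma = math.cos(0.3), math.sin(0.3)
x0, eps = 0.8, -1.2

x_t = noisy_sample(x0, eps, alpha, sigma)
v = v_target(x0, eps, alpha, sigma)
x0_hat, eps_hat = recover_from_v(x_t, v, alpha, sigma)
```

Because x0 and eps can both be recovered from v, a sampler can work with either kind of checkpoint as long as it knows which quantity the network outputs.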
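from diffusers import StableDiffusionPipeline
import torch

# Load the v1.5 checkpoint in half precision to reduce GPU memory use.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
)
# Move the pipeline to the GPU (requires a CUDA-capable device).
pipe = pipe.to("cuda")

# Generate one image from the prompt and take the first result.
image = pipe("a serene mountain landscape at sunset").images[0]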
image.save("output.png")