Stable Diffusion: Latent text-to-image diffusion models
Text-to-image diffusion in compressed latent space.
Stable Diffusion is a latent diffusion model that generates images from text prompts by running the diffusion process in a compressed latent space rather than in pixel space. A U-Net denoiser is conditioned on text embeddings from a CLIP or OpenCLIP encoder, so the prompt guides each denoising step. Several model variants are available, including base models trained at 512x512 and 768x768 resolution and fine-tuned versions for inpainting, upscaling, and depth-conditional synthesis. The models support both inference and fine-tuning, and libraries such as xformers can optionally provide memory-efficient attention on the GPU.
Latent space operation
The model performs diffusion on a compressed latent representation produced by an autoencoder rather than on raw pixels, which reduces compute and memory requirements while maintaining image quality. This design makes generation practical on consumer hardware.
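To make the saving concrete, here is a rough arithmetic sketch. It assumes a downsampling factor of 8 and 4 latent channels, values commonly quoted for Stable Diffusion but not stated in the text above; the function names are hypothetical and exist only for this illustration.

```python
# Assumed values (not stated in the text above): the autoencoder
# downsamples each spatial dimension by 8 and emits 4 latent channels.
DOWNSAMPLE_FACTOR = 8
LATENT_CHANNELS = 4
RGB_CHANNELS = 3


def latent_shape(height, width):
    """Return the (channels, height, width) of the latent for an image."""
    return (
        LATENT_CHANNELS,
        height // DOWNSAMPLE_FACTOR,
        width // DOWNSAMPLE_FACTOR,
    )


def compression_ratio(height, width):
    """Ratio of pixel-space values to latent-space values."""
    pixel_values = RGB_CHANNELS * height * width
    c, h, w = latent_shape(height, width)
    return pixel_values / (c * h * w)


shape_512 = latent_shape(512, 512)
ratio_512 = compression_ratio(512, 512)
print(shape_512, ratio_512)
```

Under these assumptions the U-Net processes roughly 48 times fewer values per image than it would in pixel space, which is where most of the efficiency gain comes from.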
Multiple specialized variants
Beyond the base text-to-image models, the repository includes fine-tuned variants for specific tasks: image inpainting, 4x upscaling, depth-guided synthesis, and image variation via unCLIP, which conditions on CLIP image embeddings.
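As a rough orientation, the variants listed above can be matched to pipeline classes in the diffusers library. The lookup table below is only a sketch: the class names are those diffusers exposes as far as we know, the checkpoint identifiers are assumptions that should be verified on the Hugging Face Hub before use, and the helper `pipeline_for` is hypothetical.

```python
# Task -> (diffusers pipeline class name, assumed checkpoint id).
# Checkpoint ids are assumptions; confirm them on the Hub before use.
VARIANTS = {
    "inpainting": (
        "StableDiffusionInpaintPipeline",
        "stabilityai/stable-diffusion-2-inpainting",
    ),
    "upscaling": (
        "StableDiffusionUpscalePipeline",
        "stabilityai/stable-diffusion-x4-upscaler",
    ),
    "depth": (
        "StableDiffusionDepth2ImgPipeline",
        "stabilityai/stable-diffusion-2-depth",
    ),
    "variation": (
        "StableUnCLIPImg2ImgPipeline",
        "stabilityai/stable-diffusion-2-1-unclip",
    ),
}


def pipeline_for(task):
    """Return the pipeline class name for a task, or raise on unknown tasks."""
    try:
        return VARIANTS[task][0]
    except KeyError:
        raise ValueError(f"unknown task: {task!r}") from None


print(pipeline_for("inpainting"))
```

Keeping the names as strings avoids importing diffusers just to inspect the table; in real use one would import the class and call its `from_pretrained` method with the checkpoint id.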
Flexible text encoder options
The text encoder depends on the model version: version 2.0 and later use OpenCLIP-ViT/H, whereas version 1 models use CLIP ViT-L/14. Checkpoints are trained with either the standard noise-prediction objective or the v-prediction objective.
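The relationship between the two objectives can be sketched numerically. The example below assumes the usual variance-preserving parameterization, in which the noisy sample is x_t = alpha * x0 + sigma * eps with alpha^2 + sigma^2 = 1 and the v target is v = alpha * eps - sigma * x0. The helper names are hypothetical and this is not the repository's training code.

```python
import math


def noisy_sample(x0, eps, alpha, sigma):
    """Forward process: x_t = alpha * x0 + sigma * eps (elementwise)."""
    return [alpha * a + sigma * e for a, e in zip(x0, eps)]


def v_target(x0, eps, alpha, sigma):
    """v-prediction target: v = alpha * eps - sigma * x0."""
    return [alpha * e - sigma * a for a, e in zip(x0, eps)]


def recover_from_v(x_t, v, alpha, sigma):
    """Invert the v parameterization (requires alpha^2 + sigma^2 = 1)."""
    x0 = [alpha * xt - sigma * vi for xt, vi in zip(x_t, v)]
    eps = [sigma * xt + alpha * vi for xt, vi in zip(x_t, v)]
    return x0, eps


# Pick a point on the unit circle so that alpha^2 + sigma^2 = 1.
phi = 0.3 * math.pi / 2
alpha, sigma = math.cos(phi), math.sin(phi)

x0 = [0.5, -1.0, 0.25]   # toy "clean latent"
eps = [0.1, 0.7, -1.3]   # toy noise draw

x_t = noisy_sample(x0, eps, alpha, sigma)
v = v_target(x0, eps, alpha, sigma)
x0_rec, eps_rec = recover_from_v(x_t, v, alpha, sigma)
```

Because both the clean latent and the noise can be recovered from v and x_t, a model trained on either objective carries the same information. The choice mainly changes how the loss is weighted across noise levels.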
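# Requires the diffusers and torch packages and a CUDA-capable GPU.
from diffusers import StableDiffusionPipeline
import torch

# Load the v1.5 checkpoint in half precision to reduce GPU memory use.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

# Generate one image from the text prompt.
image = pipe("a serene mountain landscape at sunset").images[0]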
image.save("output.png")