Unsloth: Fine-tuning and reinforcement learning for LLMs
Memory-efficient Python library for accelerated LLM training.
Unsloth is a Python library that accelerates the fine-tuning and reinforcement learning processes for large language models while reducing memory consumption. It achieves performance improvements through optimized implementations of attention mechanisms, gradient computations, and memory management techniques that minimize overhead during training operations. The library provides pre-configured model loaders that automatically apply these optimizations to popular transformer architectures, enabling faster training speeds compared to standard implementations. It integrates with existing machine learning frameworks and supports common fine-tuning workflows including supervised learning and reinforcement learning from human feedback.

Memory optimization
Implements custom kernels and algorithmic improvements to reduce VRAM usage during training, with reported memory savings across a range of model sizes and training approaches.
Multi-model support
Provides optimized implementations for diverse model architectures including GPT-OSS, DeepSeek-R1, Qwen3, Gemma 3, Llama 3.1, and text-to-speech models. Supports both language models and vision-language models.
Multiple training paradigms
Supports supervised fine-tuning, reinforcement learning methods like GRPO, and specialized training for vision and text-to-speech tasks. Includes export functionality to multiple model formats and deployment platforms.
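As a concrete illustration of the supervised path, a typical run loads a quantized base model, attaches LoRA adapters so only a small fraction of weights is trained, and hands the model to a standard trainer. This is a minimal sketch assuming recent `unsloth`, `trl`, and `datasets` versions; the dataset file, model choice, and hyperparameters are illustrative, not taken from this page:

```python
from unsloth import FastLanguageModel
from trl import SFTTrainer, SFTConfig
from datasets import load_dataset

# Load a 4-bit quantized base model with Unsloth's optimized loader.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",  # illustrative model choice
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters so only a small set of weights is trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,                                   # LoRA rank
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    use_gradient_checkpointing="unsloth",   # extra VRAM savings
)

# Any dataset with a "text" column works; this file path is illustrative.
dataset = load_dataset("json", data_files="train.jsonl", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    args=SFTConfig(max_steps=60, per_device_train_batch_size=2),
)
trainer.train()
```

The same loaded model can instead be passed to trl's RL trainers (e.g. GRPO) for the reinforcement-learning path.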
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-2-7b-bnb-4bit",
    max_seq_length=2048,
    dtype=None,          # auto-detect the best dtype for the GPU
    load_in_4bit=True,   # 4-bit quantized loading to cut VRAM
)
# Model is now ready with optimized memory and speed

Adds Python 3.13 and PyTorch 2.9 support, Quantization-Aware Training, a Docker image, and Blackwell/DGX Spark compatibility; fixes GRPO hangs and evaluation bugs.
- Upgrade via `pip install --upgrade --force-reinstall --no-cache-dir --no-deps unsloth unsloth_zoo` to fix GRPO training hangs and evaluation issues.
- Enable Quantization-Aware Training to recover up to 70% of the accuracy lost to quantization; use the new TorchAO export with `save_pretrained_torchao()` for Int4 weights.
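The TorchAO export would be invoked roughly as follows. The method name `save_pretrained_torchao()` comes from the release notes above, but the surrounding workflow and the arguments shown are assumptions, not documented here:

```python
from unsloth import FastLanguageModel

# Load (and then fine-tune) a model as usual.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",  # illustrative model choice
    max_seq_length=2048,
)

# ... run fine-tuning here ...

# Export Int4 weights via TorchAO. Method name is from the release notes;
# the save directory and tokenizer argument are illustrative assumptions.
model.save_pretrained_torchao("my_model_int4", tokenizer)
```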
Adds gpt-oss RL support with 3× faster inference and 50% lower VRAM; avoid Flash Attention 3 for gpt-oss to prevent training loss corruption.
- Do not use Flash Attention 3 with gpt-oss models; it causes incorrect training loss.
- Install the latest transformers to fine-tune the new Qwen3 models (Qwen3-VL, Qwen3-Omni, Qwen3-Next), and verify you have enough VRAM.
Adds vision/multimodal RL support (Gemma 3, Qwen2.5-VL) with 1.5–2× faster training, 90% less VRAM, and 10× longer context via custom kernels; introduces the GSPO algorithm and a 'Standby' mode for single-GPU RL.
- Enable vision RL by upgrading to this release; use the provided notebooks for Qwen2.5-VL GSPO or Gemma 3 Vision GRPO workflows.
- Adopt 'Standby' mode to run RL training and inference on a single GPU without splitting memory between them, cutting VRAM use by 50% and supporting 10× longer context.
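Standby mode keeps RL training and vLLM-backed generation on the same GPU. A rough sketch of the setup, assuming the `UNSLOTH_VLLM_STANDBY` environment variable and `fast_inference` flag from Unsloth's RL documentation; the model choice is illustrative:

```python
import os

# Enable Standby mode before importing unsloth, so training and vLLM
# inference share one GPU's memory instead of splitting it.
os.environ["UNSLOTH_VLLM_STANDBY"] = "1"

from unsloth import FastLanguageModel

# fast_inference=True routes generation through vLLM on the same GPU.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen2.5-7B-Instruct",  # illustrative model choice
    max_seq_length=2048,
    load_in_4bit=True,
    fast_inference=True,
)

# The model can then be passed to trl's GRPOTrainer together with reward
# functions, as in the notebooks referenced above.
```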
Related Repositories
Discover similar tools and frameworks used by developers
video2x
Video2X enhances video quality using machine learning algorithms for upscaling, frame interpolation, and restoration with multiple backend support.
CodeFormer
Transformer-based face restoration using vector-quantized codebook lookup.
ComfyUI-Manager
Graphical package manager for ComfyUI custom nodes.
stable-diffusion-webui
Feature-rich web UI for Stable Diffusion that enables AI image generation, editing, and enhancement through an intuitive browser interface.
crawl4ai
Async browser automation extracting web content for LLMs.