Text Generation Web UI: Gradio interface for local LLMs
Gradio-based UI for running LLMs locally, with support for multiple model formats and extensions.
Learn more about Text Generation WebUI
text-generation-webui is a comprehensive web interface built with Gradio for running large language models locally. It supports multiple model formats, including GGUF, GPTQ, AWQ, EXL2, and Transformers, letting users load and interact with a wide range of LLMs through an intuitive chat interface. The platform offers multiple interface modes (chat, notebook, default), an extension system for added functionality, API endpoints compatible with the OpenAI format, LoRA adapter support, and advanced sampling parameters. It also provides model quantization, multi-GPU support, and streaming responses. The architecture allows seamless switching between models and offers extensive customization, including character cards, instruction templates, and prompt formatting for use cases ranging from creative writing to technical assistance.
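The instruction templates and prompt formatting mentioned above boil down to filling a template string before the text reaches the model. A minimal sketch, using the well-known Alpaca-style layout as an illustration (the helper function is illustrative, not the project's internal API):

```python
# Illustrative sketch of instruction-template formatting (not the project's
# internal API): a template with a placeholder is filled in per request.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)

def format_prompt(instruction: str, template: str = ALPACA_TEMPLATE) -> str:
    """Render the final prompt string sent to the model."""
    return template.format(instruction=instruction)

prompt = format_prompt("Summarize the plot of Hamlet in one sentence.")
print(prompt)
```

Swapping the template (ChatML, Llama-style, a custom character card, and so on) changes how the same user input is presented to the model, which is why template selection matters when switching models.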
Universal Model Support
Supports virtually all popular LLM formats and architectures, including GGUF, GPTQ, AWQ, EXL2, and native Transformers models. Load models from Hugging Face, use multiple quantization methods, and switch between models without configuration changes. Compatible with Llama, Mistral, GPT-J, MPT, and many other architectures through a unified interface.
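The "unified interface" across formats can be pictured as a dispatch on the model file type. The mapping below is a hypothetical sketch (the project's actual loader selection is more involved and also inspects metadata), but it conveys the idea:

```python
from pathlib import Path

# Hypothetical sketch of loader selection by model format; the project's real
# logic also inspects model metadata, but the core idea is a type dispatch.
LOADERS = {
    ".gguf": "llama.cpp",            # quantized GGUF files
    ".safetensors": "Transformers",  # full-precision or GPTQ/AWQ weights
    ".bin": "Transformers",          # legacy PyTorch checkpoints
}

def pick_loader(model_path: str) -> str:
    """Map a model file to a backend name, with a sensible default."""
    suffix = Path(model_path).suffix.lower()
    return LOADERS.get(suffix, "Transformers")

print(pick_loader("models/mistral-7b-instruct.Q4_K_M.gguf"))
```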
Extensive Extension Ecosystem
Built-in extension system enables powerful functionality additions including multimodal support, API servers, web search integration, long-term memory, and custom samplers. Community-developed extensions expand capabilities for specialized tasks like character AI, voice synthesis, image generation integration, and advanced prompt engineering without modifying core code.
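At its simplest, an extension is a script exposing hook functions that the UI calls around each generation. The sketch below uses the `input_modifier`/`output_modifier` hook names from the project's extension convention, but treat it as an outline rather than a complete, drop-in extension:

```python
# Minimal extension-style script: hook functions called around each
# generation. Hook names follow the project's extension convention;
# this is a sketch, not a complete extension.
params = {
    "display_name": "Example Extension",
    "is_tab": False,
}

def input_modifier(text: str) -> str:
    """Runs on the user's input before it reaches the model."""
    return text.strip()

def output_modifier(text: str) -> str:
    """Runs on the model's output before it is shown in the chat."""
    return text.replace("\r\n", "\n")

cleaned = output_modifier("Hello\r\nworld")
```

Because hooks are plain functions operating on strings, an extension can add memory lookups, web-search results, or post-processing without touching the core code.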
Production-Ready API Server
Includes OpenAI-compatible API endpoints enabling drop-in replacement for commercial LLM services. Supports streaming responses, chat completions, embeddings generation, and model switching via REST API. Perfect for integrating local LLMs into existing applications, development workflows, or building custom AI-powered tools while maintaining full data privacy and control.
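Because the endpoints follow the OpenAI format, a chat request is just a standard chat-completions payload POSTed to the local server. The sketch below builds and parses such a payload without needing a running server; the port and route are assumed defaults, so adjust them to your setup:

```python
import json
import urllib.request

# Assumed default local endpoint; adjust host/port to your setup.
API_URL = "http://localhost:5000/v1/chat/completions"

def build_chat_request(user_message: str, max_tokens: int = 200) -> dict:
    """Standard OpenAI-format chat-completions payload."""
    return {
        "messages": [{"role": "user", "content": user_message}],
        "max_tokens": max_tokens,
        "temperature": 0.7,
    }

def extract_reply(response_json: dict) -> str:
    """Pull the assistant's text out of an OpenAI-format response."""
    return response_json["choices"][0]["message"]["content"]

payload = build_chat_request("Write a haiku about robots.")

# To actually call the server (it must be running with the API enabled):
# req = urllib.request.Request(API_URL, data=json.dumps(payload).encode(),
#                              headers={"Content-Type": "application/json"})
# reply = extract_reply(json.load(urllib.request.urlopen(req)))

# Parsing demonstrated on a sample response of the standard shape:
sample = {"choices": [{"message": {"role": "assistant",
                                   "content": "Gears hum softly."}}]}
print(extract_reply(sample))
```

Since the request and response shapes match the OpenAI spec, existing client libraries can be pointed at the local server by changing only the base URL.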
# Example: generate text via the local OpenAI-compatible completions
# endpoint (the server must be started with the API enabled).
import requests

response = requests.post('http://localhost:5000/v1/completions', json={
    'prompt': 'Write a short story about a robot:',
    'max_tokens': 200,
    'temperature': 0.7,
})
generated_text = response.json()['choices'][0]['text']
print(generated_text)

Improve chat UI styling, fix model loading issues, and update dependencies.
- Improve the style of tables and horizontal separators in chat messages
- Fix loading models which have their eos token disabled
- Fix a symbolic link issue in llama-cpp-binaries while updating non-portable installs
- Update llama.cpp and bitsandbytes to 0.49
Update llama.cpp with improved GPU support for NVIDIA, AMD, and Intel.
- Update llama.cpp to latest version
- NVIDIA GPU: Use cuda12.4 builds
- AMD/Intel GPU: Use vulkan builds
- CPU only: Use cpu builds
Reduce portable build sizes and update core dependencies.
- Reduce the size of all Linux/macOS portable builds by excluding llama.cpp symlinks and recreating them on first launch
- Update llama.cpp to latest version
- Update ExLlamaV3 to 0.0.18, safetensors to 0.7, triton-windows to 3.5.1.post22
Related Repositories
Discover similar tools and frameworks used by developers
Goose
LLM-powered agent automating local software engineering workflows.
MMDetection
Modular PyTorch framework for object detection research and deployment.
PaddleOCR
Multilingual OCR toolkit with document structure extraction.
Continue
Multi-LLM coding agent with interactive and automated modes.
DeepFace
Python library wrapping multiple face recognition deep learning models.