Text Generation Web UI: Gradio interface for local LLMs
Gradio-based UI for running LLMs locally, with support for multiple model formats and extensions.
Learn more about Text Generation WebUI
text-generation-webui is a comprehensive web interface built with Gradio for running large language models locally. It supports multiple model formats, including GGUF, GPTQ, AWQ, EXL2, and Transformers, letting users load and interact with a wide range of LLMs through an intuitive chat interface. The platform offers multiple interface modes (chat, notebook, default), an extension system for added functionality, API endpoints compatible with the OpenAI format, LoRA adapter support, and advanced sampling parameters. It also provides model quantization, multi-GPU support, and streaming responses. The architecture allows seamless switching between models and offers extensive customization, including character cards, instruction templates, and prompt formatting for use cases ranging from creative writing to technical assistance.
Universal Model Support
Supports virtually all popular LLM formats and architectures, including GGUF, GPTQ, AWQ, EXL2, and native Transformers models. Seamlessly load models from Hugging Face, use multiple quantization methods, and switch between models without configuration changes. Compatible with Llama, Mistral, GPT-J, MPT, and many other architectures through a unified interface.
Extensive Extension Ecosystem
Built-in extension system enables powerful functionality additions including multimodal support, API servers, web search integration, long-term memory, and custom samplers. Community-developed extensions expand capabilities for specialized tasks like character AI, voice synthesis, image generation integration, and advanced prompt engineering without modifying core code.
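Extensions are plain Python modules the UI discovers and calls at defined points in the generation pipeline. As a hedged illustration, a minimal extension script might look like the sketch below; the function names follow the extension API documented by the project, but treat the details (file layout, `params` keys) as assumptions to verify against the current docs:

```python
# Minimal sketch of a text-generation-webui extension, i.e. a script.py
# placed inside extensions/<name>/ and enabled at launch.
# Hook names (input_modifier, output_modifier) and the params dict follow
# the project's documented extension API; verify against your version.

params = {
    "display_name": "Example shouter",  # hypothetical example extension
    "is_tab": False,
}

def input_modifier(string):
    """Called on the user's input before it reaches the model."""
    return string  # pass through unchanged

def output_modifier(string):
    """Called on the model's output before it is displayed."""
    return string.upper()  # e.g. shout every reply
```

Because hooks are ordinary functions, they can be unit-tested without launching the UI.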
Production-Ready API Server
Includes OpenAI-compatible API endpoints enabling drop-in replacement for commercial LLM services. Supports streaming responses, chat completions, embeddings generation, and model switching via REST API. Perfect for integrating local LLMs into existing applications, development workflows, or building custom AI-powered tools while maintaining full data privacy and control.
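Since the server exposes OpenAI-compatible routes, a chat completion can be requested with only the standard library. The sketch below assumes the default port 5000 and the OpenAI-style `/v1/chat/completions` route and payload shape; verify both against your server version:

```python
# Hedged sketch: call the OpenAI-compatible chat completions route of a
# locally running text-generation-webui API server. Route, port, and field
# names follow the OpenAI API shape; confirm against your server's docs.
import json
import urllib.error
import urllib.request

url = "http://localhost:5000/v1/chat/completions"  # assumed default port
payload = {
    "messages": [{"role": "user", "content": "Write a short story about a robot."}],
    "max_tokens": 200,
    "temperature": 0.7,
}

request = urllib.request.Request(
    url,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

try:
    with urllib.request.urlopen(request, timeout=30) as response:
        # OpenAI-style responses nest the text under choices[0].message.content
        reply = json.load(response)["choices"][0]["message"]["content"]
        print(reply)
except (urllib.error.URLError, OSError):
    reply = None  # no local server running; nothing to print
```

Because the route mimics OpenAI's API, official OpenAI client libraries can also be pointed at the local server by overriding their base URL.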
import requests

# Request a completion from the locally running API server
response = requests.post('http://localhost:5000/api/v1/generate', json={
    'prompt': 'Write a short story about a robot:',
    'max_new_tokens': 200,
    'temperature': 0.7,
})
generated_text = response.json()['results'][0]['text']
print(generated_text)

Improve chat UI styling, fix model loading issues, and update dependencies.
- Improve the style of tables and horizontal separators in chat messages
- Fix loading models which have their eos token disabled
- Fix a symbolic link issue in llama-cpp-binaries while updating non-portable installs
- Update llama.cpp and bitsandbytes to 0.49
Update llama.cpp with improved GPU support for NVIDIA, AMD, and Intel.
- Update llama.cpp to latest version
- NVIDIA GPU: Use cuda12.4 builds
- AMD/Intel GPU: Use vulkan builds
- CPU only: Use cpu builds
Reduce portable build sizes and update core dependencies.
- Reduce the size of all Linux/macOS portable builds by excluding llama.cpp symlinks and recreating them on first launch
- Update llama.cpp to latest version
- Update ExLlamaV3 to 0.0.18, safetensors to 0.7, triton-windows to 3.5.1.post22
Related Repositories
Discover similar tools and frameworks used by developers
YOLOv7
PyTorch single-stage detector with bag-of-freebies training optimizations.
LivePortrait
PyTorch implementation for animating portraits by transferring expressions from driving videos.
Claude Code
Local CLI integrating Claude AI into development workflows.
llama_index
Connect LLMs to external data via RAG workflows.
Real-ESRGAN
PyTorch framework for blind super-resolution using GANs.