Text Generation Web UI: Gradio interface for local LLMs
Feature-rich Gradio-based UI for running and interacting with LLMs locally, supporting multiple model formats and extensions.
Learn more about text-generation-webui
text-generation-webui is a comprehensive web interface built with Gradio for running large language models locally. It supports multiple model formats including GGUF, GPTQ, AWQ, EXL2, and Transformers, enabling users to load and interact with various LLMs through an intuitive chat interface. The platform features multiple interface modes (chat, notebook, default), an extension system for enhanced functionality, OpenAI-compatible API endpoints, LoRA adapter support, and advanced sampling parameters. It supports quantized models, multi-GPU setups, and streaming responses. The architecture allows seamless switching between models and offers extensive customization options including character cards, instruction templates, and prompt formatting for use cases ranging from creative writing to technical assistance.
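Model switching is also exposed over the API: with the server launched with the --api flag, internal model-management endpoints can list and hot-swap models. A minimal sketch (the /v1/internal/* paths and the model_names response field follow the project's API documentation, but may vary between releases):

import requests

BASE = 'http://localhost:5000'  # assumes the server was started with --api

# List the models available in the models/ directory
models = requests.get(f'{BASE}/v1/internal/model/list').json()['model_names']
print(models)

# Load one of them; the currently loaded model is swapped out
requests.post(f'{BASE}/v1/internal/model/load', json={'model_name': models[0]})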
Universal Model Support
Supports virtually all popular LLM formats and architectures including GGUF, GPTQ, AWQ, EXL2, and native Transformers models. Seamlessly load models from Hugging Face, use multiple quantization methods, and switch between models without configuration changes. Compatible with Llama, Mistral, GPT-J, MPT, and many other architectures through a unified interface.
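For instance, a quantized GGUF file can be pulled from Hugging Face directly into the models directory with huggingface_hub (the repository and filename below are illustrative; substitute any model you want to run):

from huggingface_hub import hf_hub_download

# Illustrative repo/filename; any GGUF model works the same way
hf_hub_download(
    repo_id='TheBloke/Mistral-7B-Instruct-v0.2-GGUF',
    filename='mistral-7b-instruct-v0.2.Q4_K_M.gguf',
    local_dir='text-generation-webui/models',
)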
Extensive Extension Ecosystem
A built-in extension system enables powerful functionality additions including multimodal support, API servers, web search integration, long-term memory, and custom samplers. Community-developed extensions expand capabilities for specialized tasks like character AI, voice synthesis, image-generation integration, and advanced prompt engineering without modifying core code.
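Concretely, an extension is a folder under extensions/ whose script.py defines hook functions that the UI calls during generation. A minimal sketch (hook names follow the documented extension API, but signatures may differ across releases):

# extensions/example_extension/script.py
params = {
    'display_name': 'Example Extension',
    'is_tab': False,
}

def input_modifier(string, state, is_chat=False):
    # Called on the user's prompt before it reaches the model
    return string

def output_modifier(string, state, is_chat=False):
    # Called on the model's output before it is displayed
    return string + '\n\n(post-processed by example_extension)'

The extension is then activated with the --extensions example_extension launch flag.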
Production-Ready API Server
Includes OpenAI-compatible API endpoints, enabling a drop-in replacement for commercial LLM services. Supports streaming responses, chat completions, embeddings generation, and model switching via REST API. Ideal for integrating local LLMs into existing applications, development workflows, or custom AI-powered tools while maintaining full data privacy and control.
import requests

# Assumes the server was launched with the --api flag (OpenAI-compatible endpoints)
response = requests.post('http://localhost:5000/v1/completions', json={
    'prompt': 'Write a short story about a robot:',
    'max_tokens': 200,
    'temperature': 0.7,
})
generated_text = response.json()['choices'][0]['text']
print(generated_text)
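Since the endpoints follow the OpenAI schema, the official openai Python client also works when pointed at the local server (a sketch; the server does not validate the API key and typically serves whichever model is currently loaded):

from openai import OpenAI

client = OpenAI(base_url='http://localhost:5000/v1', api_key='sk-local')  # key is a placeholder

completion = client.chat.completions.create(
    model='local-model',  # placeholder; the loaded model is used regardless
    messages=[{'role': 'user', 'content': 'Explain LoRA adapters in two sentences.'}],
)
print(completion.choices[0].message.content)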
Recent Updates
Adds Qwen3-VL support via llama.cpp update and hardens Training_PRO security; pins huggingface-hub to 0.36.0 to fix manual venv installs.
- Pin huggingface-hub to 0.36.0 if using manual venv installs to avoid breakage.
- Update to llama.cpp commit 230d1169 to enable Qwen3-VL model support and exllamav3 0.0.12.
Maintenance release with llama.cpp backend update (Ling-mini-2.0, Ring-mini-2.0 support), exllamav3 0.0.11, and macOS Tahoe Python requirement fix.
- Update llama.cpp backend to commit d0660f237 for Ling-mini-2.0 and Ring-mini-2.0 model support.
- Fix Python requirements on macOS Tahoe devices; portable builds now support symlink execution.
Security hardening locks --trust-remote-code to CLI-only; reverts the exllamav3_hf change from v3.14 that caused gibberish output.
- Set --trust-remote-code at launch; the flag is now immutable from the UI/API to prevent runtime code injection.
- Update to llama.cpp f9fb33f and exllamav3 0.0.10; fixes chat spacing, metadata leaks, and HF download resumption.
Related Repositories
Discover similar tools and frameworks used by developers
YOLOX
PyTorch anchor-free object detector with scalable model variants.
ByteTrack
Multi-object tracker associating low-confidence detections across frames.
OpenHands
LLM agent framework automating development in sandboxed containers.
PentestGPT
AI-assisted Python framework for automated security testing.
LightRAG
Graph-based retrieval framework for structured RAG reasoning.