
Text Generation WebUI: Gradio interface for local LLMs

A Gradio-based UI for running LLMs locally, with support for multiple model formats and an extension system.

LIVE RANKINGS (12:30 PM, steady): Overall #289 · AI & ML #89
Stars: 46.1K (+55 over 7 days) · Forks: 5.9K (+6 over 7 days)

Learn more about Text Generation WebUI

text-generation-webui is a comprehensive web interface built with Gradio for running large language models locally. It supports multiple model formats, including GGUF, GPTQ, AWQ, EXL2, and Transformers, letting users load and interact with various LLMs through an intuitive chat interface. The platform offers multiple interface modes (chat, notebook, default), an extension system for added functionality, OpenAI-compatible API endpoints, LoRA adapter support, and advanced sampling parameters, along with model quantization, multi-GPU support, and streaming responses. The architecture allows seamless switching between models and offers extensive customization, including character cards, instruction templates, and prompt formatting, for use cases ranging from creative writing to technical assistance.
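The character cards mentioned above are, in essence, structured prompt fragments that get assembled into the chat context. The sketch below shows one way a card could be turned into a prompt; the field names (name, context, greeting) mirror common card formats, but this is a simplified illustration, not the WebUI's exact internal format.

```python
# Hedged sketch: assemble a chat prompt from a character-card-style dict.
# The card fields here are illustrative, not the WebUI's internal schema.

def build_chat_prompt(card, history, user_message):
    """Prepend the character context, replay history, end on the bot's turn."""
    lines = [card["context"].strip(), ""]
    lines.append(f"{card['name']}: {card['greeting']}")
    for speaker, text in history:
        lines.append(f"{speaker}: {text}")
    lines.append(f"You: {user_message}")
    lines.append(f"{card['name']}:")  # leave the model to continue from here
    return "\n".join(lines)

card = {
    "name": "Ada",
    "context": "Ada is a helpful coding assistant.",
    "greeting": "Hi! What are we building today?",
}
prompt = build_chat_prompt(card, [("You", "Hello"), ("Ada", "Hello!")], "Explain LoRA.")
print(prompt)
```

Instruction templates work the same way, substituting a model-specific wrapper (system/user/assistant markers) for the character framing.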


1. Universal Model Support

Supports virtually all popular LLM formats and architectures, including GGUF, GPTQ, AWQ, EXL2, and native Transformers models. Seamlessly load models from Hugging Face, use multiple quantization methods, and switch between models without configuration changes. Compatible with Llama, Mistral, GPT-J, MPT, and countless other architectures through a unified interface.
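To make the format-to-backend relationship concrete, here is a hedged sketch of how a loader might be guessed from a model's filename. The format names follow the list above, but the helper and its dispatch logic are hypothetical, not text-generation-webui's actual code.

```python
# Hypothetical sketch: map a model path to a likely loader backend.
# The formats mirror those listed above; the function is illustrative only.
from pathlib import Path

LOADER_HINTS = {
    ".gguf": "llama.cpp",
    "gptq": "ExLlama/AutoGPTQ",
    "awq": "AutoAWQ",
    "exl2": "ExLlamaV2",
}

def guess_loader(model_path: str) -> str:
    name = Path(model_path).name.lower()
    if name.endswith(".gguf"):
        return LOADER_HINTS[".gguf"]
    for marker in ("gptq", "awq", "exl2"):
        if marker in name:
            return LOADER_HINTS[marker]
    # Plain safetensors/PyTorch checkpoints fall back to Transformers.
    return "Transformers"

print(guess_loader("mistral-7b-instruct.Q4_K_M.gguf"))  # llama.cpp
```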

2. Extensive Extension Ecosystem

A built-in extension system enables powerful functionality additions, including multimodal support, API servers, web search integration, long-term memory, and custom samplers. Community-developed extensions expand capabilities for specialized tasks such as character AI, voice synthesis, image-generation integration, and advanced prompt engineering, all without modifying core code.

3. Production-Ready API Server

Includes OpenAI-compatible API endpoints enabling drop-in replacement for commercial LLM services. Supports streaming responses, chat completions, embeddings generation, and model switching via REST API. Perfect for integrating local LLMs into existing applications, development workflows, or building custom AI-powered tools while maintaining full data privacy and control.


import requests

# OpenAI-compatible completions endpoint (start the server with --api; default port 5000)
response = requests.post('http://localhost:5000/v1/completions', json={
    'prompt': 'Write a short story about a robot:',
    'max_tokens': 200,
    'temperature': 0.7,
})

generated_text = response.json()['choices'][0]['text']
print(generated_text)
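The streaming mode mentioned above delivers tokens as server-sent events on the same OpenAI-style endpoints. The sketch below shows a small parser for the `data: {...}` lines such a stream emits; the sample chunk is illustrative, shaped like a streaming completion event, and no live server is contacted here.

```python
import json

# Parse one server-sent-events line from an OpenAI-style streaming response.
# Returns the text delta, or None for keep-alives and the [DONE] sentinel.
def parse_sse_line(line: str):
    if not line.startswith("data: "):
        return None
    payload = line[len("data: "):]
    if payload.strip() == "[DONE]":
        return None
    chunk = json.loads(payload)
    return chunk["choices"][0].get("text", "")

# Illustrative chunk shaped like a streaming completion event:
sample = 'data: {"choices": [{"text": "Once upon"}]}'
print(parse_sse_line(sample))        # Once upon
print(parse_sse_line("data: [DONE]"))  # None
```

Against a live server, you would add `'stream': True` to the request payload, pass `stream=True` to `requests.post`, and feed each decoded line from `response.iter_lines()` to a parser like this.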

v3.23

Improve chat UI styling, fix model loading issues, and update dependencies.

  • Improve the style of tables and horizontal separators in chat messages
  • Fix loading models which have their eos token disabled
  • Fix a symbolic link issue in llama-cpp-binaries while updating non-portable installs
  • Update llama.cpp and bitsandbytes to 0.49
v3.22

Update llama.cpp with improved GPU support for NVIDIA, AMD, and Intel.

  • Update llama.cpp to latest version
  • NVIDIA GPU: Use cuda12.4 builds
  • AMD/Intel GPU: Use vulkan builds
  • CPU only: Use cpu builds
v3.21

Reduce portable build sizes and update core dependencies.

  • Reduce the size of all Linux/macOS portable builds by excluding llama.cpp symlinks and recreating them on first launch
  • Update llama.cpp to latest version
  • Update ExLlamaV3 to 0.0.18, safetensors to 0.7, triton-windows to 3.5.1.post22
