SGLang: Serving framework for large language models
High-performance inference engine for LLMs and VLMs.
SGLang is a serving framework written in Python and CUDA that handles inference for large language models and vision-language models. The framework implements scheduling, batching, and memory-management optimizations to improve throughput and latency during model serving. It supports multiple hardware backends, including NVIDIA GPUs, AMD GPUs, and TPUs, through different implementations such as the SGLang-Jax backend. Common deployment scenarios include running open-source models like Llama, Qwen, and DeepSeek, as well as proprietary models through API compatibility layers.
Multi-Backend Hardware Support
Runs natively on NVIDIA GPUs, AMD GPUs, and TPUs through specialized backends: the CUDA and ROCm implementations target GPU hardware, while SGLang-Jax enables TPU execution, all without changes to serving code.
Cache-Aware Batch Scheduling
A zero-overhead scheduler and a cache-aware load balancer optimize memory usage across concurrent requests, reducing scheduling latency and maximizing throughput compared to naive batching approaches.
Day-One Model Support
Provides immediate integration with newly released language models through a flexible architecture that adapts to evolving model APIs. New model releases can be deployed within hours rather than weeks, ensuring users always have access to the latest AI capabilities without waiting for framework updates.
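The OpenAI-compatible client call below assumes an SGLang server is already serving the model locally. A typical launch looks like the following (the model path and port here are illustrative and match the client snippet):

```shell
# Start an SGLang server exposing an OpenAI-compatible API on port 30000
python -m sglang.launch_server \
  --model-path meta-llama/Llama-3.1-8B-Instruct \
  --port 30000
```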
from openai import OpenAI

# Point the standard OpenAI client at a locally running SGLang server.
client = OpenAI(
    base_url="http://localhost:30000/v1",
    api_key="EMPTY",
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "What is AI?"}],
    temperature=0.7,
)
print(response.choices[0].message.content)

Major performance improvements with 1.5x faster diffusion models and linear scaling for million-token contexts
- Up to 1.5x faster across the board for all major diffusion models
- Close to linear scaling with chunked pipeline parallelism for super-long million-token contexts
- Optimizing GLM4-MoE for Production: 65% Faster TTFT
- EPD Disaggregation: Elastic Encoder Scaling for Vision-Language Models
- Day 0 Support for GLM 4.7 Flash
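The chunked pipeline parallelism mentioned above rests on a simple idea: instead of prefilling a million-token prompt in one pass, the prompt is split into fixed-size chunks so per-step memory stays bounded and pipeline stages can overlap. A toy sketch of the chunking step (not SGLang's implementation; `process_chunk` is a hypothetical stand-in for one stage's forward pass):

```python
def chunked_prefill(tokens, chunk_size, process_chunk):
    """Process a long token sequence in fixed-size chunks.

    Each chunk is handed to `process_chunk` independently, which is what
    bounds per-step memory and lets pipeline stages overlap work on
    very long contexts.
    """
    outputs = []
    for start in range(0, len(tokens), chunk_size):
        outputs.append(process_chunk(tokens[start:start + chunk_size]))
    return outputs

# Using `len` as a stand-in stage shows the chunk boundaries:
print(chunked_prefill(list(range(10)), 4, len))  # → [4, 4, 2]
```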
Massive performance improvements with 10-12x faster cache-aware routing using radix trees
- Cache-aware routing can now handle over 216,000 cache insertions per second (up from 18,900)
- Prefix matching across 10,000 tree entries jumped from 41,000 to 124,000 operations per second
- Under concurrent load with 64 threads, the system processes 474,000 operations per second
- INSERT operations now process 440 MB/s (up from 38 MB/s)
- MATCH operations handle 253 MB/s (up from 83 MB/s)
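The numbers above describe a radix-tree index over cached token prefixes: each incoming request is matched against previously seen prompts so the router can send it to a worker that already holds the shared prefix's KV cache. A minimal sketch of the idea, using a plain trie rather than a compressed radix tree for brevity (class and method names here are hypothetical, not the router's API):

```python
class TrieNode:
    def __init__(self):
        self.children = {}


class PrefixCacheIndex:
    """Toy prefix index over token IDs. A real radix tree compresses
    runs of single-child nodes into one edge, but insert/match logic
    is conceptually the same."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, tokens):
        """Record a served request's token sequence."""
        node = self.root
        for t in tokens:
            node = node.children.setdefault(t, TrieNode())

    def match_prefix(self, tokens):
        """Return the length of the longest cached prefix of `tokens`."""
        node, matched = self.root, 0
        for t in tokens:
            if t not in node.children:
                break
            node = node.children[t]
            matched += 1
        return matched


index = PrefixCacheIndex()
index.insert([1, 2, 3, 4])               # tokens from an earlier request
print(index.match_prefix([1, 2, 3, 9]))  # → 3: three tokens of KV cache reusable
```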
Day 0 support for multiple new models including Mimo-V2-Flash, Nemotron-Nano-v3, and LLaDA 2.0
- Day 0 Support for Mimo-V2-Flash
- Day 0 Support for Nemotron-Nano-v3
- Day 0 Support for LLaDA 2.0
- SGLang-Diffusion Day 0 Support for Qwen-Image-Edit-2509, Qwen-Image-Edit-2511, Qwen-Image-2512 and Qwen-Image-Layered
Related Repositories
Discover similar tools and frameworks used by developers
whisper.cpp
Lightweight C++ port of OpenAI Whisper for cross-platform speech recognition.
PaddleOCR
Multilingual OCR toolkit with document structure extraction.
open_clip
PyTorch library for contrastive language-image pretraining.
Transformers
Unified API for pre-trained transformer models across frameworks.
llama_index
Connect LLMs to external data via RAG workflows.