whisper.cpp: C/C++ port of OpenAI's Whisper model
Lightweight C++ port of OpenAI Whisper for cross-platform speech recognition.
Learn more about whisper.cpp
whisper.cpp is a high-performance C/C++ implementation of OpenAI's Whisper automatic speech recognition model. It provides efficient inference without requiring Python dependencies, utilizing optimized CPU and GPU backends including Metal, CUDA, and OpenCL. The project supports multiple quantization formats (4-bit to 16-bit) for reduced memory footprint, enables real-time streaming transcription, and offers bindings for various programming languages. It includes command-line tools, server implementations, and mobile platform support (iOS, Android). The architecture prioritizes minimal resource consumption while maintaining accuracy, making it ideal for edge devices, embedded systems, and production deployments requiring fast, offline speech-to-text capabilities.
Zero Python Dependencies
Unlike the original Whisper implementation, whisper.cpp runs entirely in C/C++ with no Python runtime required. This dramatically reduces deployment complexity, memory overhead, and startup latency. The self-contained binary approach enables seamless integration into embedded systems, mobile applications, and resource-constrained environments where Python installations are impractical or impossible.
Real-Time Performance Optimization
Delivers highly optimized inference through hardware-accelerated backends including Apple Metal, NVIDIA CUDA, and Arm NEON intrinsics. Model quantization support (4-bit, 5-bit, 8-bit) reduces memory usage by up to 75% while maintaining transcription quality. Achieves real-time factor (RTF) below 0.5 on modern CPUs, enabling live streaming transcription and interactive voice applications.
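The two performance claims above are easy to make concrete. A minimal sketch (the 74 M parameter count for the base model is an assumption used only for illustration):

```python
def real_time_factor(processing_seconds, audio_seconds):
    """RTF = time spent transcribing / duration of the audio."""
    return processing_seconds / audio_seconds

# RTF 0.5 means a 60 s clip is transcribed in 30 s, i.e. 2x faster than real time.
assert real_time_factor(30.0, 60.0) == 0.5

def weight_size_mb(params_millions, bits_per_weight):
    """Approximate weight storage, ignoring per-block quantization overhead."""
    return params_millions * bits_per_weight / 8

f16_mb = weight_size_mb(74, 16)  # assumed ~74 M parameter base model at f16
q4_mb = weight_size_mb(74, 4)    # the same model with 4-bit weights

# 4-bit weights take roughly a quarter of the f16 footprint: ~75% reduction.
assert 1 - q4_mb / f16_mb == 0.75
```

In practice the saving is slightly below 75% because quantized formats store a scale factor per block of weights, but the back-of-the-envelope figure matches the text.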
Cross-Platform Production Ready
Supports deployment across Linux, macOS, Windows, iOS, Android, and WebAssembly with consistent APIs. Includes production-ready HTTP server implementation, language bindings (Python, Node.js, Go, Rust, Swift), and integration examples. The lightweight footprint enables edge deployment in IoT devices, desktop applications, and cloud services with straightforward build and deployment processes.
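Because the bundled server speaks plain HTTP, any language can call it without bindings. A hedged sketch using only the Python standard library, assuming the example server is running locally on port 8080 and exposes a multipart `/inference` endpoint (check the server's own documentation for the exact route and fields):

```python
import uuid
import urllib.request

def build_multipart(field, filename, payload, boundary=None):
    """Build a multipart/form-data body for a single file field."""
    boundary = boundary or uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return boundary, head + payload + tail

# Read a WAV file and wrap it in a multipart request body.
audio_bytes = b"RIFF..."  # placeholder; normally open("audio.wav", "rb").read()
boundary, body = build_multipart("file", "audio.wav", audio_bytes)

req = urllib.request.Request(
    "http://127.0.0.1:8080/inference",  # assumed local server address and route
    data=body,
    headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
)
# with urllib.request.urlopen(req) as resp:  # uncomment with a running server
#     print(resp.read().decode())
```

The request construction runs without a server; only the commented-out `urlopen` call requires one.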
from whispercpp import Whisper
model = Whisper.from_pretrained("base.en")
result = model.transcribe("audio.wav")
print(result["text"])

Maintenance release with latest ggml, minor improvements in tools/server/bindings.
- server: set no_context == true
- Fix WASM example broken for Hebrew
- ruby: tiny bug fix and VAD separately from ASR
- 12x performance boost with integrated graphics
Fix a bug in the ggml norm CPU scalar operator.
- Fix a bug in the ggml norm CPU scalar operator
- sync: ggml
Fix Vulkan builds and memory leaks, add new features.
- Fix Vulkan builds
- Fix memory leaks when using VAD
- Support --carry-initial-prompt
- bindings-java: disable flash attention by default
Related Repositories
Discover similar tools and frameworks used by developers
Transformers
Unified API for pre-trained transformer models across frameworks.
Chat SDK
Open-source Next.js template for AI chatbots with auth, persistence, and multi-provider support.
ONNX Runtime
Cross-platform engine for optimized ONNX model execution.
Ray
Unified framework for scaling AI and Python applications from laptops to clusters with distributed runtime.
Pi Mono
Monorepo providing AI agent development tools, unified LLM API, and deployment management for multiple providers.