
whisper.cpp: C/C++ port of OpenAI's Whisper model

Lightweight, cross-platform speech recognition engine delivering real-time transcription with minimal dependencies and optimized inference performance.

Live rankings

  • Overall rank: #53
  • AI & ML rank: #29
  • Stars: 45.6K (+71 in the last 7 days)
  • Forks: 5.1K (+2 in the last 7 days)
  • Downloads: 45

Learn more about whisper.cpp

whisper.cpp is a high-performance C/C++ implementation of OpenAI's Whisper automatic speech recognition model. It provides efficient inference without requiring Python dependencies, utilizing optimized CPU and GPU backends including Metal, CUDA, and OpenCL. The project supports multiple quantization formats (4-bit to 16-bit) for reduced memory footprint, enables real-time streaming transcription, and offers bindings for various programming languages. It includes command-line tools, server implementations, and mobile platform support (iOS, Android). The architecture prioritizes minimal resource consumption while maintaining accuracy, making it ideal for edge devices, embedded systems, and production deployments requiring fast, offline speech-to-text capabilities.
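
Under the hood everything is driven by the C API declared in whisper.h: create a context from a ggml model file, pass 16 kHz mono float PCM to whisper_full, then walk the decoded segments. A minimal C++ sketch (the model path is an example, and WAV decoding is left out):

#include "whisper.h"
#include <cstdio>
#include <vector>

int main() {
    // Load a ggml model from disk (example path; download scripts ship with the repo)
    struct whisper_context * ctx = whisper_init_from_file_with_params(
            "models/ggml-base.en.bin", whisper_context_default_params());
    if (!ctx) return 1;

    // 16 kHz mono float PCM; decoding audio.wav into this buffer is omitted here
    std::vector<float> pcm;

    struct whisper_full_params params = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
    params.language = "en";

    if (whisper_full(ctx, params, pcm.data(), (int) pcm.size()) == 0) {
        for (int i = 0; i < whisper_full_n_segments(ctx); ++i) {
            // Segment timestamps are reported in units of 10 ms
            printf("[%lld -> %lld] %s\n",
                   (long long) whisper_full_get_segment_t0(ctx, i),
                   (long long) whisper_full_get_segment_t1(ctx, i),
                   whisper_full_get_segment_text(ctx, i));
        }
    }

    whisper_free(ctx);
    return 0;
}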


1. Zero Python Dependencies

Unlike the original Whisper implementation, whisper.cpp runs entirely in C/C++ with no Python runtime required. This dramatically reduces deployment complexity, memory overhead, and startup latency. The self-contained binary approach enables seamless integration into embedded systems, mobile applications, and resource-constrained environments where Python installations are impractical or impossible.

2. Real-Time Performance Optimization

Delivers highly optimized inference through hardware-accelerated backends including Apple Metal, NVIDIA CUDA, and Arm NEON intrinsics. Model quantization support (4-bit, 5-bit, 8-bit) reduces memory usage by up to 75% while maintaining transcription quality. Achieves real-time factor (RTF) below 0.5 on modern CPUs, enabling live streaming transcription and interactive voice applications.
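
RTF is processing time divided by audio duration, so an RTF of 0.5 means ten seconds of audio are transcribed in roughly five seconds. A rough way to measure it on your own hardware with the C API (a sketch; the model path, thread count, and silence stand-in input are placeholders):

#include "whisper.h"
#include <chrono>
#include <cstdio>
#include <vector>

int main() {
    struct whisper_context * ctx = whisper_init_from_file_with_params(
            "models/ggml-base.en.bin", whisper_context_default_params());
    if (!ctx) return 1;

    // Stand-in input: 10 s of silence at 16 kHz; substitute real decoded PCM in practice
    std::vector<float> pcm(10 * WHISPER_SAMPLE_RATE, 0.0f);

    struct whisper_full_params params = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
    params.n_threads = 8; // roughly match the number of physical cores

    const auto t0 = std::chrono::steady_clock::now();
    const int  rc = whisper_full(ctx, params, pcm.data(), (int) pcm.size());
    const auto t1 = std::chrono::steady_clock::now();

    if (rc == 0) {
        const double audio_s   = (double) pcm.size() / WHISPER_SAMPLE_RATE;
        const double process_s = std::chrono::duration<double>(t1 - t0).count();
        printf("RTF = %.2f (%.1f s of audio in %.1f s)\n", process_s / audio_s, audio_s, process_s);
    }

    whisper_free(ctx);
    return rc == 0 ? 0 : 1;
}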

3. Cross-Platform Production Ready

Supports deployment across Linux, macOS, Windows, iOS, Android, and WebAssembly with consistent APIs. Includes production-ready HTTP server implementation, language bindings (Python, Node.js, Go, Rust, Swift), and integration examples. The lightweight footprint enables edge deployment in IoT devices, desktop applications, and cloud services with straightforward build and deployment processes.
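
Because the bundled server example speaks plain HTTP, any client can call it. The sketch below (C++ with libcurl) assumes the server is running locally and accepts multipart uploads on an /inference endpoint with a file field; the port, path, and form field names are assumptions to verify against the README of the server example you build.

#include <curl/curl.h>
#include <cstdio>

int main() {
    curl_global_init(CURL_GLOBAL_DEFAULT);
    CURL * curl = curl_easy_init();
    if (!curl) return 1;

    // Multipart upload: the audio file plus the desired response format
    curl_mime * form = curl_mime_init(curl);
    curl_mimepart * part = curl_mime_addpart(form);
    curl_mime_name(part, "file");
    curl_mime_filedata(part, "audio.wav");

    part = curl_mime_addpart(form);
    curl_mime_name(part, "response_format");
    curl_mime_data(part, "json", CURL_ZERO_TERMINATED);

    curl_easy_setopt(curl, CURLOPT_URL, "http://127.0.0.1:8080/inference");
    curl_easy_setopt(curl, CURLOPT_MIMEPOST, form);

    // libcurl writes the response body to stdout by default
    const CURLcode res = curl_easy_perform(curl);
    if (res != CURLE_OK) {
        fprintf(stderr, "request failed: %s\n", curl_easy_strerror(res));
    }

    curl_mime_free(form);
    curl_easy_cleanup(curl);
    curl_global_cleanup();
    return res == CURLE_OK ? 0 : 1;
}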


# Example using one of the community Python bindings for whisper.cpp;
# exact class and method names vary between bindings.
from whispercpp import Whisper

# Load the English "base" model and transcribe an audio file
model = Whisper.from_pretrained("base.en")
result = model.transcribe("audio.wav")

print(result["text"])

v1.8.2

Patch release fixing a CPU scalar operator bug in the ggml norm implementation.

  • Upgrade to resolve incorrect behavior in ggml norm CPU scalar operations that may affect inference accuracy.
  • No breaking changes or new requirements; this is a drop-in bug-fix update from v1.8.1.
v1.8.1

Patch release fixing Vulkan build failures and VAD memory leaks, and adding initial-prompt carry-over support.

  • Update builds that use the Vulkan backend to resolve compilation errors introduced in v1.8.0.
  • Pass `--carry-initial-prompt` to preserve context across segments; VAD leaks are now patched.
v1.8.0

Flash attention now enabled by default, delivering 30–50% faster encoding across all models on Apple Silicon.

  • Expect encoder speedups of 30–50% on M1/M2 hardware with no configuration changes required.
  • Benchmark tables show medium-model encoding time dropping from 580 ms to 395 ms on an M1 Pro with flash attention; an API-level sketch for pinning the setting follows below.
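
For before/after comparisons, the behaviour can still be pinned per context at the API level. A minimal sketch, assuming the flash_attn flag on whisper_context_params from earlier 1.x releases is still present:

#include "whisper.h"

// Pin flash attention on or off when creating a context.
// Assumes whisper_context_params still exposes the flash_attn flag from earlier 1.x releases.
struct whisper_context * init_ctx(const char * model_path, bool use_flash_attn) {
    struct whisper_context_params cparams = whisper_context_default_params();
    cparams.flash_attn = use_flash_attn; // false reproduces pre-1.8.0 behaviour for comparison
    return whisper_init_from_file_with_params(model_path, cparams);
}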
