Whisper: General-purpose speech recognition model
Speech recognition system supporting multilingual transcription, translation, and language ID.
Whisper is a Transformer-based sequence-to-sequence model developed by OpenAI for automatic speech recognition and related tasks. The model uses a unified architecture that processes audio through log-Mel spectrograms and generates text tokens autoregressively, handling multiple speech processing tasks within a single framework. It comes in six different sizes ranging from 39M to 1.55B parameters, with both English-only and multilingual variants available. The system processes audio in 30-second windows and can perform transcription, translation to English, spoken language identification, and voice activity detection.
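To make the 30-second windowing concrete, here is a minimal sketch in plain Python. It assumes 16 kHz mono input, which gives 480,000 samples per window; the sample rate is not stated above. `split_into_windows` is a hypothetical helper written for this illustration, and `pad_or_trim` is only loosely modelled on the library function of the same name. The real library advances its window using predicted timestamps, so the fixed, non-overlapping split below is a simplification.

```python
# Assumed audio front-end constants: 16 kHz mono, 30-second windows.
SAMPLE_RATE = 16_000
CHUNK_SECONDS = 30
CHUNK_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS  # 480,000 samples per window


def pad_or_trim(samples, length=CHUNK_SAMPLES):
    """Return exactly `length` samples: cut the excess or zero-pad the end."""
    if len(samples) >= length:
        return list(samples[:length])
    return list(samples) + [0.0] * (length - len(samples))


def split_into_windows(samples, length=CHUNK_SAMPLES):
    """Hypothetical helper: split audio into fixed, non-overlapping windows.

    The final window is zero-padded so every window has the same length.
    """
    return [
        pad_or_trim(samples[start:start + length], length)
        for start in range(0, len(samples), length)
    ]


if __name__ == "__main__":
    # 70 seconds of silence stands in for real audio.
    audio = [0.0] * (70 * SAMPLE_RATE)
    windows = split_into_windows(audio)
    print(len(windows), "windows of", len(windows[0]), "samples each")
```

Each window would then be converted to a log-Mel spectrogram before being passed to the encoder; that step is omitted here because it needs a numerical library.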
Multitask Architecture
A single model handles transcription, translation, language identification, and voice activity detection, using special tokens as task specifiers. This replaces traditional multi-stage speech processing pipelines with a unified sequence-to-sequence approach.
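The role of special tokens as task specifiers can be sketched as a small prompt builder. The token strings below follow the format used by the Whisper tokenizer, but `build_decoder_prompt` is a hypothetical function written for illustration, not part of the library. It only assembles the prefix as strings; the real model works with token IDs.

```python
TASKS = ("transcribe", "translate")


def build_decoder_prompt(language="en", task="transcribe", timestamps=True):
    """Hypothetical helper: the special-token prefix that selects the task.

    The decoder is conditioned on this prefix and then generates the text
    tokens autoregressively.
    """
    if task not in TASKS:
        raise ValueError(f"task must be one of {TASKS}, got {task!r}")
    prompt = ["<|startoftranscript|>", f"<|{language}|>", f"<|{task}|>"]
    if not timestamps:
        # Ask for plain text without timestamp tokens.
        prompt.append("<|notimestamps|>")
    return prompt


if __name__ == "__main__":
    # Transcribe English speech without timestamps.
    print(build_decoder_prompt("en", "transcribe", timestamps=False))
    # Translate French speech into English text.
    print(build_decoder_prompt("fr", "translate"))
```

Language identification fits the same scheme: the model predicts the language token that follows `<|startoftranscript|>`. Voice activity detection is handled by predicting a `<|nospeech|>` token when a window contains no speech.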
Weak Supervision Training
Trained on large-scale diverse audio data without requiring perfectly aligned transcripts. This approach enables robust performance across various audio conditions and speaking styles.
Multiple Model Sizes
Offers six model variants from tiny (39M parameters) to large (1.55B parameters) with different speed-accuracy tradeoffs. Includes specialized English-only models and an optimized turbo variant.
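The speed-accuracy tradeoff can be turned into a simple selection rule. The sketch below is illustrative only: `pick_model` is a hypothetical helper, and apart from the tiny and large parameter counts quoted above, the parameter and VRAM figures are approximate values recalled from the project README and should be checked against it. It also treats "more parameters" as "more accurate", which is a simplification.

```python
# name: (approx. parameters in millions, approx. required VRAM in GB)
# Assumed figures; verify against the project README before relying on them.
MODELS = {
    "tiny": (39, 1),
    "base": (74, 1),
    "small": (244, 2),
    "medium": (769, 5),
    "large": (1550, 10),
    "turbo": (809, 6),
}

# Sizes that also ship as English-only checkpoints (e.g. "small.en").
ENGLISH_ONLY = {"tiny", "base", "small", "medium"}


def pick_model(vram_gb, english_only=False):
    """Hypothetical helper: largest model whose approximate VRAM need fits."""
    candidates = [
        (params, name)
        for name, (params, vram) in MODELS.items()
        if vram <= vram_gb and (not english_only or name in ENGLISH_ONLY)
    ]
    if not candidates:
        raise ValueError(f"no model fits in {vram_gb} GB")
    _, name = max(candidates)  # highest parameter count wins
    return f"{name}.en" if english_only else name


if __name__ == "__main__":
    print(pick_model(6))                     # budget of about 6 GB
    print(pick_model(2, english_only=True))  # small GPU, English audio only
```

The returned name can be passed to the `--model` option of the command-line tool.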
# Install Whisper
pip install -U openai-whisper
# Install required ffmpeg dependency
# Ubuntu/Debian:
sudo apt update && sudo apt install ffmpeg
# macOS:
brew install ffmpeg
# Windows (Chocolatey):
choco install ffmpeg
# Transcribe audio files using the turbo model
whisper audio.flac audio.mp3 audio.wav --model turbo
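The same transcription can be run from Python. This is a minimal sketch that assumes the `openai-whisper` package and ffmpeg are installed as described above. `whisper.load_model` and `model.transcribe` are the package's documented entry points, while `transcribe_files` is a hypothetical wrapper added for illustration.

```python
def transcribe_files(paths, model_name="turbo"):
    """Transcribe each audio file and return a {path: text} mapping."""
    paths = list(paths)
    if not paths:
        return {}

    # Imported lazily so this module can be loaded without the package.
    import whisper

    # Load the model once and reuse it; weights are downloaded on first use.
    model = whisper.load_model(model_name)
    return {path: model.transcribe(path)["text"] for path in paths}


if __name__ == "__main__":
    import sys

    # Example: python transcribe.py audio.flac audio.mp3 audio.wav
    for path, text in transcribe_files(sys.argv[1:]).items():
        print(f"{path}: {text}")
```

Loading the model once and reusing it for every file avoids repeating the most expensive step.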
Related Repositories
Discover similar tools and frameworks used by developers
llama.cpp
Quantized LLM inference with hardware-accelerated CPU/GPU backends.
FAISS
Efficient approximate nearest neighbor search for billion-scale vectors.
Pica
Unified API platform connecting AI agents to 150+ integrations with auth and tool building.
Ray
Unified framework for scaling AI and Python applications from laptops to clusters with distributed runtime.
Ollama
Go-based CLI for local LLM inference and management.