Whisper: General-purpose speech recognition model

Speech recognition system supporting multilingual transcription, translation into English, and spoken language identification.


Rankings: #75 overall, #43 in AI & ML
Stars: 98.2K (+1.5K in the last 7 days)
Forks: 12.1K (+148 in the last 7 days)
Tags: AI & ML

Learn more about Whisper

Whisper is a Transformer-based sequence-to-sequence model developed by OpenAI for automatic speech recognition and related tasks. It converts audio into log-Mel spectrograms and generates text tokens autoregressively, so one model and one decoding framework handle several speech processing tasks. Whisper is available in six sizes, from 39M to 1.55B parameters; four of these sizes also come in English-only variants alongside the multilingual ones. The model processes audio in 30-second windows and can perform transcription, translation into English, spoken language identification, and voice activity detection.
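A minimal sketch of the 30-second windowing, assuming 16 kHz mono input (the rate Whisper resamples audio to). The constants mirror the values used by the library's audio front end; `pad_or_trim` and `split_into_windows` are illustrative re-implementations in plain Python. The library has its own `whisper.pad_or_trim`, and its `transcribe()` does not use a naive fixed split like this one. Instead, it advances through long audio based on the timestamps it decodes.

```python
# Constants mirroring Whisper's audio front end: 16 kHz mono, 30-second windows.
SAMPLE_RATE = 16_000
CHUNK_LENGTH = 30                        # seconds per window
N_SAMPLES = SAMPLE_RATE * CHUNK_LENGTH   # 480,000 samples per window
HOP_LENGTH = 160                         # 10 ms hop between spectrogram frames
N_FRAMES = N_SAMPLES // HOP_LENGTH       # 3,000 log-Mel frames per window


def pad_or_trim(samples: list[float], length: int = N_SAMPLES) -> list[float]:
    """Zero-pad or truncate a waveform so it fills exactly one window."""
    if len(samples) >= length:
        return samples[:length]
    return samples + [0.0] * (length - len(samples))


def split_into_windows(samples: list[float]) -> list[list[float]]:
    """Naively cut a long waveform into consecutive 30-second windows.

    The final, shorter window is zero-padded to full length.
    """
    if not samples:
        return []
    return [
        pad_or_trim(samples[start:start + N_SAMPLES])
        for start in range(0, len(samples), N_SAMPLES)
    ]
```

For example, 75 seconds of audio yields three windows of 480,000 samples each, and the last one is mostly zero padding. Each window becomes a fixed-size spectrogram of 3,000 frames, which is why the encoder always sees inputs of the same shape.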

Multitask Architecture

A single model handles transcription, translation, language identification, and voice activity detection, using special tokens in the decoder input as task specifiers. Replaces traditional multi-stage speech processing pipelines with a unified sequence-to-sequence approach.
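To illustrate how special tokens select the task, here is a string-level sketch of a decoder prompt prefix. It follows the token layout described for Whisper: start-of-transcript, a language token, a task token, and an optional no-timestamps token. `build_task_prefix` is a hypothetical helper. The real decoder works on integer token IDs produced by Whisper's tokenizer, and during language identification the model predicts the language token itself instead of receiving it.

```python
# Special-token strings as used in Whisper's multitask prompt format.
SOT = "<|startoftranscript|>"
TASK_TOKENS = {"transcribe": "<|transcribe|>", "translate": "<|translate|>"}
NO_TIMESTAMPS = "<|notimestamps|>"


def build_task_prefix(language: str, task: str = "transcribe",
                      timestamps: bool = False) -> list[str]:
    """Hypothetical helper: assemble the token prefix that tells the decoder
    which language the audio is in and which task to perform."""
    if task not in TASK_TOKENS:
        raise ValueError(f"unsupported task: {task!r}")
    prefix = [SOT, f"<|{language}|>", TASK_TOKENS[task]]
    if not timestamps:
        # Ask for plain text without timestamp tokens interleaved.
        prefix.append(NO_TIMESTAMPS)
    return prefix
```

Switching from transcription to translation changes only one token, for example `build_task_prefix("ja", "translate")`. This is what lets a single set of weights replace separate pipeline stages.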

Weak Supervision Training

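Trained on 680,000 hours of multilingual audio collected from the web and paired with noisy transcripts, rather than a smaller set of carefully aligned, human-verified ones. The scale and diversity of this data make the model robust to varied accents, background noise, recording conditions, and speaking styles.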

Multiple Model Sizes

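Offers six model sizes, from tiny (39M parameters) to large (1.55B parameters), trading speed for accuracy. English-only variants (tiny.en, base.en, small.en, medium.en) tend to perform better on English, especially at the smaller sizes. Turbo (809M parameters) is an optimized version of large-v3 that transcribes faster with minimal loss of accuracy.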
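The size lineup can be captured as a small lookup table. This sketch uses the parameter counts listed for each size and the checkpoint names accepted by `--model` (for example `base` or `base.en`). `checkpoint_name` and `largest_within` are hypothetical helpers, and parameter count is only a rough proxy for speed and memory use.

```python
from typing import Optional

# Approximate parameter counts, in millions, for each Whisper size.
PARAMS_MILLIONS = {
    "tiny": 39, "base": 74, "small": 244,
    "medium": 769, "large": 1550, "turbo": 809,
}
# Sizes that also ship an English-only ".en" checkpoint.
ENGLISH_ONLY_SIZES = {"tiny", "base", "small", "medium"}


def checkpoint_name(size: str, english_only: bool = False) -> str:
    """Hypothetical helper: map a size (plus language preference) to a model name."""
    if size not in PARAMS_MILLIONS:
        raise ValueError(f"unknown size: {size!r}")
    if english_only:
        if size not in ENGLISH_ONLY_SIZES:
            raise ValueError(f"no English-only variant of {size!r}")
        return f"{size}.en"
    return size


def largest_within(budget_millions: int) -> Optional[str]:
    """Hypothetical helper: the biggest size whose parameter count fits the budget."""
    fitting = [name for name, p in PARAMS_MILLIONS.items() if p <= budget_millions]
    return max(fitting, key=PARAMS_MILLIONS.__getitem__, default=None)
```

With a budget of 800M parameters this picks `medium`, while 1,000M picks `turbo`. That matches turbo's role as a faster alternative to `large`.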

# Install Whisper
pip install -U openai-whisper

# Install required ffmpeg dependency
# Ubuntu/Debian:
sudo apt update && sudo apt install ffmpeg
# macOS:
brew install ffmpeg
# Windows (Chocolatey):
choco install ffmpeg

# Transcribe audio files using the turbo model
whisper audio.flac audio.mp3 audio.wav --model turbo
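The CLI also accepts `--language` and `--task translate`, for example to translate Japanese speech into English. The project README notes that turbo is not trained for translation, and English-only `.en` checkpoints only transcribe English, so a multilingual model such as `medium` is the safer choice for that task. The sketch below assembles the command from Python. `build_whisper_command` and `run_whisper` are hypothetical helpers, and running the command requires openai-whisper and ffmpeg to be installed.

```python
import subprocess
from typing import Optional, Sequence


def build_whisper_command(audio_files: Sequence[str], model: str = "turbo",
                          language: Optional[str] = None,
                          task: str = "transcribe") -> list[str]:
    """Hypothetical helper: assemble arguments for the `whisper` CLI."""
    if not audio_files:
        raise ValueError("at least one audio file is required")
    if task not in ("transcribe", "translate"):
        raise ValueError(f"unsupported task: {task!r}")
    # turbo and English-only checkpoints are not suited to translation.
    if task == "translate" and (model == "turbo" or model.endswith(".en")):
        raise ValueError(f"use a multilingual model such as 'medium', not {model!r}")
    cmd = ["whisper", *audio_files, "--model", model]
    if language is not None:
        cmd += ["--language", language]
    if task != "transcribe":
        cmd += ["--task", task]
    return cmd


def run_whisper(cmd: list[str]) -> None:
    """Run the assembled command, raising CalledProcessError on failure."""
    subprocess.run(cmd, check=True)
```

For instance, `build_whisper_command(["japanese.wav"], model="medium", language="Japanese", task="translate")` produces the equivalent of `whisper japanese.wav --model medium --language Japanese --task translate`. From Python code, the library's own API is an alternative to shelling out: `whisper.load_model("turbo").transcribe("audio.mp3")` returns a dictionary whose `"text"` entry holds the transcript.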