
Whisper: General-purpose speech recognition model

Speech recognition system supporting multilingual transcription, translation, and language ID.

Rankings: Overall #84 · AI & ML #40
Stars: 95.2K · Forks: 11.8K · 7-day stars: +441 · 7-day forks: +36

Learn more about Whisper

Whisper is a Transformer-based sequence-to-sequence model developed by OpenAI for automatic speech recognition and related tasks. It converts audio into log-Mel spectrograms, encodes them, and generates text tokens autoregressively, so a single architecture covers several speech processing tasks. The model comes in six sizes ranging from 39M to 1.55B parameters, with both English-only and multilingual variants available. Audio is processed in 30-second windows, and the supported tasks are transcription, translation into English, spoken language identification, and voice activity detection.
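The 30-second windowing can be illustrated with a little arithmetic. The sketch below is a simplification, not the library's own code: it assumes 16 kHz mono audio and a 10 ms spectrogram hop (the values the reference implementation uses), and `split_into_windows` is a hypothetical helper that cuts at fixed strides, whereas the real transcription loop advances each window based on predicted timestamps.

```python
SAMPLE_RATE = 16_000                      # samples per second (assumed input rate)
CHUNK_SECONDS = 30                        # window length described above
N_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS   # 480,000 samples per window
HOP_LENGTH = 160                          # 10 ms hop between spectrogram frames (assumed)
N_FRAMES = N_SAMPLES // HOP_LENGTH        # 3,000 log-Mel frames per window


def split_into_windows(samples):
    """Split a 1-D sequence of samples into zero-padded 30-second windows.

    Hypothetical helper for illustration only.
    """
    windows = []
    for start in range(0, len(samples), N_SAMPLES):
        chunk = list(samples[start:start + N_SAMPLES])
        # Pad the final, shorter chunk with silence so every window has equal length.
        chunk.extend([0.0] * (N_SAMPLES - len(chunk)))
        windows.append(chunk)
    return windows
```

Under these assumptions, a 45-second clip yields two windows, the second of which is half silence.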

1. Multitask Architecture

A single model handles transcription, translation, language identification, and voice activity detection, using special tokens as task specifiers. Replaces traditional multi-stage speech processing pipelines with a unified sequence-to-sequence approach.
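To make the idea of task-specifier tokens concrete, here is a minimal sketch of how a decoder prompt might be assembled. `build_task_prompt` is a hypothetical function written for illustration; in practice the library's tokenizer builds this sequence internally. The token strings follow the multitask format described for Whisper.

```python
def build_task_prompt(language="en", task="transcribe", timestamps=False):
    """Return the special tokens that tell the decoder what to do.

    Hypothetical helper: it only builds the token strings, not token ids.
    """
    if task not in ("transcribe", "translate"):
        raise ValueError(f"unsupported task: {task!r}")
    prompt = ["<|startoftranscript|>", f"<|{language}|>", f"<|{task}|>"]
    if not timestamps:
        # Ask the decoder to emit plain text without timestamp tokens.
        prompt.append("<|notimestamps|>")
    return prompt
```

Switching from transcription to translation changes only one token in the prompt, which is what lets one model replace a multi-stage pipeline.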

2. Weak Supervision Training

Trained on large-scale diverse audio data without requiring perfectly aligned transcripts. This approach enables robust performance across various audio conditions and speaking styles.

3. Multiple Model Sizes

Offers six model variants from tiny (39M parameters) to large (1.55B parameters) with different speed-accuracy tradeoffs. Includes specialized English-only models and an optimized turbo variant.
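As a rough sketch of how the variants map to model names, the helper below resolves a size and an English-only flag to a name string. `resolve_model_name` is hypothetical; it assumes the naming used by the reference implementation, where the four smallest sizes have English-only counterparts with an `.en` suffix.

```python
# Size names as used by the reference implementation (assumed here).
ALL_SIZES = ("tiny", "base", "small", "medium", "large", "turbo")
ENGLISH_ONLY_SIZES = ("tiny", "base", "small", "medium")


def resolve_model_name(size, english_only=False):
    """Map a size and language preference to a model name string.

    Hypothetical helper for illustration only.
    """
    if size not in ALL_SIZES:
        raise ValueError(f"unknown model size: {size!r}")
    if not english_only:
        return size
    if size not in ENGLISH_ONLY_SIZES:
        raise ValueError(f"no English-only variant for size {size!r}")
    return f"{size}.en"
```

For example, `resolve_model_name("small", english_only=True)` gives `small.en`, while asking for an English-only `large` model raises an error.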


# Install Whisper
pip install -U openai-whisper

# Install required ffmpeg dependency
# Ubuntu/Debian:
sudo apt update && sudo apt install ffmpeg
# macOS:
brew install ffmpeg
# Windows (Chocolatey):
choco install ffmpeg

# Transcribe audio files using the turbo model
whisper audio.flac audio.mp3 audio.wav --model turbo
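
The same transcription can also be driven from Python. The sketch below assumes the `openai-whisper` package installed above, whose `whisper.load_model` and `model.transcribe` calls return a result dictionary with a `"text"` field. `transcribe_files` and `main` are hypothetical helper names, and the file names are the placeholders from the command above.

```python
def transcribe_files(model, paths):
    """Transcribe each path with an already-loaded model; return {path: text}."""
    return {path: model.transcribe(path)["text"].strip() for path in paths}


def main():
    # Imported lazily so the helper above can be reused without the package installed.
    import whisper

    model = whisper.load_model("turbo")
    results = transcribe_files(model, ["audio.flac", "audio.mp3", "audio.wav"])
    for path, text in results.items():
        print(f"{path}: {text}")
```

Calling `main()` downloads the turbo weights on first use and prints one transcript per file. Loading the model once and reusing it avoids repeating that cost for every file.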

