Fish Speech: Open source text-to-speech synthesis
Transformer-based TTS with voice cloning from reference audio.
Fish Speech is an open-source text-to-speech synthesis system that generates natural speech audio from text input using transformer-based neural network architectures. The system implements voice cloning capabilities by analyzing reference audio samples to extract speaker characteristics, which are then applied during the synthesis process to reproduce the target voice. It processes text through multiple stages including linguistic analysis, acoustic feature prediction, and neural vocoding to produce waveform output. The architecture separates the text-to-acoustic-feature generation from the vocoding stage, allowing for modular optimization of each component in the speech synthesis pipeline.
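To make the two-stage split concrete, here is a minimal, purely illustrative PyTorch sketch (toy module names and sizes, not Fish Speech's actual architecture or classes): one module maps text tokens to frame-level acoustic features, and a separate module turns those features into a waveform, so each stage can be developed and optimized on its own.

import torch
import torch.nn as nn

class AcousticModel(nn.Module):
    """Toy stand-in for the text-to-acoustic-feature stage."""
    def __init__(self, vocab_size=256, hidden=128, n_mels=80):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=hidden, nhead=4, batch_first=True),
            num_layers=2,
        )
        self.to_mel = nn.Linear(hidden, n_mels)

    def forward(self, token_ids):
        # token_ids: (batch, text_len) -> acoustic features (batch, text_len, n_mels)
        return self.to_mel(self.encoder(self.embed(token_ids)))

class Vocoder(nn.Module):
    """Toy stand-in for the neural vocoding stage (features -> waveform)."""
    def __init__(self, n_mels=80, upsample=256):
        super().__init__()
        self.proj = nn.Linear(n_mels, upsample)

    def forward(self, mels):
        # mels: (batch, frames, n_mels) -> waveform (batch, frames * upsample)
        return self.proj(mels).flatten(start_dim=1)

tokens = torch.randint(0, 256, (1, 32))        # pretend tokenized text
waveform = Vocoder()(AcousticModel()(tokens))  # stage 1 feeds stage 2
print(waveform.shape)                          # torch.Size([1, 8192])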

Transformer-based architecture
Uses transformer models for semantic token prediction combined with VQVAE quantization, enabling efficient discrete representation of speech content.
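As a rough illustration of what the discrete representation means (the codebook size, dimensions, and names below are invented for the example, not taken from the Fish Speech codebase), vector quantization replaces each continuous frame embedding with the index of its nearest codebook entry:

import torch

# Illustrative vector quantization step (the "VQ" in VQ-VAE): each continuous
# frame embedding is replaced by the index of its nearest codebook vector,
# giving a compact discrete token sequence for the speech content.
codebook = torch.randn(1024, 64)           # 1024 codes, 64 dims each (made-up sizes)
frames = torch.randn(200, 64)              # 200 continuous frame embeddings

distances = torch.cdist(frames, codebook)  # (200, 1024) pairwise L2 distances
token_ids = distances.argmin(dim=1)        # one discrete token per frame
quantized = codebook[token_ids]            # vectors the decoder actually sees
print(token_ids.shape, quantized.shape)    # torch.Size([200]) torch.Size([200, 64])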
Emotional speech control
Supports multiple emotional markers and tone specifications during synthesis, allowing fine-grained control over prosody and expression in generated speech.
Voice cloning from samples
Enables speaker adaptation through reference audio input, allowing synthesis in arbitrary speaker voices without requiring extensive speaker-specific training data.
from fish_speech import TextToSpeech

tts = TextToSpeech()  # construct the synthesizer
audio = tts.synthesize("Hello, this is a test of Fish Speech synthesis.")  # generate audio from text
audio.save("output.wav")  # write the waveform to disk
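Building on the basic usage above, the following hypothetical sketch shows how reference-based voice cloning and emotion markers might be expressed through the same TextToSpeech-style wrapper; the reference_audio argument and the inline "(excited)" marker syntax are assumptions made for illustration, not the project's documented API.

from fish_speech import TextToSpeech

tts = TextToSpeech()

# Hypothetical: supply a short reference clip so the output mimics that speaker.
cloned = tts.synthesize(
    "Thanks for calling, how can I help you today?",
    reference_audio="speaker_sample.wav",  # assumed parameter name, for illustration only
)
cloned.save("cloned_voice.wav")

# Hypothetical: inline emotional markers steering prosody and tone.
expressive = tts.synthesize("(excited) We just shipped the new release!")
expressive.save("expressive.wav")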
Releases

v1.5.1
Final stable release before the next model version; no breaking changes or migration steps documented.
- Pin to v1.5.1 if you need stability before the upcoming model architecture changes.
- Release notes do not specify bug fixes, feature additions, or compatibility requirements.
v1.5.0
Fish Speech 1.5 completes both the inference and fine-tuning pipelines; release notes do not specify breaking changes or upgrade requirements.
- Verify that inference and fine-tuning workflows function as expected in your environment after upgrading to v1.5.0.
- Consult the repository documentation for API changes or new dependencies, as the release notes omit migration details.
v1.4.3
Final stable release in the 1.4 series before major version 1.5; pin dependencies now if you need stability.
- Pin to v1.4.3 in production to avoid breaking changes expected in the upcoming 1.5 release.
- Release notes do not specify breaking changes, new requirements, or migration steps for this version.
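If you do pin a version as suggested above, one option (assuming you install directly from the project's GitHub repository and that its release tags match the version numbers listed here) is to reference the tag explicitly in a pip requirements file:

git+https://github.com/fishaudio/fish-speech.git@v1.4.3  # or @v1.5.1 once you are ready to upgrade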
Related Repositories
LightRAG
Graph-based retrieval framework for structured RAG reasoning.
AI-Trader
LLM agent benchmarking framework for autonomous market trading.
whisper.cpp
Lightweight, cross-platform speech recognition engine delivering real-time transcription with minimal dependencies and optimized inference performance.
nanoGPT
Minimal PyTorch implementation for training GPT models.
openvino
Convert and deploy deep learning models across Intel hardware.