🐸TTS: Text-to-Speech deep learning toolkit
PyTorch toolkit for deep learning text-to-speech synthesis.
🐸TTS is a PyTorch-based deep learning library for text-to-speech synthesis that implements multiple model architectures, including Tacotron, Glow-TTS, and XTTS. The toolkit pairs acoustic models, which convert text to mel-spectrograms, with vocoders such as HiFi-GAN and MelGAN, which convert those spectrograms to waveforms. It supports multi-speaker synthesis, voice cloning, and voice conversion, and includes speaker encoder models. The library is used in both research and production, and its integration with Fairseq models extends coverage to more than 1,100 languages.

Multi-architecture support
Implements a range of model architectures, including Tacotron, Glow-TTS, XTTS, Tortoise, and Bark, so users can pick the approach that suits their requirements. Integration with Fairseq models extends language coverage.
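As a minimal sketch of the Fairseq integration, the snippet below builds a model name from a three-letter ISO 639-3 language code and passes it to the `TTS.api.TTS` class. The `tts_models/<code>/fairseq/vits` naming pattern is an assumption based on the project's README, so check it against the installed version. The helper names `fairseq_model_name` and `synthesize_with_fairseq` are illustrative and not part of the library.

```python
def fairseq_model_name(iso_code: str) -> str:
    """Build a Fairseq model name from an ISO 639-3 code (assumed naming pattern)."""
    code = iso_code.strip().lower()
    if len(code) != 3 or not (code.isascii() and code.isalpha()):
        raise ValueError(f"expected a three-letter ISO 639-3 code, got {iso_code!r}")
    return f"tts_models/{code}/fairseq/vits"


def synthesize_with_fairseq(text: str, iso_code: str, file_path: str) -> str:
    """Synthesize `text` with the Fairseq model for `iso_code`.

    The model is downloaded on first use, so this call needs network access.
    """
    # Imported lazily so the name helper above works without the package installed.
    from TTS.api import TTS

    tts = TTS(model_name=fairseq_model_name(iso_code), progress_bar=False)
    tts.tts_to_file(text=text, file_path=file_path)
    return file_path


if __name__ == "__main__":
    print(fairseq_model_name("deu"))
    # Uncomment to run synthesis (requires the TTS package and a model download):
    # synthesize_with_fairseq("Guten Tag!", "deu", "output_de.wav")
```

Keeping the name construction separate from synthesis makes it easy to validate a language code before triggering a model download.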
Voice cloning and conversion
Includes speaker encoders and voice cloning, which enable synthesis in the voice of a new speaker. XTTS supports streaming inference with reported latency under 200 ms.
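Voice cloning with XTTS might look like the sketch below, which passes a short reference recording through the `speaker_wav` argument of `tts_to_file`. The model name `tts_models/multilingual/multi-dataset/xtts_v2` is an assumption taken from the project's README, and `check_reference_clip` and `clone_voice` are hypothetical helpers written for this example. It does not cover streaming inference.

```python
from pathlib import Path

# Assumed model name; verify it against the models available in your installation.
XTTS_MODEL = "tts_models/multilingual/multi-dataset/xtts_v2"


def check_reference_clip(speaker_wav: str) -> Path:
    """Fail early if the reference recording is missing."""
    path = Path(speaker_wav)
    if not path.is_file():
        raise FileNotFoundError(f"reference clip not found: {path}")
    return path


def clone_voice(text: str, speaker_wav: str, language: str, file_path: str) -> str:
    """Synthesize `text` in the voice of the speaker heard in `speaker_wav`."""
    reference = check_reference_clip(speaker_wav)

    # Imported lazily so the check above works without the package installed.
    from TTS.api import TTS

    tts = TTS(model_name=XTTS_MODEL, progress_bar=False)
    tts.tts_to_file(
        text=text,
        speaker_wav=str(reference),
        language=language,
        file_path=file_path,
    )
    return file_path


if __name__ == "__main__":
    # Uncomment once a reference clip exists (requires the TTS package and a model download):
    # clone_voice("Hello world!", "reference.wav", "en", "cloned.wav")
    pass
```

Checking the reference clip before loading the model avoids paying for a large download only to fail on a mistyped path.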
Training and fine-tuning tools
Provides utilities for dataset analysis, curation, and model training from scratch or fine-tuning existing models. Example recipes are available for common datasets like LJSpeech.
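To give a sense of what dataset analysis involves, here is a minimal standard-library sketch that summarizes transcript lengths in an LJSpeech-style `metadata.csv`. It assumes the pipe-delimited layout `id|transcript|normalized transcript` and is not the toolkit's own analysis utility. The function name `transcript_length_stats` is made up for this example.

```python
import csv
import io
import statistics


def transcript_length_stats(metadata_file) -> dict:
    """Summarize transcript lengths (in characters) from a pipe-delimited metadata file."""
    # QUOTE_NONE keeps quotation marks inside transcripts as literal text.
    reader = csv.reader(metadata_file, delimiter="|", quoting=csv.QUOTE_NONE)
    lengths = []
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        # Prefer the normalized transcript (third column) when it is present.
        text = row[2] if len(row) > 2 and row[2].strip() else row[1]
        lengths.append(len(text.strip()))
    if not lengths:
        raise ValueError("no transcripts found")
    return {
        "clips": len(lengths),
        "min_chars": min(lengths),
        "max_chars": max(lengths),
        "mean_chars": statistics.mean(lengths),
    }


if __name__ == "__main__":
    sample = io.StringIO(
        "LJ001-0001|Hello there.|Hello there.\n"
        "LJ001-0002|It costs $5.|It costs five dollars.\n"
    )
    print(transcript_length_stats(sample))
```

Very short or very long transcripts are common candidates for removal during curation, which is why the minimum and maximum are reported alongside the mean.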
from TTS.api import TTS
# Initialize TTS with a default English model
tts = TTS(model_name="tts_models/en/ljspeech/tacotron2-DDC", progress_bar=False)
# Text to convert to speech
text = "Hello world! This is a test of the TTS library."
# Generate speech and save to file
tts.tts_to_file(text=text, file_path="output.wav")
# You can also specify speaker if the model supports it
# tts.tts_to_file(text=text, speaker="speaker_01", file_path="output_speaker.wav")
print("Audio file generated successfully!")