Quick Start
This guide shows how to start synthesizing speech with Matcha-TTS in a few minutes. It covers three ways to use Matcha-TTS: the command-line interface (CLI), the Python API, and the web interface.
Make sure you have installed Matcha-TTS before proceeding. Pre-trained models are downloaded automatically on first use.
CLI Usage
The command-line interface is the fastest way to synthesize speech from text.
Basic Synthesis
Synthesize a single utterance. The result is saved as utterance_001.wav in your current directory.
Synthesize from File
Create a text file with one sentence per line and pass it to the CLI with --file.
Batch Processing
For faster processing of multiple sentences, use batch mode (--batched).
Batched processing is significantly faster when synthesizing many sentences, especially on a GPU.
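The file-plus-batch workflow above can be sketched in plain Python: read a text file with one sentence per line, group the sentences into fixed-size batches, and give each sentence a numbered output name. Skipping blank lines, the 1-based three-digit naming scheme (inferred from utterance_001.wav), and the helper names `read_sentences`, `make_batches`, and `output_names` are assumptions for illustration, not Matcha-TTS's own code.

```python
import tempfile
from pathlib import Path


def read_sentences(path):
    """Return one sentence per non-empty line (assumption: blank lines are skipped)."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def make_batches(items, batch_size):
    """Split items into consecutive chunks of at most batch_size elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def output_names(count, prefix="utterance"):
    """Hypothetical naming scheme: utterance_001.wav, utterance_002.wav, ..."""
    return [f"{prefix}_{i:03d}.wav" for i in range(1, count + 1)]


# Demo: write a small input file (with one blank line) and process it.
with tempfile.TemporaryDirectory() as tmp:
    text_file = Path(tmp) / "sentences.txt"
    text_file.write_text(
        "The first sentence.\n\nThe second sentence.\nThe third sentence.\n",
        encoding="utf-8",
    )
    sentences = read_sentences(text_file)

for batch in make_batches(sentences, batch_size=2):
    print(batch)
print(output_names(len(sentences)))
```

Batching pays off because each batch is padded and run through the model in a single forward pass instead of one pass per sentence.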
CLI Parameters
Text to synthesize (alternative to --file)
Path to text file with one sentence per line
Model to use: matcha_ljspeech (single speaker) or matcha_vctk (multi-speaker)
Path to a custom model checkpoint (optional)
Vocoder to use: hifigan_T2_v1 or hifigan_univ_v1 (auto-selected based on the model)
Speaking rate control (higher = slower). Default: 0.95 for LJSpeech, 0.85 for VCTK
Sampling temperature for variation (higher = more variation)
Number of ODE solver steps (2-100). Fewer steps = faster but potentially lower quality
Speaker ID for multi-speaker models (0-107 for VCTK)
Directory to save output files (default: current directory)
Force CPU inference (default: use GPU if available)
Enable batch processing mode
Batch size for batch mode
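To make the model-dependent defaults concrete, here is a minimal sketch of how they could be resolved and validated. The speaking-rate defaults (0.95 for LJSpeech, 0.85 for VCTK), the 2-100 step range, the 0-107 VCTK speaker range, and the default of 10 steps come from this page; the pairing of matcha_ljspeech with hifigan_T2_v1 and matcha_vctk with hifigan_univ_v1, and the `resolve_settings` helper itself, are assumptions rather than Matcha-TTS's actual code.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed per-model settings. Speaking rates and the speaker range come from the
# parameter list above; the vocoder pairing is an assumption for illustration.
MODEL_DEFAULTS = {
    "matcha_ljspeech": {"vocoder": "hifigan_T2_v1", "speaking_rate": 0.95, "num_speakers": 1},
    "matcha_vctk": {"vocoder": "hifigan_univ_v1", "speaking_rate": 0.85, "num_speakers": 108},
}


@dataclass
class Settings:
    model: str
    vocoder: str
    speaking_rate: float
    steps: int
    spk: Optional[int]


def resolve_settings(model, vocoder=None, speaking_rate=None, steps=10, spk=None):
    """Fill in model-dependent defaults and validate ranges (hypothetical helper)."""
    if model not in MODEL_DEFAULTS:
        raise ValueError(f"unknown model: {model}")
    defaults = MODEL_DEFAULTS[model]
    if not 2 <= steps <= 100:
        raise ValueError("steps must be between 2 and 100")
    if defaults["num_speakers"] == 1:
        if spk is not None:
            raise ValueError(f"{model} is single-speaker; omit the speaker ID")
    elif spk is not None and not 0 <= spk < defaults["num_speakers"]:
        raise ValueError(f"speaker ID must be in 0-{defaults['num_speakers'] - 1}")
    return Settings(
        model=model,
        vocoder=vocoder or defaults["vocoder"],
        speaking_rate=defaults["speaking_rate"] if speaking_rate is None else speaking_rate,
        steps=steps,
        spk=spk,
    )


print(resolve_settings("matcha_vctk", spk=3))
```

Explicit values always win over model defaults, which matches the "auto-selected based on the model" behavior described for the vocoder.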
Advanced CLI Examples
Python API
Use Matcha-TTS directly in your Python code for more control.
Basic Python Example
Synthesis Function Parameters
The synthesise() method accepts the following parameters:
Batch of phoneme sequences. Shape: (batch_size, max_text_length)
Lengths of each sequence in the batch. Shape: (batch_size,)
Number of ODE solver steps (2-100)
Controls the variance of the terminal distribution
Speaker IDs for multi-speaker models. Shape: (batch_size,)
Controls speech pace (higher = slower)
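The shapes above imply that variable-length phoneme ID sequences are right-padded into one (batch_size, max_text_length) array, with the true lengths and the speaker IDs each stored as a (batch_size,) vector. This plain-Python sketch shows that bookkeeping with lists standing in for tensors. It also illustrates one common way a pace control can work: each predicted phoneme duration is rounded up and multiplied by a scale factor, so higher values yield more frames and slower speech. That duration logic is a Glow-TTS-style assumption, not a statement about Matcha-TTS internals; the padding value 0, the phoneme IDs, the durations, and the helper names are hypothetical.

```python
import math


def pad_batch(sequences, pad_id=0):
    """Right-pad ID sequences to a (batch_size, max_text_length) grid.

    Returns the padded rows and a (batch_size,) list of original lengths.
    pad_id=0 is an assumption for illustration.
    """
    lengths = [len(seq) for seq in sequences]
    max_len = max(lengths)
    padded = [list(seq) + [pad_id] * (max_len - len(seq)) for seq in sequences]
    return padded, lengths


def scaled_frame_count(durations, length_scale=1.0):
    """Total output frames (before any integer conversion) when each phoneme's
    duration is rounded up and multiplied by length_scale (higher = slower)."""
    return sum(math.ceil(d) * length_scale for d in durations)


# Hypothetical phoneme IDs for two utterances of different length.
x, x_lengths = pad_batch([[12, 7, 33, 5], [9, 41]])
spks = [0, 17]  # one speaker ID per utterance, shape (batch_size,)
assert len(spks) == len(x) == len(x_lengths)
print(x)          # [[12, 7, 33, 5], [9, 41, 0, 0]]
print(x_lengths)  # [4, 2]

# Hypothetical predicted durations (in frames) for one utterance.
durations = [2.3, 4.0, 1.1]
print(scaled_frame_count(durations, 1.0))  # 9.0  (3 + 4 + 2)
print(scaled_frame_count(durations, 1.5))  # 13.5
```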
Helper Functions
Multi-Speaker Example
Gradio Web Interface
Launch an interactive web interface for experimenting with Matcha-TTS. The interface lets you:
- Enter text and synthesize instantly
- Switch between single-speaker and multi-speaker models
- Adjust hyperparameters in real-time
- Select different speakers (for VCTK model)
- Listen to pre-cached examples
The Gradio app automatically downloads the required models on first launch. By default, the interface is available at http://localhost:7860.
Gradio Interface Code
The Gradio app is implemented in matcha/app.py.
Jupyter Notebook
Matcha-TTS includes a Jupyter notebook (synthesis.ipynb) for interactive experimentation.
Performance Tips
Choosing the right number of steps
- 2-4 steps: Ultra-fast, slight quality reduction
- 10 steps (default): Good balance of speed and quality
- 50+ steps: Highest quality, diminishing returns beyond 50
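Each solver step costs one evaluation of the model's vector field, so synthesis time grows roughly linearly with the step count while accuracy improves more and more slowly. The sketch below shows this with a fixed-step Euler solver on a toy ODE, dx/dt = -x, whose exact value at t = 1 is known. The toy vector field stands in for the neural network and is purely illustrative; the helper names are hypothetical and this is not Matcha-TTS's decoder code.

```python
import math


def euler_solve(velocity, x0, n_steps, t0=0.0, t1=1.0):
    """Integrate dx/dt = velocity(x, t) from t0 to t1 with n_steps Euler steps.

    Each step costs one call to `velocity`, which is why fewer steps are faster.
    """
    dt = (t1 - t0) / n_steps
    x, t = x0, t0
    for _ in range(n_steps):
        x = x + dt * velocity(x, t)
        t += dt
    return x


def toy_velocity(x, t):
    # Toy vector field standing in for the learned network: dx/dt = -x.
    return -x


exact = math.exp(-1.0)  # exact solution at t = 1 for x(0) = 1
for steps in (2, 4, 10, 50, 100):
    approx = euler_solve(toy_velocity, 1.0, steps)
    print(f"{steps:>3} steps: x(1) = {approx:.5f}, error = {abs(approx - exact):.5f}")
```

On this toy problem, going from 2 to 10 steps cuts the error by roughly a factor of six, while going from 50 to 100 doubles the cost for a much smaller absolute gain, mirroring the diminishing returns described above.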
GPU vs CPU
GPU is highly recommended:
- GPU: RTF ~0.02 (50x real-time)
- CPU: RTF ~0.5-1.0 (1-2x real-time)
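RTF (real-time factor) is synthesis time divided by the duration of the generated audio, so the "Nx real-time" figure is its reciprocal: an RTF of 0.02 means 1 / 0.02 = 50x real-time. A small sketch with hypothetical helper names and hypothetical timings, assuming the 22050 Hz output rate listed under Output Format:

```python
SAMPLE_RATE = 22050  # output sample rate listed under Output Format


def real_time_factor(synthesis_seconds, num_samples, sample_rate=SAMPLE_RATE):
    """RTF = time spent synthesizing / duration of the audio produced."""
    audio_seconds = num_samples / sample_rate
    return synthesis_seconds / audio_seconds


def speedup_over_real_time(rtf):
    """Seconds of audio produced per second of compute (the 'Nx real-time' figure)."""
    return 1.0 / rtf


# Hypothetical timings: 0.1 s and 2.5 s of compute for the same 5 s of audio.
for seconds in (0.1, 2.5):
    rtf = real_time_factor(seconds, 5 * SAMPLE_RATE)
    print(f"{seconds} s -> RTF {rtf:.2f}, {speedup_over_real_time(rtf):.0f}x real-time")
```

In practice, `synthesis_seconds` would come from wrapping the synthesis call with `time.perf_counter()`.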
Use the --cpu flag only if a GPU is unavailable.
Batch processing
For many utterances, use --batched mode. This can be 3-5x faster than processing utterances individually.
Temperature and variation
- 0.333: Less variation, more consistent
- 0.667 (default): Natural variation
- 1.0+: More variation, potentially less stable
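In flow-matching decoders, temperature typically multiplies the Gaussian noise that the ODE starts from, so it directly sets the spread of the starting point. That is why higher values give more varied and eventually less stable output. The stdlib sketch below only demonstrates that scaling on scalar samples; the function name and sample counts are illustrative, and it does not reproduce Matcha-TTS's decoder.

```python
import random
import statistics


def sample_initial_noise(n, temperature, seed=0):
    """Draw n standard-normal values and scale them by temperature."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) * temperature for _ in range(n)]


for temperature in (0.333, 0.667, 1.0):
    noise = sample_initial_noise(20_000, temperature)
    print(f"temperature={temperature}: sample std = {statistics.pstdev(noise):.3f}")
```

The measured standard deviation tracks the temperature, so 0.333 starts the ODE from points clustered about three times more tightly than 1.0.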
Output Format
Matcha-TTS generates:
- Audio files: .wav format, 22050 Hz, PCM_24
- Mel-spectrograms: .npy files (NumPy arrays)
- Visualizations: .png spectrogram plots (when using the CLI)
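The audio format above (22050 Hz, 24-bit PCM) can be reproduced with Python's standard-library wave module, for example when re-saving post-processed waveforms. A minimal sketch, assuming mono float samples in [-1, 1] and hypothetical helpers `float_to_pcm24` and `write_pcm24_wav`; it is not the code Matcha-TTS uses to save its outputs.

```python
import math
import tempfile
import wave
from pathlib import Path


def float_to_pcm24(samples):
    """Convert floats in [-1, 1] to little-endian signed 24-bit PCM bytes."""
    max_int = 2 ** 23 - 1
    out = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, s))  # clip to the valid range
        value = int(round(s * max_int))
        out += value.to_bytes(3, byteorder="little", signed=True)
    return bytes(out)


def write_pcm24_wav(path, samples, sample_rate=22050):
    """Write mono 24-bit PCM audio at the given sample rate."""
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(3)  # 3 bytes = 24 bits per sample
        wav.setframerate(sample_rate)
        wav.writeframes(float_to_pcm24(samples))


# Demo: one second of a 440 Hz tone at half amplitude, written to a temp dir.
sr = 22050
tone = [0.5 * math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]
with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "tone.wav"
    write_pcm24_wav(out_path, tone, sr)
    with wave.open(str(out_path), "rb") as wav:
        print(wav.getnchannels(), wav.getframerate(), wav.getsampwidth(), wav.getnframes())
        # 1 22050 3 22050
```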
Next Steps
Training Custom Models
Learn how to train Matcha-TTS on your own dataset
ONNX Export
Export models to ONNX for deployment
API Reference
Detailed API documentation
Examples
More advanced usage examples