The tts subcommand combines two operations into a single CLI call. It first synthesizes speech from text using Microsoft Edge TTS and saves the raw audio to --output_tts_path, then immediately passes that audio through Applio’s RVC voice-conversion pipeline to produce the final converted output at --output_rvc_path. You can therefore go from a plain text string to a fully converted voice clip, with all the quality controls available in normal inference, without any manual intermediate steps.
Two-stage pipeline
Text-to-speech synthesis
Edge TTS synthesizes the input text using the chosen voice (--tts_voice) at the specified rate (--tts_rate) and saves the raw audio to --output_tts_path.
--tts_text and --tts_file are both listed as required by the argument parser. Provide the text directly in --tts_text and supply the path to a .txt file in --tts_file. If you only have one source, supply the same value to both, or pass an empty string for the one you do not use; the underlying TTS script decides which to prefer at runtime.
Flags
Text input
The text string to synthesize. Enclose it in quotes when passing it on the command line. For long or multi-line content, use --tts_file instead.
Path to a plain-text file whose contents will be synthesized. Use this for long scripts or multi-line text.
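Because the parser expects both --tts_text and --tts_file, a wrapper script has to fill in the one it does not use. The following is a minimal sketch of the empty-string option described above; `resolve_text_inputs` is a hypothetical helper, not part of Applio.

```python
from pathlib import Path
from typing import Dict, Optional


def resolve_text_inputs(text: Optional[str] = None,
                        file: Optional[str] = None) -> Dict[str, str]:
    """Return values for --tts_text and --tts_file.

    Hypothetical helper: the input that is not used is passed as an
    empty string, as the note on required arguments allows.
    """
    if not text and not file:
        raise ValueError("provide inline text, a text file path, or both")
    if file and not Path(file).is_file():
        raise FileNotFoundError(f"text file not found: {file}")
    return {"--tts_text": text or "", "--tts_file": file or ""}


inputs = resolve_text_inputs(text="Hello from Applio.")
print(inputs)
```

Which of the two values the TTS script prefers when both are non-empty is decided at runtime, so this sketch does not try to predict it.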
Voice and rate
Edge TTS voice short name to use for synthesis (e.g., en-US-AriaNeural, en-GB-SoniaNeural, es-ES-ElviraNeural). The full list of available voices is loaded from rvc/lib/tools/tts_voices.json; the relevant field is ShortName.
Speaking rate adjustment. Range: -100 (much slower) to 100 (much faster). A value of 0 uses the voice’s natural speaking rate.
Output paths
Full path where the raw Edge TTS audio will be saved before voice conversion. The file will be overwritten if it already exists inside the assets/ directory.
Full path where the final RVC-converted audio will be saved.
Model paths
Full path to the trained RVC model file (.pth).
Full path to the FAISS index file (.index) that accompanies the model.
Voice conversion settings
Pitch shift in semitones applied during RVC conversion. Range: -24 to 24. Useful for adapting a model trained on a different pitch register to the TTS voice’s natural pitch range.
Influence of the FAISS index on the conversion output. Range: 0.0–1.0. Higher values bring the output closer to the training voice; lower values reduce artifacts.
Blends the output’s volume envelope with the input’s. Range: 0.0–1.0. A value of 1.0 uses the output envelope fully.
Protects consonants and breathing sounds from conversion artifacts. Range: 0.0–0.5.
Pitch-extraction algorithm for RVC conversion. Choices: crepe, crepe-tiny, rmvpe, fcpe, hybrid[crepe+rmvpe], hybrid[crepe+fcpe], hybrid[rmvpe+fcpe], hybrid[crepe+rmvpe+fcpe]. rmvpe is recommended for most TTS voices.
Split the synthesized TTS audio into smaller segments before RVC conversion. Recommended for long TTS outputs. Accepts True or False.
Apply a light autotune to the RVC-converted output. Can help with TTS-to-singing use cases. Accepts True or False.
Strength of the autotune snap to the chromatic grid. Range: 0.0–1.0. Only active when --f0_autotune True.
Enable proposed pitch adjustment mode during conversion. Accepts True or False.
Threshold frequency (Hz) for proposed pitch adjustment. Range: 100–499. Only active when --proposed_pitch True.
Run noise reduction on the converted output. Recommended for cleaner TTS results. Accepts True or False.
Intensity of the noise-reduction pass. Range: 0.0–1.0. Only active when --clean_audio True.
Output file format for the final converted audio. Choices: WAV, MP3, FLAC, OGG, M4A.
Speaker-embedding model used during RVC conversion. Choices: contentvec, spin, spin-v2, chinese-hubert-base, japanese-hubert-base, korean-hubert-base, custom.
Path to a custom embedding model. Only used when --embedder_model custom.
Usage example
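As a rough illustration, the Python sketch below assembles a tts invocation from the flags this page names explicitly. It makes several assumptions. The entry point `python core.py` is assumed. All file paths are placeholders. The model path, index path, and conversion-setting flags are not named on this page, so they are left to the caller through the hypothetical `extra_args` parameter.

```python
import shlex
from typing import List, Sequence


def build_tts_command(text: str, voice: str, rate: int,
                      output_tts_path: str, output_rvc_path: str,
                      text_file: str = "",
                      extra_args: Sequence[str] = ()) -> List[str]:
    """Assemble an argument list for the tts subcommand (illustrative only)."""
    # Documented range for the speaking rate is -100 to 100.
    if not -100 <= rate <= 100:
        raise ValueError("--tts_rate must be between -100 and 100")
    command = [
        "python", "core.py", "tts",  # assumed entry point
        "--tts_text", text,
        "--tts_file", text_file,     # empty string when unused
        "--tts_voice", voice,
        "--tts_rate", str(rate),
        "--output_tts_path", output_tts_path,
        "--output_rvc_path", output_rvc_path,
    ]
    # Model, index, and conversion flags go here, supplied by the caller.
    command.extend(extra_args)
    return command


cmd = build_tts_command(
    text="Hello from Applio.",
    voice="en-US-AriaNeural",
    rate=0,
    output_tts_path="assets/audios/tts_output.wav",      # placeholder path
    output_rvc_path="assets/audios/tts_rvc_output.wav",  # placeholder path
)
print(shlex.join(cmd))
```

Passing the resulting list to `subprocess.run(cmd, check=True)` from the Applio project root would execute it. Building a list rather than a single string avoids shell-quoting problems with the text argument.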
Finding available voices
Applio loads its voice list from rvc/lib/tools/tts_voices.json. Each entry in the file represents one Edge TTS voice. The value you pass to --tts_voice must match the ShortName field exactly, including capitalization.
| Short name | Language | Gender |
|---|---|---|
| en-US-AriaNeural | English (US) | Female |
| en-US-JennyNeural | English (US) | Female |
| en-US-GuyNeural | English (US) | Male |
| en-GB-SoniaNeural | English (UK) | Female |
| es-ES-ElviraNeural | Spanish (Spain) | Female |
| fr-FR-DeniseNeural | French | Female |
| de-DE-KatjaNeural | German | Female |
| ja-JP-NanamiNeural | Japanese | Female |
| zh-CN-XiaoxiaoNeural | Chinese (Mandarin) | Female |
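To see every available voice rather than the sample above, the JSON file can be read directly. The sketch below assumes the file is a top-level JSON array of objects that each carry a ShortName key. Both helper names are hypothetical.

```python
import json
from typing import List


def load_short_names(voices_path: str = "rvc/lib/tools/tts_voices.json") -> List[str]:
    """Return the sorted ShortName values from the voices file.

    Assumes the file holds a JSON array of objects with a "ShortName" key.
    """
    with open(voices_path, encoding="utf-8") as fh:
        voices = json.load(fh)
    return sorted(entry["ShortName"] for entry in voices)


def check_voice(voice: str, short_names: List[str]) -> str:
    """Return the voice unchanged if it matches a ShortName exactly."""
    # The comparison is case-sensitive, matching the rule for --tts_voice.
    if voice not in short_names:
        raise ValueError(f"unknown voice: {voice!r}")
    return voice


# Example, run from the Applio project root:
#   names = load_short_names()
#   check_voice("en-US-AriaNeural", names)
```

Checking the voice name before launching the command catches capitalization mistakes early, before any synthesis or conversion work starts.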