Once Applio is installed, performing your first voice conversion takes only a few minutes. You need two things before you start: a source audio file (the voice you want to transform) and an RVC model (the target voice you want to convert to). Applio can download models directly from URLs through its built-in Download tab, so you do not need to manage files manually. This guide walks through the complete process using both the Gradio web UI and the command-line interface.
Model Files
Every RVC voice conversion requires two model files, both stored under logs/<model_name>/ inside your Applio directory:
| File | Extension | Purpose |
|---|---|---|
| Model weights | .pth | The trained neural network that encodes the target speaker’s voice |
| Feature index | .index | A FAISS index of training embeddings used to refine the conversion |
If you only have a .pth file, set --index_rate 0 to skip index retrieval.
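Before running inference, it can be useful to confirm which of the two files a model folder actually contains. The sketch below assumes the layout described above (a logs/<model_name>/ folder holding a .pth file and, optionally, a .index file). The helper names and the model name my_model are hypothetical and not part of Applio.

```python
from pathlib import Path
from typing import Optional, Tuple


def find_model_files(model_dir: Path) -> Tuple[Optional[Path], Optional[Path]]:
    """Return (weights, index) paths found in model_dir; None marks a missing file."""
    # sorted() keeps the choice deterministic if several files match.
    weights = sorted(model_dir.glob("*.pth"))
    indexes = sorted(model_dir.glob("*.index"))
    return (weights[0] if weights else None, indexes[0] if indexes else None)


def suggested_index_rate(index_path: Optional[Path], default: float = 0.3) -> float:
    """Use 0.0 when there is no .index file, otherwise the documented default."""
    return default if index_path is not None else 0.0


if __name__ == "__main__":
    pth, index = find_model_files(Path("logs") / "my_model")  # hypothetical model name
    print("weights:", pth)
    print("index:", index)
    print("index_rate:", suggested_index_rate(index))
```

A folder without a .index file yields an index rate of 0.0, which matches the advice above for models that ship only a .pth file.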
Path 1: Web UI
The web UI is the recommended starting point for new users. It exposes all inference parameters through interactive controls and provides an audio player for immediate preview of your output.
Start Applio
Launch Applio with the run script for your platform (Windows or Linux / macOS). Applio starts a local Gradio server and opens your default browser at http://127.0.0.1:6969. Wait for the interface to fully load before proceeding; the first launch downloads prerequisite model files and may take a minute or two.
Download a voice model
Navigate to the Download tab. Paste a direct URL to a .pth file (or a .zip archive containing a .pth and .index pair) into the model link field, then click Download.
Applio’s model_download_pipeline fetches the files and places them in logs/<model_name>/ automatically. You can also copy model files into that directory manually.
Open the Inference tab
Click the Inference tab. In the Model dropdown, select the model you just downloaded. The dropdown is populated from the files present in logs/.
Upload your audio and configure parameters
Upload your source audio file using the input audio widget. Then review the key conversion parameters:
| Parameter | Default | Description |
|---|---|---|
| Pitch | 0 | Semitone shift applied to the output pitch. Range: −24 to +24. Use +12 (one octave up) when converting a male voice to a female model, or −12 when doing the reverse. |
| F0 Method | rmvpe | Pitch extraction algorithm. rmvpe is the recommended default. Options: rmvpe, fcpe, crepe, crepe-tiny, and hybrid combinations such as hybrid[rmvpe+fcpe]. |
| Index Rate | 0.3 | Influence of the .index file on the output (0.0–1.0). Higher values produce a closer match to the training voice but may introduce artifacts. |
| Volume Envelope | 1.0 | Blending ratio for the output volume envelope. 1.0 uses the converted audio’s envelope directly. |
| Protect | 0.33 | Protects consonants and breath sounds from conversion artifacts (0.0–0.5). |
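The Pitch value is expressed in semitones. In twelve-tone equal temperament, a shift of n semitones multiplies every frequency by 2^(n/12), which is why ±12 corresponds to exactly one octave. The short example below is plain arithmetic and does not depend on Applio's code.

```python
def semitone_ratio(semitones: float) -> float:
    """Frequency multiplier for a pitch shift in equal-tempered semitones."""
    return 2.0 ** (semitones / 12.0)


def shift_frequency(freq_hz: float, semitones: float) -> float:
    """Frequency in Hz after shifting freq_hz by the given number of semitones."""
    return freq_hz * semitone_ratio(semitones)


if __name__ == "__main__":
    for n in (-12, 0, 12):
        print(f"{n:+d} semitones -> x{semitone_ratio(n):.2f}")
    # A 110 Hz fundamental moved up one octave lands at 220 Hz.
    print(shift_frequency(110.0, 12))
```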
For singing voice conversions, enable F0 Autotune to snap the output pitch to the nearest chromatic note. Adjust Autotune Strength (0.0–1.0) to control how aggressively the pitch is quantized.
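To make "snap to the nearest chromatic note" concrete, the sketch below quantizes a frequency to the nearest equal-tempered semitone and blends between the original and the snapped pitch by a strength factor. It illustrates the idea only: Applio's actual autotune implementation may differ, and the A4 = 440 Hz reference, the function names, and the linear blend in Hz are all assumptions.

```python
import math

A4_HZ = 440.0  # reference tuning assumed for this sketch


def nearest_chromatic_hz(freq_hz: float) -> float:
    """Snap a frequency to the nearest equal-tempered semitone."""
    semitones_from_a4 = 12.0 * math.log2(freq_hz / A4_HZ)
    return A4_HZ * 2.0 ** (round(semitones_from_a4) / 12.0)


def autotune(freq_hz: float, strength: float) -> float:
    """Blend between the original pitch (strength 0.0) and the snapped pitch (1.0)."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0.0 and 1.0")
    if freq_hz <= 0.0:
        return freq_hz  # unvoiced frames carry no pitch to correct
    target = nearest_chromatic_hz(freq_hz)
    return freq_hz + (target - freq_hz) * strength


if __name__ == "__main__":
    slightly_sharp_a4 = 446.0
    for strength in (0.0, 0.5, 1.0):
        print(strength, round(autotune(slightly_sharp_a4, strength), 2))
```

With a strength of 0.0 the pitch is left untouched, and with 1.0 it is moved all the way to the nearest note; values in between give a partial correction.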
Convert and download your output
Click the Convert button. Applio runs the inference pipeline and displays the output audio in the player below. Use the player controls to preview the result, then click the download icon to save the file.
The output format defaults to WAV. To change it, select a different option from the Export Format dropdown before converting. Supported formats are WAV, MP3, FLAC, OGG, and M4A.
Path 2: CLI
The core.py CLI exposes the same inference pipeline as the web UI and is ideal for scripting, batch automation, or integrating Applio into larger workflows. Every parameter available in the UI has a corresponding CLI flag.
The minimal command for a single-file conversion is:
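As a sketch, a single-file invocation built from the four required flags might be assembled as follows. The subcommand name infer is an assumption (by analogy with batch_infer), so check python core.py --help in your installation. All file paths are placeholders. The script only prints the command and does not run it.

```python
import shlex
from typing import List


def build_infer_command(input_path: str, output_path: str,
                        pth_path: str, index_path: str) -> List[str]:
    """Assemble the argument list for a single-file conversion."""
    return [
        "python", "core.py", "infer",  # "infer" is an assumed subcommand name
        "--input_path", input_path,
        "--output_path", output_path,
        "--pth_path", pth_path,
        "--index_path", index_path,
    ]


if __name__ == "__main__":
    cmd = build_infer_command(
        "input.wav",
        "output.wav",
        "logs/my_model/my_model.pth",    # placeholder model files
        "logs/my_model/my_model.index",
    )
    # Print a copy-pasteable command line. To execute it instead,
    # pass cmd to subprocess.run(cmd, check=True) from the Applio directory.
    print(shlex.join(cmd))
```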
Common CLI Parameters
| Flag | Default | Description |
|---|---|---|
| --input_path | (required) | Full path to the source audio file |
| --output_path | (required) | Full path where the converted file will be saved |
| --pth_path | (required) | Path to the .pth model weights file |
| --index_path | (required) | Path to the .index FAISS index file |
| --pitch | 0 | Semitone pitch shift (−24 to +24) |
| --f0_method | rmvpe | Pitch extraction algorithm |
| --index_rate | 0.3 | Index file influence (0.0–1.0) |
| --volume_envelope | 1.0 | Output volume envelope blending |
| --protect | 0.33 | Consonant protection level (0.0–0.5) |
| --export_format | WAV | Output format: WAV, MP3, FLAC, OGG, M4A |
| --embedder_model | contentvec | Speaker embedding model |
| --split_audio | False | Split long audio into segments before inference |
| --clean_audio | False | Apply noise reduction to the output |
| --f0_autotune | False | Snap output pitch to the chromatic grid |
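Out-of-range values are easy to pass from a script, so it can help to check them before invoking the CLI. The sketch below encodes only the ranges and the format list given in the tables on this page. The function name validate_params is hypothetical, and Applio may apply its own validation.

```python
from typing import List

EXPORT_FORMATS = {"WAV", "MP3", "FLAC", "OGG", "M4A"}


def validate_params(pitch: int = 0, index_rate: float = 0.3,
                    protect: float = 0.33, export_format: str = "WAV") -> List[str]:
    """Return a list of problems; an empty list means every value is in range."""
    problems = []
    if not -24 <= pitch <= 24:
        problems.append(f"pitch {pitch} is outside -24..24")
    if not 0.0 <= index_rate <= 1.0:
        problems.append(f"index_rate {index_rate} is outside 0.0..1.0")
    if not 0.0 <= protect <= 0.5:
        problems.append(f"protect {protect} is outside 0.0..0.5")
    if export_format not in EXPORT_FORMATS:
        problems.append(f"export_format {export_format!r} is not supported")
    return problems


if __name__ == "__main__":
    print(validate_params())  # documented defaults
    print(validate_params(pitch=30, protect=0.8, export_format="AIFF"))
```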
Batch Inference
To convert an entire folder of audio files in one call, use the batch_infer subcommand:
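A batch invocation might be assembled as sketched below. Two things here are assumptions rather than confirmed details: that the folders are passed through flags named --input_folder and --output_folder, and that the model flags match the single-file table above. Verify both with python core.py batch_infer --help. Extra keyword arguments are turned into flags, so any option from the parameter table can be appended.

```python
import shlex
from typing import List


def build_batch_command(input_folder: str, output_folder: str,
                        pth_path: str, index_path: str, **options) -> List[str]:
    """Assemble a batch_infer argument list; extra keyword options become --flags."""
    cmd = [
        "python", "core.py", "batch_infer",
        "--input_folder", input_folder,    # assumed flag name
        "--output_folder", output_folder,  # assumed flag name
        "--pth_path", pth_path,
        "--index_path", index_path,
    ]
    for name, value in options.items():
        cmd += [f"--{name}", str(value)]
    return cmd


if __name__ == "__main__":
    cmd = build_batch_command(
        "raw_audio", "converted_audio",    # placeholder folders
        "logs/my_model/my_model.pth",      # placeholder model files
        "logs/my_model/my_model.index",
        export_format="FLAC",
    )
    print(shlex.join(cmd))  # printed only, not executed
```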
TTS-to-RVC Pipeline
You can synthesize text with edge-tts and immediately pipe the result through RVC conversion in a single command:
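The sketch below shows one way such a command could be assembled. Apart from --pth_path and --index_path, every name in it is an assumption: the tts subcommand and the --tts_text, --tts_voice, --output_tts_path, and --output_rvc_path flags should be checked against python core.py --help before use. The voice name and file paths are placeholders.

```python
import shlex
from typing import List


def build_tts_command(text: str, voice: str, tts_wav: str, rvc_wav: str,
                      pth_path: str, index_path: str) -> List[str]:
    """Assemble a text-to-speech plus conversion argument list (flag names assumed)."""
    return [
        "python", "core.py", "tts",      # assumed subcommand name
        "--tts_text", text,              # assumed flag: text to synthesize
        "--tts_voice", voice,            # assumed flag: edge-tts voice name
        "--output_tts_path", tts_wav,    # assumed flag: intermediate TTS audio
        "--output_rvc_path", rvc_wav,    # assumed flag: converted result
        "--pth_path", pth_path,
        "--index_path", index_path,
    ]


if __name__ == "__main__":
    cmd = build_tts_command(
        "Hello from Applio",
        "en-US-AriaNeural",              # placeholder voice
        "tts_output.wav",
        "rvc_output.wav",
        "logs/my_model/my_model.pth",    # placeholder model files
        "logs/my_model/my_model.index",
    )
    # shlex.join quotes the text argument so the printed line is safe to paste.
    print(shlex.join(cmd))
```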
Output Format Reference
Applio supports the following output containers for all inference modes:
| Format | Flag value | Notes |
|---|---|---|
| WAV | WAV | Lossless, uncompressed. Default output format. |
| MP3 | MP3 | Lossy, widely compatible. Good for sharing. |
| FLAC | FLAC | Lossless, compressed. Recommended for archiving. |
| OGG | OGG | Lossy, open format. |
| M4A | M4A | Lossy, AAC-based. Good for Apple ecosystem compatibility. |
Next Steps
You have completed your first voice conversion. From here you can explore the full set of inference parameters, train your own RVC model, or extend Applio with plugins.
Full Inference Reference
Complete documentation for every inference parameter, post-processing effect, and formant-shifting option.
Training a Model
Learn how to preprocess a dataset, extract features, and train a custom RVC voice model from scratch.