Applio exposes its entire voice-conversion pipeline through a single entry-point script, core.py, which uses Python’s built-in argparse library. Every feature available in the Gradio web UI (inference, batch processing, training, text-to-speech, model blending, and more) can be driven from the command line without launching a browser. This makes core.py well suited to automation scripts, CI/CD pipelines, cloud batch jobs, and any headless environment where a graphical interface is not practical.
Basic usage
The CLI follows the standard subcommand pattern. Run the script with a subcommand name, followed by the flags specific to that subcommand. To list the flags a subcommand accepts, pass --help after its name.
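As a rough illustration of the subcommand pattern, the sketch below assembles an invocation as an argument list that could be handed to `subprocess.run`. The helper name `build_command` and the example file paths are hypothetical. The flag names come from the quick reference table on this page, and `--help` is the help switch that argparse adds by default.

```python
import sys


def build_command(subcommand, flags=None, script="core.py"):
    """Return an argv list of the form: python core.py <subcommand> --flag value ..."""
    argv = [sys.executable, script, subcommand]
    for name, value in (flags or {}).items():
        argv.append(name)
        # A value of None marks a bare switch such as --help.
        if value is not None:
            argv.append(str(value))
    return argv


# Hypothetical paths, shown only to illustrate the shape of a call.
infer_cmd = build_command("infer", {
    "--input_path": "input.wav",
    "--output_path": "output.wav",
    "--pth_path": "model.pth",
    "--index_path": "model.index",
})
help_cmd = build_command("infer", {"--help": None})

# Print the commands without the interpreter path for readability.
print("python", " ".join(infer_cmd[1:]))
print("python", " ".join(help_cmd[1:]))
```

Building the command as a list rather than a single string avoids shell quoting problems when paths contain spaces.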
Environment setup
All commands must be run from the Applio root directory (the folder that contains core.py) with the project’s virtual environment activated. Running from a different working directory or without the virtual environment will cause import errors.
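When core.py is driven from another script, one way to honor the working-directory requirement is to check for core.py and pass the root as `cwd`. The following is a minimal sketch under that assumption; `run_from_root` is a hypothetical helper, not part of Applio.

```python
import subprocess
import sys
from pathlib import Path


def run_from_root(root, args):
    """Run core.py with `args`, using `root` as the working directory."""
    root = Path(root).resolve()
    if not (root / "core.py").is_file():
        raise FileNotFoundError(f"core.py not found in {root}")
    # sys.executable is the interpreter running this script, so start this
    # script from the activated virtual environment to reuse its packages.
    return subprocess.run(
        [sys.executable, "core.py", *args],
        cwd=root,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
```

For example, `run_from_root("/path/to/Applio", ["prerequisites"])` would run the prerequisites subcommand from the given directory, where the path is a placeholder for your own checkout.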
If you have not yet installed Applio’s dependencies, run python core.py prerequisites first. This downloads the required pretrained models and executables into the correct locations.
Available subcommands
infer
Convert a single audio file to a target voice using a trained RVC model.
batch_infer
Convert an entire folder of audio files in one pass using the same model and settings.
tts
Synthesize text with Edge TTS, then immediately apply RVC voice conversion — all in one command.
preprocess
Slice and clean a raw audio dataset to prepare it for feature extraction and training.
extract
Extract pitch (F0) and speaker embeddings from a preprocessed dataset.
train
Train an RVC model from extracted features. Automatically runs index generation on completion.
index
Manually (re-)generate the FAISS index file for a trained model.
model_information
Print metadata stored inside a .pth model file: architecture, epoch, sample rate, and more.
model_blender
Fuse two RVC models together at a configurable blend ratio to create a hybrid voice.
tensorboard
Launch a TensorBoard server pointed at the logs/ directory to monitor training progress.
download
Download and extract a model archive from a direct URL or Hugging Face link into logs/.
prerequisites
Download pretrained models, additional support models (RMVPE, etc.), and required executables.
audio_analyzer
Analyze an audio file and print sample rate, duration, bit depth, channel count, and RMS level.
Quick reference table
| Subcommand | Purpose | Required flags |
|---|---|---|
| infer | Single-file voice conversion | --input_path, --output_path, --pth_path, --index_path |
| batch_infer | Folder voice conversion | --input_folder, --output_folder, --pth_path, --index_path |
| tts | TTS + voice conversion | --tts_text/--tts_file, --tts_voice, --output_tts_path, --output_rvc_path, --pth_path, --index_path |
| preprocess | Dataset preprocessing | --model_name, --dataset_path, --sample_rate, --cut_preprocess |
| extract | Feature extraction | --model_name, --sample_rate, --include_mutes |
| train | Model training | --model_name, --save_every_epoch, --sample_rate |
| index | Index generation | --model_name |
| model_information | Model metadata | --pth_path |
| model_blender | Model fusion | --model_name, --pth_path_1, --pth_path_2 |
| tensorboard | TensorBoard server | (none) |
| download | Model download | --model_link |
| prerequisites | Dependency setup | (none required) |
| audio_analyzer | Audio analysis | --input_path |
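
In an automation script it can help to check for required flags before launching a long-running job. The sketch below copies the table above into a dictionary and reports any missing flags. The names `REQUIRED_FLAGS` and `missing_flags` are hypothetical. The tts row is left out because its text input can be supplied through either of two flags, which this simple check does not model.

```python
# Required flags per subcommand, copied from the quick reference table.
REQUIRED_FLAGS = {
    "infer": ["--input_path", "--output_path", "--pth_path", "--index_path"],
    "batch_infer": ["--input_folder", "--output_folder", "--pth_path", "--index_path"],
    "preprocess": ["--model_name", "--dataset_path", "--sample_rate", "--cut_preprocess"],
    "extract": ["--model_name", "--sample_rate", "--include_mutes"],
    "train": ["--model_name", "--save_every_epoch", "--sample_rate"],
    "index": ["--model_name"],
    "model_information": ["--pth_path"],
    "model_blender": ["--model_name", "--pth_path_1", "--pth_path_2"],
    "tensorboard": [],
    "download": ["--model_link"],
    "prerequisites": [],
    "audio_analyzer": ["--input_path"],
}


def missing_flags(subcommand, provided):
    """Return the required flags for `subcommand` that are absent from `provided`."""
    if subcommand not in REQUIRED_FLAGS:
        raise ValueError(f"unknown subcommand: {subcommand}")
    given = set(provided)
    return [flag for flag in REQUIRED_FLAGS[subcommand] if flag not in given]


# Two of the four flags required by infer are supplied here.
print(missing_flags("infer", ["--input_path", "--pth_path"]))
```

argparse performs its own validation when core.py runs, so a check like this is only a convenience for failing early in a larger pipeline.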