Every RVC voice conversion begins by extracting a speaker-independent content representation from the input audio. This is the job of the embedder model, a self-supervised speech model (typically a HuBERT variant) that encodes the linguistic content of the source audio while suppressing speaker identity. The resulting embedding is then fed into the RVC synthesizer alongside the target speaker’s learned features from the .pth model file. Choosing the right embedder is critical: the embedder used at inference must match the one used when the model was trained, or the voice conversion quality will degrade noticeably.
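The matching rule above can be checked programmatically. The following is a minimal sketch, not Applio's own code. It assumes the .pth file unpickles to a dict that stores the training embedder under the same embedder_model key that the model_information command reports. It also falls back to contentvec when that key is missing, as this page recommends. load_checkpoint, training_embedder, and check_embedder are hypothetical helper names.

```python
from typing import Any, Mapping, Optional

# Fallback recommended when a model's metadata does not name an embedder.
DEFAULT_EMBEDDER = "contentvec"


def load_checkpoint(pth_path: str) -> Mapping[str, Any]:
    """Hypothetical loader: unpickle an RVC .pth file with PyTorch.

    weights_only=False is required for arbitrary pickled dicts,
    so only load files you trust.
    """
    import torch  # imported lazily so the pure helpers below work without PyTorch

    return torch.load(pth_path, map_location="cpu", weights_only=False)


def training_embedder(checkpoint: Mapping[str, Any]) -> str:
    """Return the embedder recorded at training time (assumed key: 'embedder_model')."""
    value: Optional[str] = checkpoint.get("embedder_model")
    return value if value else DEFAULT_EMBEDDER


def check_embedder(checkpoint: Mapping[str, Any], inference_embedder: str) -> bool:
    """True if the embedder chosen for inference matches the training embedder."""
    return training_embedder(checkpoint) == inference_embedder
```

For instance, check_embedder(load_checkpoint("my_model.pth"), "spin") returns False for a model trained with contentvec (the file name is a placeholder).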
Available Embedder Models
Applio ships with six built-in embedder options and the ability to load a custom model from disk.
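Because the CLI values described below form a closed set, a script that drives inference can validate its choice before launching. This is a sketch that relies only on the two flags named on this page, --embedder_model and --embedder_model_custom. embedder_args is a hypothetical helper, and the returned fragment would be appended to whatever Applio inference command you already run (that command is not shown here).

```python
from typing import List, Optional

# The six built-in CLI values; "custom" is handled separately.
BUILTIN_EMBEDDERS = frozenset({
    "contentvec",
    "spin",
    "spin-v2",
    "chinese-hubert-base",
    "japanese-hubert-base",
    "korean-hubert-base",
})


def embedder_args(embedder_model: str, custom_path: Optional[str] = None) -> List[str]:
    """Build the embedder-related CLI arguments, rejecting invalid combinations."""
    if embedder_model == "custom":
        # A custom embedder needs its directory passed through a second flag.
        if not custom_path:
            raise ValueError("a custom embedder requires --embedder_model_custom")
        return ["--embedder_model", "custom", "--embedder_model_custom", custom_path]
    if embedder_model not in BUILTIN_EMBEDDERS:
        raise ValueError(f"unknown embedder: {embedder_model!r}")
    return ["--embedder_model", embedder_model]
```

For example, embedder_args("spin-v2") returns ["--embedder_model", "spin-v2"], while embedder_args("custom") raises ValueError because no custom path was given.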
contentvec (default)
The original HuBERT-based content encoder used by the vast majority of publicly released RVC models.
contentvec was trained on English speech and generalises well across languages, making it the safe default for any model whose training embedder is unknown.
Best for: General-purpose voice conversion; most models downloaded from the community.
CLI value: contentvec
spin
SPIN (Speaker-invariant Clustering) is an alternative speech representation model, produced by self-supervised fine-tuning of a HuBERT-style model so that its content features are more invariant to speaker characteristics such as pitch and timbre. It can improve naturalness for certain voices, particularly in singing scenarios.
Best for: Models explicitly trained with the spin embedder.
CLI value: spin
spin-v2
The second iteration of SPIN, with refined training and improved content disentanglement compared to the original.
Best for: Models explicitly trained with the spin-v2 embedder.
CLI value: spin-v2
chinese-hubert-base
A HuBERT model trained on Mandarin Chinese speech data. It captures phonetic distinctions in Chinese more accurately than contentvec, which was primarily trained on English.
Best for: Models trained with this embedder on Mandarin Chinese speech, for example when converting Mandarin Chinese source audio into a Mandarin Chinese target voice.
CLI value: chinese-hubert-base
japanese-hubert-base
A HuBERT model trained on Japanese speech data. Japanese phonology (morae, pitch accent, geminate consonants) benefits from a model trained specifically on Japanese audio.
Best for: Models trained with this embedder on Japanese speech.
CLI value: japanese-hubert-base
korean-hubert-base
A HuBERT model trained on Korean speech data.
Best for: Models trained with this embedder on Korean speech.
CLI value: korean-hubert-base
custom
Load any compatible HuBERT-architecture model from a local directory on disk. The custom model path is passed separately via --embedder_model_custom.
Best for: Experimental or proprietary embedder models not included in the Applio distribution.
CLI value: custom
See Custom Embedder Usage below for details.
How to Choose the Right Embedder
Check the model's metadata
Run the model_information command to inspect the training metadata embedded in the .pth file. It reports which embedder was used during training; look for the embedder_model field in the output.
Match the training embedder
Set --embedder_model to exactly the same value that was used during training. If the metadata does not specify an embedder, assume contentvec, the historical default for RVC models.
Custom Embedder Usage
To use an embedder that is not included in Applio’s built-in set:
Place your model files
Copy your custom embedder’s model files into the custom embedder directory. The directory should contain the model weights and any configuration files the HuBERT loading code expects.
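Before pointing --embedder_model_custom at the directory, a quick sanity check can catch missing files. This sketch assumes a Hugging Face transformers-style layout: a config.json plus weights in pytorch_model.bin or model.safetensors. That file list is an assumption about what the HuBERT loading code expects, not something this page specifies, so adjust it to your setup. missing_embedder_files is a hypothetical helper.

```python
from pathlib import Path
from typing import List

# Assumed layout: adjust to whatever your HuBERT loading code actually reads.
CONFIG_FILE = "config.json"
WEIGHT_FILES = ("pytorch_model.bin", "model.safetensors")


def missing_embedder_files(directory: str) -> List[str]:
    """Return the problems found in a custom embedder directory (empty if none)."""
    root = Path(directory)
    if not root.is_dir():
        return [f"not a directory: {root}"]
    problems: List[str] = []
    if not (root / CONFIG_FILE).is_file():
        problems.append(f"missing {CONFIG_FILE}")
    # Accept either weight format; at least one must be present.
    if not any((root / name).is_file() for name in WEIGHT_FILES):
        problems.append("missing weights (" + " or ".join(WEIGHT_FILES) + ")")
    return problems
```

An empty result only means the assumed files are present; it does not prove the weights belong to a compatible HuBERT model.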