The Voice Blender tool lets you fuse two independently trained RVC .pth models into a single new model by linearly interpolating their weight tensors at a configurable ratio. The result is a model whose voice characteristics sit somewhere between the two source models: at a ratio of 1.0 the output is entirely Model 1, at 0.0 it is entirely Model 2, and 0.5 produces an equal mix. This is a quick way to create hybrid voices, offset the weaknesses of one model with the strengths of another, or explore the weight space between two similar voices without retraining from scratch.
Internally, the blending is performed in rvc/train/process/model_blender.py. Both checkpoints are loaded to CPU, they are verified to share the same architecture and sample rate, and each tensor is blended as:
ratio × model_1 + (1 - ratio) × model_2
The blended model is saved as a .pth file at logs/<model_name>.pth and can be used for inference immediately.
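The per-tensor blend can be illustrated with a small, self-contained sketch. This is not Applio's code: nested Python lists stand in for weight tensors, and the names blend_matrix and blend_weights are hypothetical. The special case for emb_g.weight follows the behaviour described under Parameters, and taking the leading rows of each embedding is an assumption.

```python
from typing import Dict, List

Matrix = List[List[float]]  # stand-in for a 2-D weight tensor

EMBEDDING_KEY = "emb_g.weight"  # speaker embedding key named on this page


def blend_matrix(a: Matrix, b: Matrix, ratio: float) -> Matrix:
    """Return ratio * a + (1 - ratio) * b, element by element."""
    return [
        [ratio * x + (1.0 - ratio) * y for x, y in zip(row_a, row_b)]
        for row_a, row_b in zip(a, b)
    ]


def blend_weights(
    w1: Dict[str, Matrix], w2: Dict[str, Matrix], ratio: float
) -> Dict[str, Matrix]:
    """Blend two weight dictionaries that share the same keys."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0.0 and 1.0")
    blended = {}
    for key in w1:
        a, b = w1[key], w2[key]
        # Assumption: when speaker counts differ, keep only the leading rows
        # that both embeddings have.
        if key == EMBEDDING_KEY and len(a) != len(b):
            rows = min(len(a), len(b))
            a, b = a[:rows], b[:rows]
        blended[key] = blend_matrix(a, b, ratio)
    return blended


model_1 = {"dec.weight": [[1.0, 2.0]], "emb_g.weight": [[0.0], [4.0], [8.0]]}
model_2 = {"dec.weight": [[3.0, 6.0]], "emb_g.weight": [[2.0], [0.0]]}
mix = blend_weights(model_1, model_2, 0.5)
# dec.weight: 0.5 * 1.0 + 0.5 * 3.0 = 2.0 and 0.5 * 2.0 + 0.5 * 6.0 = 4.0
# emb_g.weight: only two rows survive, because model_2 has two speakers
```

At a ratio of 1.0 the same function returns Model 1's values unchanged, and at 0.0 it returns Model 2's, which matches the endpoints described above.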
Requirements and Constraints
- Both source models must have identical architectures, that is, the same set of weight keys. Attempting to blend incompatible models returns the error "Fail to merge the models. The model architectures are not the same."
- Both source models must have the same sample rate (the sr field in the checkpoint). Mismatched sample rates are detected and reported before any blending occurs.
- The vocoder type (HiFi-GAN) is inherited from the first model.
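The two compatibility checks can be sketched as a small pre-flight function. This is an illustration rather than Applio's implementation: the function name compatibility_error is hypothetical, the checkpoints are modelled as plain dictionaries, the "weight" key is an assumed layout, and the sample-rate message wording is invented for the example. Only the architecture message is quoted from this page.

```python
from typing import Any, Dict, Optional

ARCH_ERROR = "Fail to merge the models. The model architectures are not the same."
# Illustrative wording only; this page does not quote the sample-rate message.
SR_ERROR = "The sample rates of the two models are not the same."


def compatibility_error(
    ckpt1: Dict[str, Any], ckpt2: Dict[str, Any]
) -> Optional[str]:
    """Return an error message if the checkpoints cannot be blended, else None."""
    # The sample rate is compared first, so a mismatch is reported
    # before any blending work starts.
    if ckpt1["sr"] != ckpt2["sr"]:
        return SR_ERROR
    # "Identical architecture" is taken to mean an identical set of weight keys.
    # Assumption: the weights live under a "weight" key in each checkpoint.
    if set(ckpt1["weight"]) != set(ckpt2["weight"]):
        return ARCH_ERROR
    return None
```

Returning a message instead of raising mirrors the way the error is described above as something the tool returns.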
Parameters
- model_name: Name for the output blended model. The result is saved as logs/<model_name>.pth.
- pth_path_1: Path to the first source .pth model file. At a ratio of 1.0, the output is entirely this model's weights; at 0.0, none of its weights contribute.
- pth_path_2: Path to the second source .pth model file. At a ratio of 0.0, the output is entirely this model's weights; at 1.0, none of its weights contribute.
- ratio: Blend ratio on a scale of 0.0 to 1.0, accepted in increments of 0.1 via the CLI.
  - 1.0 → output is 100% Model 1 (weights fully from pth_path_1)
  - 0.5 → equal mix of both models
  - 0.0 → output is 100% Model 2 (weights fully from pth_path_2)
  Each weight tensor is blended as ratio × model_1 + (1 - ratio) × model_2. For the emb_g.weight (speaker embedding) tensor, if the two models have different numbers of speakers, only as many speaker rows as the smaller model has are blended.
CLI Example
Python API
Output Location
The blended model is saved directly to logs/<model_name>.pth, at the root of logs/ rather than inside a subdirectory. If you want to use it with the Applio UI alongside an index file, move or copy it into its own folder.
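One way to do the move from Python is sketched below. The helper name move_blend_to_folder and the target layout logs/<model_name>/<model_name>.pth are assumptions for illustration, not part of Applio.

```python
import shutil
from pathlib import Path


def move_blend_to_folder(model_name: str, logs_dir: str = "logs") -> Path:
    """Move logs/<model_name>.pth into logs/<model_name>/ and return the new path."""
    logs = Path(logs_dir)
    source = logs / f"{model_name}.pth"
    if not source.is_file():
        raise FileNotFoundError(f"Blended model not found: {source}")
    # Assumed layout: one folder per model, named after the model.
    target_dir = logs / model_name
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / source.name
    shutil.move(str(source), str(target))
    return target
```

Calling move_blend_to_folder("my_blend") from the project root would relocate logs/my_blend.pth to logs/my_blend/my_blend.pth. Use shutil.copy2 instead of shutil.move if the original file should stay in place.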
Blended models do not automatically get a .index file because no feature extraction was run for the blend. For best inference quality, run the index generation step against the training data of whichever source model dominates the blend, or simply use index_rate=0.0 and rely on the model weights alone.