Post-Processing Effects for Applio Voice Conversion

Applio includes a built-in audio effects chain that runs immediately after voice conversion, before the output file is written to disk. These effects are powered by Spotify’s pedalboard library, a high-quality Python audio effects processor. Every effect is optional and independently toggled — you can chain as many or as few as you need. Effects are applied in the order they appear in the pipeline: reverb → pitch shift → limiter → gain → distortion → chorus → bitcrush → clipping → compressor → delay.
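
The fixed ordering can be pictured with a small sketch. The helper below is hypothetical and not part of Applio; it only assumes that each effect is switched on by a boolean flag named as on this page, and it returns the enabled effects in the pipeline order listed above.

```python
# Fixed processing order, as listed in the paragraph above.
EFFECT_ORDER = [
    "reverb", "pitch_shift", "limiter", "gain", "distortion",
    "chorus", "bitcrush", "clipping", "compressor", "delay",
]


def resolve_chain(**flags):
    """Return the enabled effects in pipeline order (hypothetical helper)."""
    unknown = set(flags) - set(EFFECT_ORDER)
    if unknown:
        raise ValueError(f"unknown effect flag(s): {sorted(unknown)}")
    return [name for name in EFFECT_ORDER if flags.get(name, False)]


# The keyword order does not matter; the pipeline order does.
chain = resolve_chain(compressor=True, limiter=True, reverb=True)
print(" -> ".join(chain))
```

Whatever order the flags are passed in, the result follows the pipeline order, which is worth keeping in mind when planning gain staging.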

Enabling Post-Processing

Post-processing is disabled by default. To activate it, set post_process=True in either the Python API or the CLI. Each individual effect also requires its own flag (e.g. reverb=True). Setting post_process=True without enabling any individual effect flag is a no-op.
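
As a rough illustration of the two-level switch described above, the following sketch classifies a settings dictionary. The function name and return strings are made up for this example; only the flag names come from this page.

```python
# Individual effect flags documented on this page.
EFFECT_FLAGS = (
    "reverb", "pitch_shift", "limiter", "gain", "distortion",
    "chorus", "bitcrush", "clipping", "compressor", "delay",
)


def post_processing_status(options):
    """Classify a settings dict (hypothetical helper, not part of Applio)."""
    if not options.get("post_process", False):
        # The master switch is off, so individual effect flags are ignored.
        return "disabled"
    enabled = [name for name in EFFECT_FLAGS if options.get(name, False)]
    if not enabled:
        # Master switch on, but nothing to run.
        return "no-op"
    return "active: " + ", ".join(enabled)


print(post_processing_status({"post_process": True}))
print(post_processing_status({"post_process": True, "reverb": True}))
```

A check like this can be handy before launching a long batch, since a no-op configuration produces no error and is easy to miss.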

Post-processing is applied to the full converted audio signal. If you use split_audio=True, the effects are applied after the segments are merged back together.

Available Effects

Reverb

Simulates the acoustic reflections of a physical space. Higher room sizes produce longer, more diffuse tails; damping controls how quickly high frequencies decay. Enable with reverb=True.

reverb_room_size (float, default: 0.5)

Size of the simulated room. Range: 0.0 (dry, tight) to 1.0 (large hall). Higher values produce longer reverb tails.

reverb_damping (float, default: 0.5)

High-frequency damping applied to the reverb tail. Range: 0.0 to 1.0. Higher values make the tail sound darker and decay faster.

reverb_wet_gain (float, default: 0.5)

Level of the wet (reverberated) signal in the mix. Range: 0.0 to 1.0.

reverb_dry_gain (float, default: 0.5)

Level of the dry (original) signal in the mix. Range: 0.0 to 1.0. Set to 0.0 for a fully wet signal.

reverb_width (float, default: 0.5)

Stereo width of the reverb effect. Range: 0.0 (mono) to 1.0 (full stereo spread).

reverb_freeze_mode (float, default: 0.5)

Controls infinite reverb sustain. Range: 0.0 (normal decay) to 1.0 (reverb sustains indefinitely without decaying).
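
Since every reverb parameter shares the same 0.0 to 1.0 range, a small pre-flight check can catch typos before a conversion starts. This is only a sketch: the helper is hypothetical, and the defaults are simply the values documented above.

```python
# Defaults as documented in this section.
REVERB_DEFAULTS = {
    "reverb_room_size": 0.5,
    "reverb_damping": 0.5,
    "reverb_wet_gain": 0.5,
    "reverb_dry_gain": 0.5,
    "reverb_width": 0.5,
    "reverb_freeze_mode": 0.5,
}


def reverb_settings(**overrides):
    """Merge overrides into the documented defaults and range-check them."""
    unknown = set(overrides) - set(REVERB_DEFAULTS)
    if unknown:
        raise KeyError(f"unknown reverb parameter(s): {sorted(unknown)}")
    settings = {**REVERB_DEFAULTS, **overrides}
    for name, value in settings.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name}={value} is outside 0.0 to 1.0")
    return settings


settings = reverb_settings(reverb_room_size=0.6, reverb_freeze_mode=0.0)
print(settings)
```

The resulting dictionary could then be unpacked into the inference call alongside reverb=True.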

Pitch Shift

Transposes the pitch of the output audio by a fixed number of semitones, independent of the model’s pitch parameter. Useful for fine-tuning the final pitch of the converted audio without re-running inference. Enable with pitch_shift=True.

pitch_shift_semitones (float, default: 0.0)

Number of semitones to shift the pitch. Positive values raise the pitch; negative values lower it. For example, 12.0 shifts up one octave.
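
To relate semitones to frequency, the standard equal-temperament formula applies: each semitone multiplies the frequency by the twelfth root of two. The short sketch below is independent of Applio and only illustrates that arithmetic.

```python
def semitones_to_ratio(semitones):
    """Frequency ratio for a shift of the given number of semitones."""
    return 2.0 ** (semitones / 12.0)


for st in (-12.0, 0.0, 7.0, 12.0):
    print(f"{st:+5.1f} semitones -> frequency x{semitones_to_ratio(st):.4f}")
```

A shift of 12.0 doubles every frequency (one octave up), and -12.0 halves it.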

Limiter

A brickwall limiter that prevents the signal from exceeding a set threshold in decibels. Because the limiter is third in the chain, it catches peaks produced by the stages before it (reverb and pitch shift); gain, distortion, and the other later effects run after it and are not limited. Enable with limiter=True.

limiter_threshold (float, default: -6)

Maximum output level in dBFS. Anything above this value is transparently attenuated. A value of -6 gives 6 dB of headroom below 0 dBFS.

limiter_release_time (float, default: 0.01)

Release time in seconds. Controls how quickly the limiter stops attenuating after the signal falls below the threshold. Shorter values sound more responsive; longer values sound smoother.

Gain

Applies a fixed gain (amplification or attenuation) in decibels to the output signal. Gain runs after the limiter in the chain, so the limiter threshold caps the level before the gain is added; leave enough headroom when boosting quiet conversions. Enable with gain=True.

gain_db (float, default: 0.0)

Gain to apply in decibels. Positive values amplify; negative values attenuate. For example, 6.0 roughly doubles the amplitude and -6.0 roughly halves it.
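
The relationship between decibels and linear amplitude follows the usual formula, amplitude = 10^(dB / 20). The helpers below are a generic sketch of that conversion and are not taken from Applio.

```python
import math


def db_to_amplitude(db):
    """Linear amplitude factor for a gain expressed in decibels."""
    return 10.0 ** (db / 20.0)


def amplitude_to_db(ratio):
    """Gain in decibels for a linear amplitude factor."""
    return 20.0 * math.log10(ratio)


for db in (-6.0, 0.0, 6.0, 12.0):
    print(f"{db:+5.1f} dB -> amplitude x{db_to_amplitude(db):.4f}")
print(f"exact doubling = {amplitude_to_db(2.0):.4f} dB")
```

An exact doubling of amplitude corresponds to about 6.02 dB, which is why 6.0 is quoted as the round figure.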

Distortion

Applies soft-clipping harmonic distortion to the signal. Adds grit and harmonic overtones. At high drive values the output will be heavily saturated. Enable with distortion=True.

distortion_gain (float, default: 25)

Drive amount in dB. Higher values produce heavier distortion. A value around 10–20 gives mild saturation; values above 30 produce heavy clipping.
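
Soft clipping of this kind is commonly modelled as a tanh waveshaper applied after a drive gain, which is also how pedalboard's documentation describes its Distortion plugin. The function below is a simplified per-sample model under that assumption, not the library's actual implementation.

```python
import math


def soft_clip(sample, drive_db):
    """Apply drive (in dB) and squash the result with tanh (simplified model)."""
    drive = 10.0 ** (drive_db / 20.0)  # dB -> linear gain
    return math.tanh(sample * drive)


for drive_db in (0.0, 10.0, 25.0):
    shaped = soft_clip(0.5, drive_db)
    print(f"drive {drive_db:4.1f} dB: 0.5 -> {shaped:.4f}")
```

Because tanh never exceeds 1.0 in magnitude, raising the drive pushes more of the waveform toward that ceiling, which is heard as saturation.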

Chorus

Creates a thick, doubling effect by layering slightly pitch-modulated and time-delayed copies of the signal. Useful for thickening vocal conversions. Enable with chorus=True.

chorus_rate (float, default: 1.0)

Modulation rate in Hz. Controls how fast the pitch oscillates. Lower values create a slow, wide sweep; higher values create a rapid vibrato-like effect.

chorus_depth (float, default: 0.25)

Modulation depth as a fraction (0.0–1.0). Higher values create a more pronounced pitch variation between the original and chorus voices.

chorus_center_delay (float, default: 7)

Centre delay time in milliseconds. The average time offset of the chorus copies from the dry signal.

chorus_feedback (float, default: 0.0)

Amount of the chorus output fed back into the input (0.0–1.0). Higher values create a more resonant, metallic effect.

chorus_mix (float, default: 0.5)

Wet/dry mix ratio. 0.0 is fully dry; 1.0 is fully wet (chorus only).

Bitcrush

Reduces the bit depth of the audio to simulate the lo-fi sound of vintage digital samplers or game audio hardware. Enable with bitcrush=True.

bitcrush_bit_depth (int, default: 8)

Target bit depth. Lower values produce more quantization noise and a grittier sound. 8 gives a classic 8-bit character; 4 is extremely degraded.
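
The effect of bit-depth reduction can be sketched as snapping each sample to a coarser grid. The function below is a simplified model written for this page (it assumes samples in the range -1.0 to 1.0) and may differ in detail from what pedalboard does internally.

```python
def bitcrush(sample, bit_depth):
    """Snap a sample in [-1.0, 1.0] to a grid with spacing 1 / 2**(bit_depth - 1)."""
    steps = 2 ** (bit_depth - 1)  # grid points between 0.0 and 1.0
    return round(sample * steps) / steps


for bits in (8, 4, 2):
    print(f"{bits}-bit: 0.3 -> {bitcrush(0.3, bits)}")
```

The fewer bits available, the further each sample lands from its original value; that rounding error is the quantization noise described above.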

Clipping

Hard-clips the audio at a threshold, producing harsh distortion. Unlike the limiter (which attenuates transparently), clipping introduces audible harmonic distortion at the threshold boundary. Enable with clipping=True.

clipping_threshold (float, default: -6)

Threshold in dBFS at which hard clipping occurs. Any sample exceeding this level is clipped flat. Values closer to 0 allow more signal through before clipping.
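
Hard clipping is simple enough to show in a few lines: convert the dBFS threshold to a linear level and clamp every sample to it. This is a generic sketch assuming full scale is 1.0, not Applio's or pedalboard's code.

```python
def hard_clip(samples, threshold_db):
    """Clamp samples to +/- the linear level of threshold_db (dBFS, full scale = 1.0)."""
    limit = 10.0 ** (threshold_db / 20.0)
    return [max(-limit, min(limit, s)) for s in samples]


clipped = hard_clip([0.2, 0.9, -1.0], threshold_db=-6.0)
print([round(s, 4) for s in clipped])
```

Samples below the threshold pass through untouched, while anything louder is flattened to roughly 0.5012, the linear equivalent of -6 dBFS.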

Compressor

Reduces the dynamic range of the signal by attenuating loud passages. Useful for evening out volume inconsistencies in the converted voice. Enable with compressor=True.

compressor_threshold (float, default: 0)

Level in dBFS above which compression is applied. Signals below this level pass through unaffected.

compressor_ratio (float, default: 1)

Compression ratio (e.g. 4 means 4:1 — for every 4 dB above the threshold, only 1 dB comes through). A ratio of 1 means no compression.

compressor_attack (float, default: 1.0)

Attack time in milliseconds. Controls how quickly the compressor reacts when the signal exceeds the threshold. Shorter values clamp transients harder.

compressor_release (float, default: 100)

Release time in milliseconds. Controls how quickly the compressor stops compressing after the signal falls below the threshold.
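
The threshold and ratio together define a static gain curve, which the following sketch computes for steady-state levels. It ignores attack and release, and it is a textbook model rather than a description of pedalboard's internals.

```python
def compressed_level_db(input_db, threshold_db, ratio):
    """Steady-state output level of a hard-knee compressor (textbook model)."""
    if input_db <= threshold_db:
        return input_db  # below the threshold: unchanged
    # Above the threshold: the overshoot is divided by the ratio.
    return threshold_db + (input_db - threshold_db) / ratio


for level in (-30.0, -18.0, -6.0, 0.0):
    out = compressed_level_db(level, threshold_db=-18.0, ratio=4.0)
    print(f"in {level:6.1f} dBFS -> out {out:6.1f} dBFS")
```

With a threshold of -18 and a ratio of 4, a -6 dBFS input sits 12 dB over the threshold and comes out 3 dB over it, at -15 dBFS. With the default ratio of 1 the curve is a straight line, so nothing changes.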

Delay

Adds an echo effect by mixing a time-delayed copy of the signal back into the output. The feedback parameter controls how many echo repeats you hear. Enable with delay=True.

delay_seconds (float, default: 0.5)

Delay time in seconds. For example, 0.5 produces an echo half a second after the original signal.

delay_feedback (float, default: 0.0)

Amount of the delayed signal fed back into the delay line (0.0–1.0). Higher values create multiple repeating echoes. Values approaching 1.0 may cause runaway feedback — use with caution.

delay_mix (float, default: 0.5)

Wet/dry mix ratio. 0.0 is the dry signal only; 1.0 is the delayed signal only.
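
To get a feel for how feedback translates into audible repeats, assume each trip through the delay line scales the echo by the feedback amount. Under that simplified model (which ignores the mix setting), the sketch below counts how many echoes remain at or above a chosen level relative to the first echo. The -60 dB floor is an arbitrary choice for illustration.

```python
import math


def audible_echoes(feedback, floor_db=-60.0):
    """Count echoes at or above floor_db, measured relative to the first echo."""
    if not 0.0 <= feedback < 1.0:
        raise ValueError("feedback must be in [0.0, 1.0) for the echoes to die out")
    if feedback == 0.0:
        return 1  # a single echo, no repeats
    loss_per_repeat_db = 20.0 * math.log10(feedback)  # negative number
    return 1 + math.floor(floor_db / loss_per_repeat_db)


for fb in (0.0, 0.25, 0.5, 0.9):
    print(f"feedback {fb:.2f}: {audible_echoes(fb)} echo(es) above -60 dB")
```

At a feedback of 0.5 each repeat is about 6 dB quieter than the last, so the tail fades within ten echoes; at 0.9 the loss per repeat is under 1 dB and the tail lasts far longer.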

CLI Example

The following example applies reverb and a compressor to the output. All effect flags must be explicitly set to True:

python core.py infer \
  --input_path audio/input.wav \
  --output_path audio/output.wav \
  --pth_path logs/MyModel/MyModel.pth \
  --index_path logs/MyModel/MyModel.index \
  --post_process True \
  --reverb True \
  --reverb_room_size 0.6 \
  --reverb_damping 0.4 \
  --reverb_wet_gain 0.4 \
  --reverb_dry_gain 0.6 \
  --reverb_width 0.8 \
  --reverb_freeze_mode 0.0 \
  --compressor True \
  --compressor_threshold -18 \
  --compressor_ratio 4 \
  --compressor_attack 5.0 \
  --compressor_release 150
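
When the same settings need to be reused across many files, it can be convenient to assemble the command from a dictionary. The helper below is a hypothetical sketch: it assumes every option is passed as a --name value pair, as in the command above, and it only builds the argument list without running anything.

```python
import shlex


def build_infer_command(options):
    """Turn an options dict into the argument list for `python core.py infer`."""
    command = ["python", "core.py", "infer"]
    for name, value in options.items():
        # Booleans become the literal strings "True" / "False".
        command += [f"--{name}", str(value)]
    return command


options = {
    "input_path": "audio/input.wav",
    "output_path": "audio/output.wav",
    "post_process": True,
    "reverb": True,
    "reverb_room_size": 0.6,
}
print(shlex.join(build_infer_command(options)))
```

The returned list can be handed to subprocess.run as-is, which avoids shell-quoting problems with paths that contain spaces.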

Python API Example

from core import run_infer_script

message, output_path = run_infer_script(
    pitch=0,
    index_rate=0.3,
    volume_envelope=1.0,
    protect=0.33,
    f0_method="rmvpe",
    input_path="audio/input.wav",
    output_path="audio/output.wav",
    pth_path="logs/MyModel/MyModel.pth",
    index_path="logs/MyModel/MyModel.index",
    split_audio=False,
    f0_autotune=False,
    f0_autotune_strength=1.0,
    proposed_pitch=False,
    proposed_pitch_threshold=155.0,
    clean_audio=False,
    clean_strength=0.7,
    export_format="WAV",
    embedder_model="contentvec",
    # --- post-processing ---
    post_process=True,
    reverb=True,
    reverb_room_size=0.6,
    reverb_damping=0.4,
    reverb_wet_gain=0.4,
    reverb_dry_gain=0.6,
    reverb_width=0.8,
    reverb_freeze_mode=0.0,
    compressor=True,
    compressor_threshold=-18,
    compressor_ratio=4,
    compressor_attack=5.0,
    compressor_release=150,
    limiter=True,
    limiter_threshold=-1.0,
    limiter_release_time=0.05,
)

print(message)       # "File audio/input.wav inferred successfully."
print(output_path)   # "audio/output.wav"

The example above combines a compressor with a limiter. Because the pipeline order is fixed, the limiter runs first and caps the peaks, and the compressor then evens out the remaining dynamics.

Effects are applied in a fixed sequence regardless of the order in which you specify the flags. The limiter runs before gain, distortion, and clipping, so it does not catch level increases introduced by those later effects. Be mindful of gain staging: combining multiple gain-increasing effects after the limiter can produce severely clipped output.
