Applio includes a built-in audio effects chain that runs immediately after voice conversion, before the output file is written to disk. These effects are powered by Spotify’s pedalboard library, a high-quality Python audio effects processor. Every effect is optional and independently toggled — you can chain as many or as few as you need. Effects are applied in the order they appear in the pipeline: reverb → pitch shift → limiter → gain → distortion → chorus → bitcrush → clipping → compressor → delay.

Enabling Post-Processing

Post-processing is disabled by default. To activate it, set post_process=True in either the Python API or the CLI. Each individual effect also requires its own flag (e.g. reverb=True). Setting post_process=True without enabling any individual effect flag is a no-op.
Post-processing is applied to the full converted audio signal. If you use split_audio=True, the effects are applied after the segments are merged back together.
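
The gating rules above can be modelled in a few lines of plain Python. This is an illustrative sketch, not Applio's actual code: the helper name `enabled_effects` and the `PIPELINE_ORDER` tuple are assumptions made for this example, while the flag names and the order are the ones listed on this page.

```python
# Fixed pipeline order as listed at the top of this page.
PIPELINE_ORDER = (
    "reverb", "pitch_shift", "limiter", "gain", "distortion",
    "chorus", "bitcrush", "clipping", "compressor", "delay",
)


def enabled_effects(post_process=False, **flags):
    """Return the effects that would run, in pipeline order.

    Hypothetical helper for illustration only.
    """
    if not post_process:
        # Master switch is off: individual effect flags have no effect.
        return []
    # The result follows PIPELINE_ORDER, not the order the flags were passed in.
    return [name for name in PIPELINE_ORDER if flags.get(name, False)]


print(enabled_effects(post_process=True))                # []
print(enabled_effects(post_process=False, reverb=True))  # []
print(enabled_effects(post_process=True, compressor=True, reverb=True))
# ['reverb', 'compressor']
```

The first call shows the no-op case: post_process=True with no individual effect flag enabled leaves the chain empty.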

Available Effects

Reverb
Simulates the acoustic reflections of a physical space. Higher room sizes produce longer, more diffuse tails; damping controls how quickly high frequencies decay. Enable with reverb=True.
reverb_room_size (float, default: 0.5)
Size of the simulated room. Range: 0.0 (dry, tight) to 1.0 (large hall). Higher values produce longer reverb tails.
reverb_damping (float, default: 0.5)
High-frequency damping applied to the reverb tail. Range: 0.0 to 1.0. Higher values make the tail sound darker and decay faster.
reverb_wet_gain (float, default: 0.5)
Level of the wet (reverberated) signal in the mix. Range: 0.0 to 1.0.
reverb_dry_gain (float, default: 0.5)
Level of the dry (original) signal in the mix. Range: 0.0 to 1.0. Set to 0.0 for a fully wet signal.
reverb_width (float, default: 0.5)
Stereo width of the reverb effect. Range: 0.0 (mono) to 1.0 (full stereo spread).
reverb_freeze_mode (float, default: 0.5)
Controls infinite reverb sustain. Range: 0.0 (normal decay) to 1.0 (reverb sustains indefinitely without decaying).

Pitch Shift
Transposes the pitch of the output audio by a fixed number of semitones, independent of the model’s pitch parameter. Useful for fine-tuning the final pitch of the converted audio without re-running inference. Enable with pitch_shift=True.
pitch_shift_semitones (float, default: 0.0)
Number of semitones to shift the pitch. Positive values raise the pitch; negative values lower it. For example, 12.0 shifts up one octave.
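
For intuition, a shift of n semitones in equal temperament multiplies every frequency by 2^(n/12). The sketch below shows only that arithmetic, not how the pitch shifter itself is implemented; `semitones_to_ratio` is a hypothetical helper name.

```python
def semitones_to_ratio(semitones: float) -> float:
    """Frequency ratio for an equal-temperament shift of `semitones`."""
    return 2.0 ** (semitones / 12.0)


print(semitones_to_ratio(12.0))            # 2.0   (one octave up)
print(semitones_to_ratio(-12.0))           # 0.5   (one octave down)
print(round(semitones_to_ratio(7.0), 3))   # 1.498 (about a perfect fifth)
```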

Limiter
A brickwall limiter that prevents the signal from exceeding a set threshold in decibels. In the fixed pipeline order it runs after reverb and pitch shift but before gain, distortion, and the remaining effects, so it controls peaks from the earlier stages only. Enable with limiter=True.
limiter_threshold (float, default: -6)
Maximum level in dBFS at the limiter's output. Anything above this value is transparently attenuated. A value of -6 gives 6 dB of headroom below 0 dBFS.
limiter_release_time (float, default: 0.01)
Release time in seconds. Controls how quickly the limiter stops attenuating after the signal falls below the threshold. Shorter values sound more responsive; longer values sound smoother.

Gain
Applies a fixed gain (amplification or attenuation) in decibels to the output signal. Use in combination with the limiter to boost quiet conversions safely. Enable with gain=True.
gain_db (float, default: 0.0)
Gain to apply in decibels. Positive values amplify; negative values attenuate. For example, 6.0 roughly doubles the amplitude and -6.0 roughly halves it.
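
The decibel figures follow from the standard amplitude conversion, 10^(dB/20). The sketch below shows that conversion, plus a rough peak estimate for the limiter-then-gain combination. Both helper names are hypothetical. The estimate assumes an idealized limiter that holds peaks exactly at its threshold and that no other effect is enabled between or after the two stages.

```python
def db_to_amplitude(db: float) -> float:
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


def estimated_peak_db(limiter_threshold: float, gain_db: float) -> float:
    """Idealized peak after limiter -> gain (an assumption, see the text above)."""
    return limiter_threshold + gain_db


print(round(db_to_amplitude(6.0), 3))    # 1.995
print(round(db_to_amplitude(-6.0), 3))   # 0.501
print(db_to_amplitude(0.0))              # 1.0
print(estimated_peak_db(-6.0, 5.0))      # -1.0
```

Under those assumptions, keeping limiter_threshold + gain_db below 0 dBFS leaves some headroom in the result.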

Distortion
Applies soft-clipping harmonic distortion to the signal. Adds grit and harmonic overtones. At high drive values the output will be heavily saturated. Enable with distortion=True.
distortion_gain (float, default: 25)
Drive amount in dB. Higher values produce heavier distortion. A value around 10 to 20 gives mild saturation; values above 30 produce heavy clipping.
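
One common way to produce soft clipping is a tanh waveshaper fed by a gain stage. The sketch below assumes that shape purely for illustration; `soft_clip` is a hypothetical helper, and the exact curve Applio applies should be checked against the pedalboard documentation.

```python
import math


def soft_clip(sample: float, drive_db: float) -> float:
    """Apply `drive_db` of gain, then squash the result with tanh (assumed curve)."""
    drive = 10.0 ** (drive_db / 20.0)
    return math.tanh(sample * drive)


quiet = 0.1
print(soft_clip(quiet, 0.0))    # nearly unchanged: tanh is almost linear near zero
print(soft_clip(quiet, 25.0))   # pushed close to the +/-1.0 ceiling
print(soft_clip(quiet, 40.0))   # effectively saturated
```

Because tanh never exceeds 1.0 in magnitude, more drive flattens the waveform instead of raising its peak, which is where the added harmonics come from.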

Chorus
Creates a thick, doubling effect by layering slightly pitch-modulated and time-delayed copies of the signal. Useful for thickening vocal conversions. Enable with chorus=True.
chorus_rate (float, default: 1.0)
Modulation rate in Hz. Controls how fast the pitch oscillates. Lower values create a slow, wide sweep; higher values create a rapid vibrato-like effect.
chorus_depth (float, default: 0.25)
Modulation depth as a fraction (0.0 to 1.0). Higher values create a more pronounced pitch variation between the original and chorus voices.
chorus_center_delay (float, default: 7)
Center delay time in milliseconds. The average time offset of the chorus copies from the dry signal.
chorus_feedback (float, default: 0.0)
Amount of the chorus output fed back into the input (0.0 to 1.0). Higher values create a more resonant, metallic effect.
chorus_mix (float, default: 0.5)
Wet/dry mix ratio. 0.0 is fully dry; 1.0 is fully wet (chorus only).

Bitcrush
Reduces the bit depth of the audio to simulate the lo-fi sound of vintage digital samplers or game audio hardware. Enable with bitcrush=True.
bitcrush_bit_depth (int, default: 8)
Target bit depth. Lower values produce more quantization noise and a grittier sound. 8 gives a classic 8-bit character; 4 is extremely degraded.
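
Bit-depth reduction amounts to snapping each sample to a coarse grid. The following is a simplified model for intuition only, with a hypothetical `bitcrush` helper; the library's own quantizer may round or scale differently.

```python
def bitcrush(sample: float, bit_depth: int) -> float:
    """Snap a sample in [-1.0, 1.0] to a grid with step 2 / 2**bit_depth."""
    step = 2.0 / (2 ** bit_depth)
    return round(sample / step) * step


# A fine ramp from -1.0 to 1.0 collapses onto only a few distinct values.
ramp = [i / 1000 for i in range(-1000, 1001)]
print(len({bitcrush(s, 4) for s in ramp}))   # 17
print(bitcrush(0.30, 4))                     # 0.25
```

At 4 bits the grid step is 0.125, so the 2001-point ramp lands on just 17 values; that coarse staircase is the source of the quantization noise.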

Clipping
Hard-clips the audio at a threshold, producing harsh distortion. Unlike the limiter (which attenuates transparently), clipping introduces audible harmonic distortion at the threshold boundary. Enable with clipping=True.
clipping_threshold (float, default: -6)
Threshold in dBFS at which hard clipping occurs. Any sample exceeding this level is clipped flat. Values closer to 0 allow more signal through before clipping.
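
As a minimal sketch of what "clipped flat" means, the threshold in dBFS is converted to a linear amplitude and each sample is clamped to that range. `hard_clip` is a hypothetical helper operating on one sample at a time.

```python
def hard_clip(sample: float, threshold_db: float) -> float:
    """Clamp a sample to +/- the linear amplitude of `threshold_db`."""
    limit = 10.0 ** (threshold_db / 20.0)
    return max(-limit, min(limit, sample))


print(round(hard_clip(0.9, -6.0), 3))    # 0.501  (clipped)
print(hard_clip(0.3, -6.0))              # 0.3    (below threshold, untouched)
print(round(hard_clip(-0.9, -6.0), 3))   # -0.501 (clipped on the negative side)
```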

Compressor
Reduces the dynamic range of the signal by attenuating loud passages. Useful for evening out volume inconsistencies in the converted voice. Enable with compressor=True.
compressor_threshold (float, default: 0)
Level in dBFS above which compression is applied. Signals below this level pass through unaffected.
compressor_ratio (float, default: 1)
Compression ratio. For example, 4 means 4:1: for every 4 dB the input rises above the threshold, the output rises by only 1 dB. A ratio of 1 means no compression.
compressor_attack (float, default: 1.0)
Attack time in milliseconds. Controls how quickly the compressor reacts when the signal exceeds the threshold. Shorter values clamp transients harder.
compressor_release (float, default: 100)
Release time in milliseconds. Controls how quickly the compressor stops compressing after the signal falls below the threshold.
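
The threshold and ratio together define a static input/output curve. The sketch below computes that curve for steady-state levels only; it ignores attack and release, and `compressed_level_db` is a hypothetical helper name.

```python
def compressed_level_db(input_db: float, threshold_db: float, ratio: float) -> float:
    """Steady-state output level of a simple downward compressor."""
    if input_db <= threshold_db:
        return input_db  # below the threshold: passed through unchanged
    # Above the threshold: the overshoot is divided by the ratio.
    return threshold_db + (input_db - threshold_db) / ratio


print(compressed_level_db(-10.0, -18.0, 4.0))   # -16.0 (8 dB over becomes 2 dB over)
print(compressed_level_db(-30.0, -18.0, 4.0))   # -30.0 (below threshold)
print(compressed_level_db(3.0, 0.0, 1.0))       # 3.0   (ratio 1: no compression)
```

The last call uses the default threshold and ratio, under which the curve leaves the level unchanged.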

Delay
Adds an echo effect by mixing a time-delayed copy of the signal back into the output. The feedback parameter controls how many echo repeats you hear. Enable with delay=True.
delay_seconds (float, default: 0.5)
Delay time in seconds. For example, 0.5 produces an echo half a second after the original signal.
delay_feedback (float, default: 0.0)
Amount of the delayed signal fed back into the delay line (0.0 to 1.0). Higher values create multiple repeating echoes. Values approaching 1.0 may cause runaway feedback, so use with caution.
delay_mix (float, default: 0.5)
Wet/dry mix ratio. 0.0 is the dry signal only; 1.0 is the delayed signal only.
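
To see why feedback near 1.0 is risky, consider a simplified model in which each pass through the delay line scales the echo by the feedback amount. The sketch counts how many echoes stay above a chosen floor relative to the first echo. Both the model and the `audible_echoes` helper are assumptions for illustration, and the wet/dry mix is ignored.

```python
import math


def audible_echoes(feedback: float, floor_db: float = -60.0):
    """Count echoes whose level, relative to the first echo, is at or above floor_db."""
    if feedback >= 1.0:
        return math.inf  # echoes never decay: runaway feedback
    if feedback <= 0.0:
        return 1  # a single echo, nothing is fed back
    per_repeat_db = 20.0 * math.log10(feedback)  # negative: loss per repeat
    return 1 + math.floor(floor_db / per_repeat_db)


print(audible_echoes(0.0))   # 1
print(audible_echoes(0.5))   # 10
print(audible_echoes(1.0))   # inf
```

The count grows quickly as feedback rises, and at 1.0 the echoes in this model never die away.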

CLI Example

The following example applies reverb and a compressor to the output. All effect flags must be explicitly set to True:
python core.py infer \
  --input_path audio/input.wav \
  --output_path audio/output.wav \
  --pth_path logs/MyModel/MyModel.pth \
  --index_path logs/MyModel/MyModel.index \
  --post_process True \
  --reverb True \
  --reverb_room_size 0.6 \
  --reverb_damping 0.4 \
  --reverb_wet_gain 0.4 \
  --reverb_dry_gain 0.6 \
  --reverb_width 0.8 \
  --reverb_freeze_mode 0.0 \
  --compressor True \
  --compressor_threshold -18 \
  --compressor_ratio 4 \
  --compressor_attack 5.0 \
  --compressor_release 150
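
If you launch the CLI from a script, the argument list can be assembled programmatically. The helper below is a hypothetical convenience wrapper, not part of Applio; it uses only the flags shown in the command above and does not cover any other arguments the CLI may require.

```python
import shlex


def build_infer_command(input_path, output_path, pth_path, index_path, **effects):
    """Build the `core.py infer` argument list (hypothetical helper)."""
    cmd = [
        "python", "core.py", "infer",
        "--input_path", input_path,
        "--output_path", output_path,
        "--pth_path", pth_path,
        "--index_path", index_path,
    ]
    if effects:
        # Individual effect flags only take effect when post-processing is on.
        cmd += ["--post_process", "True"]
    for name, value in effects.items():
        cmd += [f"--{name}", str(value)]
    return cmd


cmd = build_infer_command(
    "audio/input.wav", "audio/output.wav",
    "logs/MyModel/MyModel.pth", "logs/MyModel/MyModel.index",
    reverb=True, reverb_room_size=0.6,
    compressor=True, compressor_threshold=-18, compressor_ratio=4,
)
print(shlex.join(cmd))
# To run it: subprocess.run(cmd, check=True)
```

Passing the command as a list avoids shell-quoting problems with paths that contain spaces.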

Python API Example

from core import run_infer_script

message, output_path = run_infer_script(
    pitch=0,
    index_rate=0.3,
    volume_envelope=1.0,
    protect=0.33,
    f0_method="rmvpe",
    input_path="audio/input.wav",
    output_path="audio/output.wav",
    pth_path="logs/MyModel/MyModel.pth",
    index_path="logs/MyModel/MyModel.index",
    split_audio=False,
    f0_autotune=False,
    f0_autotune_strength=1.0,
    proposed_pitch=False,
    proposed_pitch_threshold=155.0,
    clean_audio=False,
    clean_strength=0.7,
    export_format="WAV",
    embedder_model="contentvec",
    # --- post-processing ---
    post_process=True,
    reverb=True,
    reverb_room_size=0.6,
    reverb_damping=0.4,
    reverb_wet_gain=0.4,
    reverb_dry_gain=0.6,
    reverb_width=0.8,
    reverb_freeze_mode=0.0,
    compressor=True,
    compressor_threshold=-18,
    compressor_ratio=4,
    compressor_attack=5.0,
    compressor_release=150,
    limiter=True,
    limiter_threshold=-1.0,
    limiter_release_time=0.05,
)

print(message)       # "File audio/input.wav inferred successfully."
print(output_path)   # "audio/output.wav"
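
Enable both a compressor and a limiter to keep levels under control: the limiter caps peaks early in the chain, and the compressor tames the dynamics later. Because the limiter is not the last stage of the fixed pipeline, check the final peak level if you need a strict output ceiling.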
Effects are applied in a fixed sequence regardless of the order in which you specify the flags. The limiter runs before gain, distortion, chorus, bitcrush, clipping, compressor, and delay, so it does not catch peaks introduced by those later stages. Be mindful of gain staging: combining multiple gain-increasing effects can produce severely clipped output.
