Voice Synthesis

MoneyPrinterTurbo converts your video script into spoken audio using one of several TTS providers. You can preview voices in real time in the Web UI before committing to a full render.

Supported providers

Edge TTS
Azure Speech
SiliconFlow
Gemini TTS

Edge TTS: built-in, free, no API key required.

Edge TTS uses Microsoft’s streaming speech API via the edge-tts Python library. It supports hundreds of voices across dozens of languages and is the default provider.

Voice names follow the pattern {locale}-{Name}Neural, for example en-US-AriaNeural or zh-CN-XiaoxiaoNeural.

Edge TTS is recommended for most users. It works out of the box with no additional setup and covers both Chinese and English with high-quality neural voices.

Azure Speech: higher audio quality, requires an API key.

Azure voices are marked with a -V2 suffix in their names (e.g., en-US-AvaMultilingualNeural-V2). They sound more natural than standard Edge voices and support multilingual speakers.

Configure your credentials in config.toml:

[azure]
speech_key = "YOUR_AZURE_SPEECH_KEY"
speech_region = "eastus"

Azure V2 voices use a separate synthesis path (azure_tts_v2) that streams audio at 48 kHz / 192 kbps MP3 quality.

Without a valid speech_key and speech_region in config.toml, Azure voice synthesis fails. The task is marked as failed and the error is written to the logs.
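A pre-flight check can surface missing Azure credentials before a task starts. The sketch below is illustrative only: the helper name missing_azure_settings is hypothetical and not part of MoneyPrinterTurbo, and it assumes config.toml has already been parsed into a dict (for example with tomllib on Python 3.11 or later).

```python
def missing_azure_settings(config: dict) -> list[str]:
    """Return the [azure] keys that are absent or blank in a parsed config.

    Hypothetical helper: `config` is assumed to be the dict form of config.toml.
    """
    azure = config.get("azure", {})
    required = ("speech_key", "speech_region")
    # Treat a missing key and an empty or whitespace-only value the same way.
    return [key for key in required if not str(azure.get(key) or "").strip()]


# In-memory equivalent of a config.toml whose speech_region was left blank.
example = {"azure": {"speech_key": "YOUR_AZURE_SPEECH_KEY", "speech_region": ""}}
print(missing_azure_settings(example))
```

An empty result means both settings are present; anything else names the keys to fill in before selecting an Azure voice.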

SiliconFlow: CosyVoice2 model, requires an API key.

SiliconFlow provides voices powered by the FunAudioLLM/CosyVoice2-0.5B model. Available voices:

Male: alex, benjamin, charles, david
Female: anna, bella, claire, diana

Voice names use the format siliconflow:FunAudioLLM/CosyVoice2-0.5B:alex-Male.

Configure your API key in config.toml:

[siliconflow]
api_key = "YOUR_SILICONFLOW_API_KEY"

Gemini TTS: Google Gemini 2.5 Flash, requires an API key.

Gemini TTS offers 15 expressive voices: Zephyr, Puck, Charon, Kore, Fenrir, Aoede, Thalia, Sage, Echo, Harmony, Lux, Nova, Vale, Orion, Atlas.

Voice names use the format gemini:Zephyr-Female.

Configure your API key in config.toml under the [app] section:

[app]
gemini_api_key = "YOUR_GEMINI_API_KEY"

Gemini TTS requires pydub to be installed (pip install pydub). Audio is returned as Linear PCM and converted to MP3 automatically.
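The naming formats described for the four providers are distinct enough that a provider can be inferred from the voice name alone. The following is a minimal sketch under that assumption; detect_provider and the returned labels are hypothetical names chosen for illustration, not the project's actual routing code.

```python
def detect_provider(voice_name: str) -> str:
    """Guess the TTS provider from a voice name, using the documented formats."""
    if voice_name.startswith("siliconflow:"):
        return "siliconflow"
    if voice_name.startswith("gemini:"):
        return "gemini"
    # Azure V2 names carry -V2, optionally followed by a gender suffix.
    if voice_name.endswith("-V2") or "-V2-" in voice_name:
        return "azure"
    # Everything else is assumed to be a standard Edge TTS voice.
    return "edge"


for name in (
    "en-US-AriaNeural",
    "en-US-AvaMultilingualNeural-V2",
    "siliconflow:FunAudioLLM/CosyVoice2-0.5B:alex-Male",
    "gemini:Zephyr-Female",
):
    print(name, "->", detect_provider(name))
```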

Voice selection

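The voice_name parameter identifies the voice to use. For Edge TTS and Azure voices, the full naming convention is shown below. Azure V2 voices add -V2 before the gender suffix, and SiliconFlow and Gemini voices use their own prefixed formats (siliconflow:... and gemini:...).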

{locale}-{VoiceName}Neural-{Gender}

Examples:

Voice name	Locale	Gender
`zh-CN-XiaoxiaoNeural-Female`	Mandarin (China)	Female
`zh-CN-YunyangNeural-Male`	Mandarin (China)	Male
`en-US-AriaNeural-Female`	English (US)	Female
`en-GB-RyanNeural-Male`	English (UK)	Male
`en-US-AvaMultilingualNeural-V2-Female`	English (US, Azure V2)	Female

The gender suffix (-Female, -Male) is stripped internally before the API call. It exists only to help identify voices in the UI and the voice list file.

To see every available voice, open docs/voice-list.txt in the repository. The file lists all Edge TTS and Azure voices with their locale and gender.
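Stripping the gender suffix can be sketched as follows. This is an illustration of the behavior described here, not the project's own code; strip_gender_suffix is a hypothetical name.

```python
GENDER_SUFFIXES = ("-Female", "-Male")


def strip_gender_suffix(voice_name: str) -> str:
    """Remove a trailing -Female or -Male marker, leaving other names untouched."""
    for suffix in GENDER_SUFFIXES:
        if voice_name.endswith(suffix):
            return voice_name[: -len(suffix)]
    return voice_name


print(strip_gender_suffix("zh-CN-XiaoxiaoNeural-Female"))            # zh-CN-XiaoxiaoNeural
print(strip_gender_suffix("en-US-AvaMultilingualNeural-V2-Female"))  # en-US-AvaMultilingualNeural-V2
```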

Previewing voices in the Web UI

In the Web UI, select a voice from the dropdown and click the Preview button. A short sample is synthesized and played back immediately so you can compare voices before generating the full video.

Parameter reference

voice_name

Type: string. Default: "zh-CN-XiaoxiaoNeural-Female"

The voice to use for narration. See the table above and docs/voice-list.txt for the full list of options. The value must exactly match a name shown in the voice list, including the gender suffix.

voice_volume

Type: float. Default: 1.0

Volume multiplier for the narration track, where 0.0 is silent and 1.0 is full volume. Values above 1.0 may clip on some audio systems.

This controls the narration layer only. Background music volume is set separately with bgm_volume.
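To see why multipliers above 1.0 can clip, consider signed 16-bit PCM samples, which cannot exceed 32767. The sketch below is a generic illustration of scaling with clamping; it is not how MoneyPrinterTurbo applies voice_volume, and scale_samples is a hypothetical name.

```python
def scale_samples(samples: list[int], volume: float) -> list[int]:
    """Scale signed 16-bit PCM samples by a multiplier, clamping to the valid range."""
    lowest, highest = -32768, 32767
    return [max(lowest, min(highest, round(s * volume))) for s in samples]


print(scale_samples([1000, -20000, 30000], 0.5))  # [500, -10000, 15000]
print(scale_samples([1000, -20000, 30000], 1.5))  # [1500, -30000, 32767]
```

In the second call the loudest sample is flattened to the maximum value, which is the distortion heard as clipping.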

voice_rate

Type: float. Default: 1.0

Speech speed multiplier. 1.0 is normal speed, 1.2 is 20% faster, and 0.8 is 20% slower.

Internally this is converted to a percentage string (e.g., +20%, -20%) compatible with the Edge TTS and SiliconFlow APIs. For Gemini TTS, the requested rate is currently not applied.
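The multiplier-to-percentage conversion can be sketched like this. The function name rate_to_percent is hypothetical, and the rounding to a whole percent is an assumption of this example rather than a documented detail.

```python
def rate_to_percent(voice_rate: float) -> str:
    """Convert a speed multiplier such as 1.2 into a signed percentage string."""
    # round() absorbs floating-point noise, e.g. (1.2 - 1.0) * 100 == 19.999...
    percent = round((voice_rate - 1.0) * 100)
    return f"{percent:+d}%"


print(rate_to_percent(1.2))  # +20%
print(rate_to_percent(0.8))  # -20%
print(rate_to_percent(1.0))  # +0%
```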

Custom audio

To bypass TTS entirely, supply a pre-recorded MP3 file with custom_audio_file:

{
  "video_subject": "My product demo",
  "custom_audio_file": "/absolute/path/to/narration.mp3"
}

When custom_audio_file is set and the file exists, the TTS step is skipped. The pipeline reads the file duration directly and uses it to size the video. Subtitles are disabled in this mode because word-boundary timing data is not available from a pre-recorded file.

If custom_audio_file is set but the path does not exist on disk, MoneyPrinterTurbo falls back to TTS automatically and logs a warning.
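The decision between custom audio and TTS described above can be sketched as a small check. This is an assumption-laden illustration: should_use_custom_audio and the params dict shape are hypothetical, and the real pipeline may structure this differently.

```python
import logging
import os

logger = logging.getLogger(__name__)


def should_use_custom_audio(params: dict) -> bool:
    """Return True when custom_audio_file is set and points to an existing file."""
    path = params.get("custom_audio_file")
    if not path:
        return False  # Nothing supplied: run TTS as usual.
    if os.path.isfile(path):
        return True  # Skip TTS and use the pre-recorded narration.
    # Path was supplied but is missing: warn and fall back to TTS.
    logger.warning("custom_audio_file not found, falling back to TTS: %s", path)
    return False


print(should_use_custom_audio({"video_subject": "My product demo"}))  # False
```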

Language matching

Make sure the voice locale matches the language of your script. A Chinese voice (zh-CN-*) will not produce natural output with an English script, and vice versa. For multilingual content, use an Azure V2 multilingual voice such as en-US-AndrewMultilingualNeural-V2-Male or zh-CN-XiaoxiaoMultilingualNeural-V2-Female.
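A simple guard against mismatched voices might compare the language part of the voice locale with the script language. The sketch below assumes locale-prefixed names (Edge TTS and Azure) and a caller-supplied language code such as "en" or "zh"; both helper names are hypothetical.

```python
def voice_language(voice_name: str) -> str:
    """Return the language part of a locale-prefixed voice name, e.g. 'zh' for zh-CN-*."""
    return voice_name.split("-", 1)[0].lower()


def voice_fits_script(voice_name: str, script_language: str) -> bool:
    """Check whether a voice is a reasonable match for a script language code."""
    # Multilingual voices are treated as suitable for any script language.
    if "Multilingual" in voice_name:
        return True
    return voice_language(voice_name) == script_language.lower()


print(voice_fits_script("zh-CN-XiaoxiaoNeural-Female", "en"))            # False
print(voice_fits_script("en-US-AndrewMultilingualNeural-V2-Male", "zh"))  # True
```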
