Supported providers
- Edge TTS
- Azure Speech
- SiliconFlow
- Gemini TTS
Built-in, free, no API key required.
Edge TTS uses Microsoft’s streaming speech API via the edge-tts Python library. It supports hundreds of voices across dozens of languages and is the default provider.
Voice names follow the pattern {locale}-{Name}Neural, for example en-US-AriaNeural or zh-CN-XiaoxiaoNeural.
Voice selection
The voice_name parameter identifies the voice to use. The full naming convention is:
| Voice name | Locale | Gender |
|---|---|---|
| zh-CN-XiaoxiaoNeural-Female | Mandarin (China) | Female |
| zh-CN-YunyangNeural-Male | Mandarin (China) | Male |
| en-US-AriaNeural-Female | English (US) | Female |
| en-GB-RyanNeural-Male | English (UK) | Male |
| en-US-AvaMultilingualNeural-V2-Female | English (US, Azure V2) | Female |
The gender suffix (-Female, -Male) is stripped internally before the API call. It exists only to help identify voices in the UI and the voice list file.
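The suffix handling described above can be illustrated with a minimal sketch. The helper name strip_gender_suffix is hypothetical and is not taken from the MoneyPrinterTurbo source; it only shows the idea of removing a trailing -Female or -Male label before a name is passed to a provider.

```python
def strip_gender_suffix(voice_name: str) -> str:
    """Return the voice name without a trailing -Female or -Male label.

    Hypothetical helper for illustration; the project's own function may differ.
    """
    for suffix in ("-Female", "-Male"):
        if voice_name.endswith(suffix):
            # Drop only the trailing label; the rest of the name is untouched.
            return voice_name[: -len(suffix)]
    return voice_name


print(strip_gender_suffix("zh-CN-XiaoxiaoNeural-Female"))  # zh-CN-XiaoxiaoNeural
print(strip_gender_suffix("en-GB-RyanNeural-Male"))        # en-GB-RyanNeural
```

Names that carry no gender label are returned unchanged, so the helper is safe to call on values that have already been cleaned.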
To see every available voice, open docs/voice-list.txt in the repository. The file lists all Edge TTS and Azure voices with their locale and gender.
Previewing voices in the Web UI
In the Web UI, select a voice from the dropdown and click the Preview button. A short sample is synthesized and played back immediately so you can compare voices before generating the full video.
Parameter reference
The voice to use for narration. See the tables above and docs/voice-list.txt for the full list of options. The value must exactly match the names shown in the voice list, including the gender suffix.
Volume multiplier for the narration track, from 0.0 (silent) to 1.0 (full volume). Values above 1.0 may clip on some audio systems. This controls the narration layer only. Background music volume is set separately with bgm_volume.
Speech speed multiplier. 1.0 is normal speed. 1.2 is 20% faster. 0.8 is 20% slower. Internally this is converted to a percentage string (e.g., +20%, -20%) compatible with Edge TTS and SiliconFlow APIs. For Gemini TTS, rate adjustment is noted but currently not applied.
Custom audio
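As a sketch of the speech speed conversion described in the parameter reference, the multiplier can be mapped to a signed percentage string as shown here. The function name rate_to_percent is hypothetical, and rounding to a whole percent is an assumption; the project's internal conversion may differ in detail.

```python
def rate_to_percent(rate: float) -> str:
    """Convert a speed multiplier such as 1.2 into a string such as "+20%".

    Hypothetical helper; rounding to the nearest whole percent is an assumption.
    """
    # Round to absorb floating-point noise, e.g. (1.2 - 1.0) * 100 = 19.999...
    percent = round((rate - 1.0) * 100)
    # The "+d" format always emits an explicit sign, including for zero.
    return f"{percent:+d}%"


print(rate_to_percent(1.2))  # +20%
print(rate_to_percent(0.8))  # -20%
print(rate_to_percent(1.0))  # +0%
```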
To bypass TTS entirely, supply a pre-recorded MP3 file with custom_audio_file.
When custom_audio_file is set and the file exists, the TTS step is skipped. The pipeline reads the file duration directly and uses it to size the video. Subtitles are disabled in this mode because word-boundary timing data is not available from a pre-recorded file.
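The decision between custom audio and TTS, including the fallback when the file is missing, might be sketched as follows. The names AudioPlan and plan_audio are hypothetical and the warning text is illustrative only; this is not the project's actual code.

```python
import logging
import os
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger(__name__)


@dataclass
class AudioPlan:
    source: str               # "custom" or "tts"
    subtitles_possible: bool  # False when no word-boundary timing exists


def plan_audio(custom_audio_file: Optional[str]) -> AudioPlan:
    """Pick the narration source for one video (hypothetical sketch)."""
    if not custom_audio_file:
        # Nothing supplied: synthesize narration as usual.
        return AudioPlan(source="tts", subtitles_possible=True)
    if os.path.isfile(custom_audio_file):
        # A pre-recorded file carries no timing data, so subtitles are off.
        return AudioPlan(source="custom", subtitles_possible=False)
    # Path was set but does not exist: warn and fall back to TTS.
    logger.warning("custom_audio_file not found, using TTS: %s", custom_audio_file)
    return AudioPlan(source="tts", subtitles_possible=True)
```

Returning a small plan object rather than a bare flag keeps the subtitle consequence next to the source decision, which is where the two are coupled.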
If custom_audio_file is set but the path does not exist on disk, MoneyPrinterTurbo falls back to TTS automatically and logs a warning.
Language matching
Make sure the voice locale matches the language of your script. A Chinese voice (zh-CN-*) will not produce natural output with an English script, and vice versa.
For multilingual content, use an Azure V2 multilingual voice such as en-US-AndrewMultilingualNeural-V2-Male or zh-CN-XiaoxiaoMultilingualNeural-V2-Female.
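A simple pre-flight check for the locale rule above could look like the sketch below. The function voice_matches_script is hypothetical, the script language is assumed to be given as a two-letter code such as "en" or "zh", and treating any voice with "Multilingual" in its name as matching every language is an assumption made for illustration.

```python
def voice_matches_script(voice_name: str, script_language: str) -> bool:
    """Return True when the voice looks suitable for the script language.

    Hypothetical helper; script_language is assumed to be a code like "en".
    """
    # Assumption: multilingual voices are acceptable for any script language.
    if "Multilingual" in voice_name:
        return True
    # The locale prefix is the part before the first hyphen, e.g. "zh" in zh-CN.
    voice_language = voice_name.split("-", 1)[0].lower()
    return voice_language == script_language.lower()


print(voice_matches_script("zh-CN-XiaoxiaoNeural-Female", "zh"))  # True
print(voice_matches_script("zh-CN-XiaoxiaoNeural-Female", "en"))  # False
```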