MoneyPrinterTurbo converts your video script into spoken audio using one of several TTS providers. You can preview voices in real time in the Web UI before committing to a full render.

Supported providers

Built-in, free, no API key required.
Edge TTS uses Microsoft’s streaming speech API via the edge-tts Python library. It supports hundreds of voices across dozens of languages and is the default provider.
Voice names follow the pattern {locale}-{Name}Neural, for example en-US-AriaNeural or zh-CN-XiaoxiaoNeural.
Edge TTS is recommended for most users. It works out of the box with no additional setup and covers both Chinese and English with high-quality neural voices.
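As a rough illustration of what a direct call to the edge-tts library looks like, the sketch below saves narration to an MP3 file. It is not MoneyPrinterTurbo's own code: the function name synthesize, the DEFAULT_VOICE constant, and the output path are assumptions made for the example. Running it requires the edge-tts package and network access.

```python
import asyncio

# Edge TTS expects the plain voice name ({locale}-{Name}Neural).
DEFAULT_VOICE = "zh-CN-XiaoxiaoNeural"


async def synthesize(text: str, voice: str = DEFAULT_VOICE,
                     out_path: str = "narration.mp3") -> str:
    """Synthesize `text` with Edge TTS and write the audio to `out_path`.

    Hypothetical helper for illustration; not part of MoneyPrinterTurbo.
    """
    # Imported lazily so this module still loads where edge-tts is missing.
    import edge_tts

    communicate = edge_tts.Communicate(text, voice)
    await communicate.save(out_path)
    return out_path
```

To try it, call asyncio.run(synthesize("Hello, world", "en-US-AriaNeural")) from a script or REPL.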

Voice selection

The voice_name parameter identifies the voice to use. The full naming convention is:
{locale}-{VoiceName}Neural-{Gender}
Azure V2 voices add a -V2 segment before the gender suffix.
Examples:
Voice name | Locale | Gender
zh-CN-XiaoxiaoNeural-Female | Mandarin (China) | Female
zh-CN-YunyangNeural-Male | Mandarin (China) | Male
en-US-AriaNeural-Female | English (US) | Female
en-GB-RyanNeural-Male | English (UK) | Male
en-US-AvaMultilingualNeural-V2-Female | English (US, Azure V2) | Female
The gender suffix (-Female, -Male) is stripped internally before the API call. It exists only to help identify voices in the UI and the voice list file.
To see every available voice, open docs/voice-list.txt in the repository. The file lists all Edge TTS and Azure voices with their locale and gender.
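The suffix stripping described above can be sketched as follows. This is a minimal illustration, not the project's actual implementation; the helper name strip_gender_suffix is an assumption.

```python
GENDER_SUFFIXES = ("-Female", "-Male")


def strip_gender_suffix(voice_name: str) -> str:
    """Return the voice name without a trailing -Female or -Male suffix.

    Hypothetical helper; names without a gender suffix are returned as-is.
    """
    name = voice_name.strip()
    for suffix in GENDER_SUFFIXES:
        if name.endswith(suffix):
            return name[: -len(suffix)]
    return name


print(strip_gender_suffix("zh-CN-XiaoxiaoNeural-Female"))  # zh-CN-XiaoxiaoNeural
print(strip_gender_suffix("en-GB-RyanNeural-Male"))        # en-GB-RyanNeural
```

Only the trailing suffix is removed, so the rest of the name, including any -V2 segment, is left untouched.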

Previewing voices in the Web UI

In the Web UI, select a voice from the dropdown and click the Preview button. A short sample is synthesized and played back immediately so you can compare voices before generating the full video.

Parameter reference

voice_name
string
default:"zh-CN-XiaoxiaoNeural-Female"
The voice to use for narration. See the examples above and docs/voice-list.txt for the full list of options. The value must exactly match a name in the voice list, including the gender suffix.
voice_volume
float
default: 1.0
Volume multiplier for the narration track: 0.0 is silent and 1.0 is full volume. Values above 1.0 may clip on some audio systems.
This controls the narration layer only. Background music volume is set separately with bgm_volume.
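To see why a multiplier above 1.0 can clip, consider scaling 16-bit PCM samples directly. The sketch below is a generic illustration of a volume multiplier and does not claim to show how MoneyPrinterTurbo applies voice_volume; scale_samples is a hypothetical function.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def scale_samples(samples: list[int], volume: float) -> list[int]:
    """Multiply 16-bit PCM samples by `volume`, clamping to the int16 range."""
    if volume < 0:
        raise ValueError("volume must be non-negative")
    scaled = []
    for sample in samples:
        value = round(sample * volume)
        # Anything outside the representable range is clipped.
        scaled.append(max(INT16_MIN, min(INT16_MAX, value)))
    return scaled


loud = [20000, -30000, 1000]
print(scale_samples(loud, 1.0))  # [20000, -30000, 1000]
print(scale_samples(loud, 1.5))  # [30000, -32768, 1500]
```

At 1.5 the second sample would be -45000, which does not fit in 16 bits, so it is clamped to -32768 and the waveform is distorted.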
voice_rate
float
default: 1.0
Speech speed multiplier. 1.0 is normal speed, 1.2 is 20% faster, and 0.8 is 20% slower.
Internally this is converted to a percentage string (e.g., +20%, -20%) compatible with the Edge TTS and SiliconFlow APIs. For Gemini TTS, the requested rate is noted but not currently applied.
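The multiplier-to-percentage conversion can be sketched like this. The function name rate_to_percent is an assumption, and the project's own helper may differ in rounding or edge-case handling.

```python
def rate_to_percent(rate: float) -> str:
    """Convert a speed multiplier (1.0 = normal) to a signed percent string."""
    # Round to a whole percent to avoid float noise such as 19.999999999999996.
    percent = round((rate - 1.0) * 100)
    # The "+d" format always emits a sign, so 1.0 becomes "+0%".
    return f"{percent:+d}%"


print(rate_to_percent(1.2))  # +20%
print(rate_to_percent(0.8))  # -20%
print(rate_to_percent(1.0))  # +0%
```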

Custom audio

To bypass TTS entirely, supply a pre-recorded MP3 file with custom_audio_file:
{
  "video_subject": "My product demo",
  "custom_audio_file": "/absolute/path/to/narration.mp3"
}
When custom_audio_file is set and the file exists, the TTS step is skipped. The pipeline reads the file duration directly and uses it to size the video. Subtitles are disabled in this mode because word-boundary timing data is not available from a pre-recorded file.
If custom_audio_file is set but the path does not exist on disk, MoneyPrinterTurbo falls back to TTS automatically and logs a warning.
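The decision between custom audio and TTS described above can be sketched as follows. This is an illustrative assumption about the control flow, not the pipeline's actual code; choose_audio_source and its return values are hypothetical.

```python
import logging
import os
from typing import Optional

logger = logging.getLogger(__name__)


def choose_audio_source(custom_audio_file: Optional[str]) -> str:
    """Return "custom" if the file can be used, otherwise "tts"."""
    if not custom_audio_file:
        # Nothing supplied: normal TTS path.
        return "tts"
    if os.path.isfile(custom_audio_file):
        # File exists: skip TTS (subtitles are unavailable in this mode).
        return "custom"
    # Path supplied but missing on disk: warn and fall back to TTS.
    logger.warning(
        "custom_audio_file not found, falling back to TTS: %s", custom_audio_file
    )
    return "tts"
```

A caller would branch on the result, reading the file duration in the "custom" case and running the TTS step otherwise.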

Language matching

Make sure the voice locale matches the language of your script. A Chinese voice (zh-CN-*) will not produce natural output with an English script, and vice versa. For multilingual content, use an Azure V2 multilingual voice such as en-US-AndrewMultilingualNeural-V2-Male or zh-CN-XiaoxiaoMultilingualNeural-V2-Female.
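A simple pre-flight check for this rule might look like the sketch below. Both helper names are hypothetical, the script language is assumed to be supplied by the caller as a code such as "en" or "zh-CN", and the locale is assumed to be the first two hyphen-separated segments of the voice name.

```python
def voice_locale(voice_name: str) -> str:
    """Return the locale prefix, e.g. "zh-CN" from "zh-CN-XiaoxiaoNeural-Female"."""
    parts = voice_name.split("-")
    return "-".join(parts[:2])


def voice_suits_language(voice_name: str, script_language: str) -> bool:
    """True if the voice is multilingual or its language matches the script."""
    if "Multilingual" in voice_name:
        return True
    # Compare only the language part ("zh" in "zh-CN"), ignoring the region.
    voice_language = voice_name.split("-")[0].lower()
    return voice_language == script_language.split("-")[0].lower()


print(voice_locale("zh-CN-XiaoxiaoNeural-Female"))                # zh-CN
print(voice_suits_language("zh-CN-XiaoxiaoNeural-Female", "en"))  # False
print(voice_suits_language("en-GB-RyanNeural-Male", "en-US"))     # True
```

The check compares language only, so a UK English voice passes for a US English script; tighten it to the full locale if regional accent matters.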
