Voice & subtitle settings

MoneyPrinterTurbo generates a voiceover for every video and optionally produces a matching subtitle file. Voice and subtitle behaviour are controlled by settings in the [app], [whisper], [azure], and [siliconflow] sections of config.toml.

Voice settings

The voice engine is selected by the voice_name value you pass when generating a video (via the Web UI or API). The name's prefix, or for Azure its -V2 suffix, determines which TTS backend is used:

Voice name prefix	Backend	Config required
No prefix (e.g. `en-US-JennyNeural-Female`)	Edge TTS	None
No prefix, name ends in `-V2-*` (e.g. `en-US-AvaMultilingualNeural-V2-Female`)	Azure Cognitive Services	`[azure]` section
`siliconflow:` (e.g. `siliconflow:FunAudioLLM/CosyVoice2-0.5B:alex-Male`)	SiliconFlow	`[siliconflow]` section
`gemini:` (e.g. `gemini:Zephyr-Female`)	Google Gemini TTS	`gemini_api_key` in `[app]`
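
The routing rules in the table can be expressed as a small lookup. The following is a minimal sketch based only on the naming conventions listed above; the function name `tts_backend` is hypothetical and this is not necessarily how MoneyPrinterTurbo implements the dispatch internally.

```python
def tts_backend(voice_name: str) -> str:
    """Return the TTS backend implied by a voice name (hypothetical helper)."""
    # Explicit prefixes take priority.
    if voice_name.startswith("siliconflow:"):
        return "siliconflow"
    if voice_name.startswith("gemini:"):
        return "gemini"
    # Unprefixed names ending in -V2-Female / -V2-Male are Azure voices.
    if voice_name.endswith(("-V2-Female", "-V2-Male")):
        return "azure"
    # Everything else falls back to Edge TTS.
    return "edge"


if __name__ == "__main__":
    for name in (
        "en-US-JennyNeural-Female",
        "en-US-AvaMultilingualNeural-V2-Female",
        "siliconflow:FunAudioLLM/CosyVoice2-0.5B:alex-Male",
        "gemini:Zephyr-Female",
    ):
        print(name, "->", tts_backend(name))
```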

`voice_name`

The exact voice identifier string. You select this in the Web UI from a dropdown that includes a real-time preview. When calling the API directly, pass the voice name as a parameter in your request body.

`voice_volume`

Controls the output audio volume. Accepts a float between 0.0 (silent) and 1.0 (full volume). Default is 1.0.

`voice_rate`

A speech rate multiplier. 1.0 is normal speed. Values above 1.0 speed up delivery; values below slow it down.

Value	Effect
`0.75`	25% slower
`1.0`	Normal (default)
`1.25`	25% faster
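
The edge-tts library expresses speech rate as a signed percentage string such as "+25%" rather than as a multiplier. The sketch below shows one way the multiplier in the table could be mapped to that form; the helper name `rate_to_percent` is an assumption made for illustration.

```python
def rate_to_percent(rate: float) -> str:
    """Convert a rate multiplier (1.0 = normal) to a signed percent string."""
    # 1.25 -> +25, 0.75 -> -25, 1.0 -> 0
    percent = round((rate - 1.0) * 100)
    # The "+d" format always emits a sign, so 0 becomes "+0".
    return f"{percent:+d}%"


if __name__ == "__main__":
    for rate in (0.75, 1.0, 1.25):
        print(rate, "->", rate_to_percent(rate))
```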

Edge TTS voices

Edge TTS is the default voice engine. It requires no API key and supports a wide range of languages and locales (400+ voices).

# No additional config needed for Edge TTS.
# voice_name is set per-video, not globally.

Example voice names:

en-US-JennyNeural-Female
en-US-GuyNeural-Male
zh-CN-XiaoxiaoNeural-Female
de-DE-KatjaNeural-Female

Use the Web UI’s voice dropdown to browse all available voices and hear a preview before generating a video.
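
Edge TTS itself identifies voices by short names such as en-US-JennyNeural, without a gender label. The sketch below assumes the trailing -Female or -Male in the names above is a display suffix that has to be removed before calling the edge-tts library. The helper names are hypothetical, and `synthesize` needs the third-party edge-tts package and network access.

```python
def edge_voice_id(voice_name: str) -> str:
    """Strip the -Female / -Male display suffix (assumed convention)."""
    for suffix in ("-Female", "-Male"):
        if voice_name.endswith(suffix):
            return voice_name[: -len(suffix)]
    return voice_name


async def synthesize(text: str, voice_name: str, out_path: str) -> None:
    """Write speech for `text` to `out_path` using edge-tts."""
    import edge_tts  # third-party: pip install edge-tts

    communicate = edge_tts.Communicate(text, edge_voice_id(voice_name))
    await communicate.save(out_path)


if __name__ == "__main__":
    print(edge_voice_id("en-US-JennyNeural-Female"))
    # To actually synthesize audio (requires network access):
    # import asyncio
    # asyncio.run(synthesize("Hello", "en-US-JennyNeural-Female", "hello.mp3"))
```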

Azure TTS voices

Azure Neural voices (including multilingual V2 variants) require an Azure Cognitive Services Speech resource. Get your key at portal.azure.com.

[azure]
speech_key = "your-azure-speech-key"
speech_region = "eastus"   # e.g. "eastus", "westeurope"

Once configured, Azure V2 voices (names ending in -V2-Female or -V2-Male) become available in the voice dropdown. Example Azure V2 voice names:

en-US-AvaMultilingualNeural-V2-Female
en-US-AndrewMultilingualNeural-V2-Male
zh-CN-XiaoxiaoMultilingualNeural-V2-Female

Standard Azure voices (without -V2) use Edge TTS internally and do not require Azure credentials. Only the -V2 multilingual voices require the [azure] section to be configured.
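
A pre-flight check can catch missing Azure credentials before a generation run starts. This sketch assumes `config` is the dictionary you get from parsing config.toml (for example with Python's tomllib); the function names are hypothetical.

```python
from typing import Optional


def needs_azure(voice_name: str) -> bool:
    """Only the -V2 multilingual voices require the [azure] section."""
    return voice_name.endswith(("-V2-Female", "-V2-Male"))


def azure_config_problem(voice_name: str, config: dict) -> Optional[str]:
    """Return an error message if the voice needs Azure and it is not set up."""
    if not needs_azure(voice_name):
        return None
    azure = config.get("azure", {})
    missing = [k for k in ("speech_key", "speech_region") if not azure.get(k)]
    if missing:
        return "[azure] is missing: " + ", ".join(missing)
    return None


if __name__ == "__main__":
    cfg = {"azure": {"speech_key": "", "speech_region": "eastus"}}
    print(azure_config_problem("en-US-AvaMultilingualNeural-V2-Female", cfg))
```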

SiliconFlow TTS voices

SiliconFlow provides high-quality Chinese and multilingual voices via the CosyVoice2 model. Get your API key at siliconflow.cn.

[siliconflow]
api_key = "your-siliconflow-api-key"

Available SiliconFlow voice names:

siliconflow:FunAudioLLM/CosyVoice2-0.5B:alex-Male
siliconflow:FunAudioLLM/CosyVoice2-0.5B:anna-Female
siliconflow:FunAudioLLM/CosyVoice2-0.5B:bella-Female
siliconflow:FunAudioLLM/CosyVoice2-0.5B:benjamin-Male
siliconflow:FunAudioLLM/CosyVoice2-0.5B:charles-Male
siliconflow:FunAudioLLM/CosyVoice2-0.5B:claire-Female
siliconflow:FunAudioLLM/CosyVoice2-0.5B:david-Male
siliconflow:FunAudioLLM/CosyVoice2-0.5B:diana-Female
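
Each SiliconFlow name above packs three pieces of information into one string: the model, the speaker, and a gender label. The following sketch splits a name into those parts, assuming the `siliconflow:<model>:<speaker>-<gender>` layout seen in the list; the type and function names are hypothetical.

```python
from typing import NamedTuple


class SiliconFlowVoice(NamedTuple):
    model: str
    speaker: str
    gender: str


def parse_siliconflow_voice(voice_name: str) -> SiliconFlowVoice:
    """Split 'siliconflow:<model>:<speaker>-<gender>' into its parts."""
    parts = voice_name.split(":", 2)
    if len(parts) != 3 or parts[0] != "siliconflow":
        raise ValueError(f"not a SiliconFlow voice name: {voice_name!r}")
    _, model, speaker_and_gender = parts
    # The gender label follows the last hyphen.
    speaker, _, gender = speaker_and_gender.rpartition("-")
    return SiliconFlowVoice(model, speaker, gender)


if __name__ == "__main__":
    print(parse_siliconflow_voice("siliconflow:FunAudioLLM/CosyVoice2-0.5B:alex-Male"))
```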

Gemini TTS voices

Gemini TTS uses the gemini-2.5-flash-preview-tts model and shares the gemini_api_key from the [app] section.

gemini_api_key = "AIza..."   # shared with Gemini LLM

Available Gemini voice names:

gemini:Zephyr-Female
gemini:Puck-Male
gemini:Aoede-Female
gemini:Orion-Male

Gemini TTS requires pydub to be installed. Run pip install pydub if you see an import error when using a gemini: voice.
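
If you script generation runs, you can check for pydub up front instead of waiting for the import error. This is a minimal sketch using only the standard library; the helper names are hypothetical.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if a top-level module can be imported."""
    return importlib.util.find_spec(name) is not None


def require_pydub_for(voice_name: str) -> None:
    """Raise early if a gemini: voice is requested without pydub installed."""
    if voice_name.startswith("gemini:") and not module_available("pydub"):
        raise RuntimeError("Gemini TTS needs pydub: run 'pip install pydub'")


if __name__ == "__main__":
    print("pydub installed:", module_available("pydub"))
```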

Subtitle settings

Set subtitle_provider in the [app] section of config.toml to control how (or whether) subtitles are generated. Three values are supported:

"edge" (recommended)
"whisper"
"" (disabled)

subtitle_provider = "edge"

Subtitle timing is derived directly from Edge TTS word-boundary events during audio synthesis. This is the fastest option and requires no additional downloads.

subtitle_provider = "edge"

Characteristics:

Fast — no extra processing step
Works with any Edge TTS or Azure TTS voice
Timing accuracy is good for most use cases
No large model download required

Start with "edge" unless you need highly precise subtitle alignment. It handles the majority of use cases well.
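
To make the word-boundary approach concrete, the sketch below turns a list of boundary events into SRT cues. It assumes each event carries `offset` and `duration` in 100-nanosecond ticks, the unit used by edge-tts, plus the spoken `text`. It emits one cue per word for simplicity, whereas a real subtitle file would group words into lines. The function names are hypothetical.

```python
def ticks_to_srt(ticks: int) -> str:
    """Format a time given in 100-ns ticks as an SRT timestamp."""
    total_ms = ticks // 10_000  # 10,000 ticks per millisecond
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, ms = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{ms:03d}"


def boundaries_to_srt(events: list) -> str:
    """Build SRT text with one cue per word-boundary event."""
    cues = []
    for index, event in enumerate(events, start=1):
        start = ticks_to_srt(event["offset"])
        end = ticks_to_srt(event["offset"] + event["duration"])
        cues.append(f"{index}\n{start} --> {end}\n{event['text']}\n")
    return "\n".join(cues)


if __name__ == "__main__":
    sample = [
        {"offset": 0, "duration": 5_000_000, "text": "Hello"},
        {"offset": 5_000_000, "duration": 10_000_000, "text": "world"},
    ]
    print(boundaries_to_srt(sample))
```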

subtitle_provider = "whisper"

Faster Whisper transcribes the generated audio to produce subtitle timing. This is more accurate than Edge-derived timing because it aligns text to the actual audio signal.

subtitle_provider = "whisper"

Configure the Whisper model in the [whisper] section:

[whisper]
# Recommended model for best accuracy
model_size = "large-v3"

# CPU inference (default)
device = "CPU"
compute_type = "int8"

# GPU inference (requires CUDA)
# device = "cuda"
# compute_type = "float16"

# GPU with reduced memory (INT8)
# device = "cuda"
# compute_type = "int8_float16"

`model_size`	Accuracy	Download size
`large-v3`	Highest (recommended)	~3 GB
`medium`	Good	~1.5 GB
`small`	Acceptable	~480 MB
`base`	Lower	~145 MB

The large-v3 model (~3 GB) is downloaded from Hugging Face on first use. Make sure you have sufficient disk space and a stable internet connection before running your first Whisper-based generation.
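
For reference, the three [whisper] keys correspond to constructor arguments of the faster-whisper `WhisperModel` class. The sketch below shows one way that mapping could look, assuming `whisper_cfg` is the parsed [whisper] table. Lower-casing the device value is an assumption made here for safety, and both function names are hypothetical. Calling `transcribe_words` needs the faster-whisper package and will trigger the model download.

```python
def whisper_kwargs(whisper_cfg: dict) -> dict:
    """Map [whisper] settings to WhisperModel constructor arguments."""
    return {
        "model_size_or_path": whisper_cfg.get("model_size", "large-v3"),
        # Assumption: normalise "CPU" / "CUDA" to lower case.
        "device": str(whisper_cfg.get("device", "cpu")).lower(),
        "compute_type": whisper_cfg.get("compute_type", "int8"),
    }


def transcribe_words(audio_path: str, whisper_cfg: dict) -> list:
    """Return (start, end, word) tuples with times in seconds."""
    from faster_whisper import WhisperModel  # pip install faster-whisper

    model = WhisperModel(**whisper_kwargs(whisper_cfg))
    segments, _info = model.transcribe(audio_path, word_timestamps=True)
    return [(w.start, w.end, w.word) for seg in segments for w in seg.words]


if __name__ == "__main__":
    cfg = {"model_size": "large-v3", "device": "CPU", "compute_type": "int8"}
    print(whisper_kwargs(cfg))
```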

When to use Whisper:

You need precise subtitle timing for fast speech
You are using a SiliconFlow or Gemini TTS voice (these backends do not provide word-boundary events)
You notice subtitle sync issues with "edge"
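
The backend-related part of this guidance can be captured in a small helper. This is a sketch of the rule of thumb only, not project code; the function name is hypothetical, and it does not cover the judgment calls about fast speech or observed sync issues.

```python
def suggested_subtitle_provider(voice_name: str) -> str:
    """Suggest a subtitle_provider value for a given voice name."""
    # SiliconFlow and Gemini voices provide no word-boundary events,
    # so timing has to come from transcribing the audio.
    if voice_name.startswith(("siliconflow:", "gemini:")):
        return "whisper"
    # Edge TTS and Azure voices work with the faster "edge" provider.
    return "edge"


if __name__ == "__main__":
    print(suggested_subtitle_provider("gemini:Puck-Male"))
    print(suggested_subtitle_provider("zh-CN-XiaoxiaoNeural-Female"))
```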

subtitle_provider = "" (disabled)

Set subtitle_provider to an empty string to skip subtitle generation entirely. The final video will have no subtitle track.

subtitle_provider = ""

Use this when:

You are generating background footage without narration text
You want to add subtitles manually in post-production
You are testing or prototyping and do not need captions

Subtitle appearance

Font, size, colour, and position are configured per video at generation time, not globally in config.toml. You can set these values in two ways:

Web UI: Use the subtitle style controls in the video generation form.
API: Pass the relevant parameters in your /api/v1/videos request body.

This allows each video to have different subtitle styling without changing global configuration.
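
As an illustration of per-video settings, the sketch below assembles a JSON request body. Only voice_name, voice_volume, and voice_rate are named in this page. The `video_subject` field and every subtitle style key shown (`font_name`, `font_size`, `text_fore_color`, `subtitle_position`) are assumptions for illustration, so check the API schema of your installation for the real field names.

```python
import json
from typing import Optional


def build_video_request(
    subject: str,
    voice_name: str,
    voice_volume: float = 1.0,
    voice_rate: float = 1.0,
    subtitle_style: Optional[dict] = None,
) -> str:
    """Return a JSON body for a video generation request (sketch)."""
    payload = {
        "video_subject": subject,  # assumed field name
        "voice_name": voice_name,
        "voice_volume": voice_volume,
        "voice_rate": voice_rate,
    }
    # Style keys are passed through unchanged; their names are assumptions.
    payload.update(subtitle_style or {})
    return json.dumps(payload)


if __name__ == "__main__":
    body = build_video_request(
        "Why the sky is blue",
        "en-US-JennyNeural-Female",
        voice_rate=1.25,
        subtitle_style={"font_size": 60, "subtitle_position": "bottom"},
    )
    print(body)
```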
