When a user sends a voice note to the bot, the bot processes the audio through Gemini and responds with both a text reply and a spoken OGG voice note generated via TTS. Dummy Gemini Bot supports two TTS engines, Microsoft Edge TTS and Google Gemini TTS, with automatic failover between them, so audio delivery is as reliable as possible even when API quotas are exhausted.
How to Trigger TTS
TTS is triggered automatically whenever a user sends a voice message to the bot:
- A user sends a voice note to the bot (in a DM, or in a group where the bot would normally respond).
- The bot downloads and processes the audio through Gemini’s audio understanding capabilities.
- Gemini generates a text response, which is then fed into the TTS pipeline.
- A voice note OGG file is sent back in the same chat as the bot’s reply.
Edge TTS Engine
The Edge TTS engine uses the edge-tts library, which streams audio from the same Microsoft infrastructure that powers Microsoft Edge’s read-aloud feature.
| Setting | Value |
|---|---|
| Default voice | fa-IR-FaridNeural (Persian male) |
| Alternative voice | fa-IR-DilaraNeural (Persian female) |
| Pitch control | TTS_VOICE_PITCH (e.g. 0.85 for a deeper voice) |
| API cost | Free — no quota consumed |
Gemini TTS Engine
The Gemini TTS engine uses Google’s generative audio models to produce more expressive, prompt-steerable speech.
Available voices: Kore, Puck, Fenrir, Aoede, Charon
Multiple models can be configured in TTS_GEMINI_MODEL as an ordered list. The bot tries each model in sequence — if the first model fails or hits a quota limit, it automatically moves to the next.
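As a rough illustration of the ordered list described above, the sketch below turns a raw TTS_GEMINI_MODEL value into a list of model names in priority order. It assumes the value is a comma-separated string. The actual format the bot expects is not stated here, and both the helper name `parse_model_list` and the `model-a` / `model-b` names are hypothetical placeholders.

```python
def parse_model_list(raw: str) -> list[str]:
    """Split a comma-separated setting into an ordered, de-duplicated list.

    Assumption: TTS_GEMINI_MODEL is stored as "first,second,...".
    """
    models: list[str] = []
    for name in raw.split(","):
        name = name.strip()
        # Skip empty entries and repeats, keeping the first occurrence's position.
        if name and name not in models:
            models.append(name)
    return models


# Placeholder model names, not real identifiers.
print(parse_model_list("model-a, model-b,,model-a"))  # -> ['model-a', 'model-b']
```

Keeping the list order intact matters because the first entry is the one tried first.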
Audio Format
Both engines ultimately produce OGG/Opus files, which are natively compatible with Telegram voice notes. Telegram displays them with a waveform and playback controls.
Edge TTS generates an intermediate MP3 file via edge-tts, which is then converted to OGG/Opus by ffmpeg before being sent as a voice note.
Gemini’s raw audio output is PCM (audio/L16, 24 kHz, mono). The bot auto-detects this format and converts it to OGG/Opus via ffmpeg before sending. No manual conversion is needed.
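The two conversions described in this section could look roughly like the following sketch, which only builds the ffmpeg argument lists. The helper name `build_ffmpeg_command` is hypothetical, and the exact flags the bot uses are not documented here. In particular, treating the raw PCM as signed 16-bit little-endian (`s16le`) is an assumption about Gemini's output, not something this page confirms.

```python
import subprocess


def build_ffmpeg_command(src: str, dst: str, raw_pcm: bool) -> list[str]:
    """Return an ffmpeg argv that encodes `src` to OGG/Opus at `dst`."""
    cmd = ["ffmpeg", "-y"]  # -y: overwrite the output file if it exists
    if raw_pcm:
        # Headerless PCM carries no metadata, so ffmpeg must be told the
        # sample format, sample rate (24 kHz) and channel count (mono).
        cmd += ["-f", "s16le", "-ar", "24000", "-ac", "1"]
    # MP3 input (the Edge TTS path) is self-describing and needs no hints.
    cmd += ["-i", src, "-c:a", "libopus", dst]
    return cmd


def convert(src: str, dst: str, raw_pcm: bool) -> None:
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_ffmpeg_command(src, dst, raw_pcm), check=True)
```

Calling `convert("reply.mp3", "reply.ogg", raw_pcm=False)` would cover the Edge TTS path, while `raw_pcm=True` covers the Gemini path. Both require ffmpeg to be available on the PATH.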
Failover Chain
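The TTS pipeline follows an ordered failover chain: the Gemini models listed in TTS_GEMINI_MODEL are tried in sequence, and Edge TTS serves as the final fallback. If TTS_FALLBACK_TO_EDGE is False, the chain stops before the Edge TTS step and no voice note is sent if all Gemini models fail.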
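A minimal sketch of such a failover loop is shown below, assuming the Gemini models are tried first and Edge TTS last. The function `synthesize_with_failover` and the two injected callables are hypothetical stand-ins for the bot's real engine wrappers, which are not shown on this page.

```python
from typing import Callable, Optional, Sequence

# Hypothetical engine signatures: each returns encoded audio bytes or raises.
GeminiSynth = Callable[[str, str], bytes]  # (model, text) -> audio
EdgeSynth = Callable[[str], bytes]         # (text) -> audio


def synthesize_with_failover(
    text: str,
    gemini_models: Sequence[str],
    gemini_synth: GeminiSynth,
    edge_synth: EdgeSynth,
    fallback_to_edge: bool,
) -> Optional[bytes]:
    """Try each Gemini model in order, then Edge TTS if allowed."""
    for model in gemini_models:
        try:
            return gemini_synth(model, text)
        except Exception:
            # Quota errors and other failures: move on to the next model.
            continue
    if fallback_to_edge:
        try:
            return edge_synth(text)
        except Exception:
            return None
    # Chain exhausted: the caller sends no voice note.
    return None
```

Returning `None` instead of raising lets the caller still deliver the text reply when no audio could be produced. Whether the bot behaves this way is an assumption.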