When a user sends a voice note to the bot, the bot processes the audio through Gemini and responds with both a text reply and a spoken OGG voice note generated via TTS. Dummy Gemini Bot supports two TTS engines, Microsoft Edge TTS and Google Gemini TTS, with automatic failover between them, so a voice reply can still be delivered when Gemini API quotas are exhausted.

How to Trigger TTS

TTS is triggered automatically whenever a user sends a voice message to the bot:
  1. A user sends a voice note to the bot (in a DM, or in a group where the bot would normally respond).
  2. The bot downloads and processes the audio through Gemini’s audio understanding capabilities.
  3. Gemini generates a text response, which is then fed into the TTS pipeline.
  4. A voice note OGG file is sent back in the same chat as the bot’s reply.
No special command is required — sending a voice note is sufficient to trigger the full voice-in, voice-out pipeline.
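The four steps above can be sketched as a small orchestration function. This is only an illustration of the flow, not the bot's actual code: `VoiceReply`, `handle_voice_note`, and the two injected callables are hypothetical names, and downloading from and sending to Telegram are left out.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class VoiceReply:
    text: str                   # Gemini's text response
    voice_ogg: Optional[bytes]  # OGG/Opus audio, or None if TTS produced nothing


def handle_voice_note(
    audio: bytes,
    understand: Callable[[bytes], str],            # hypothetical: audio in, reply text out
    synthesize: Callable[[str], Optional[bytes]],  # hypothetical: text in, OGG bytes out
) -> VoiceReply:
    """Run the voice-in, voice-out flow on an already-downloaded voice note."""
    reply_text = understand(audio)      # steps 2-3: Gemini turns the audio into a text reply
    voice_ogg = synthesize(reply_text)  # step 3: the text is fed into the TTS pipeline
    return VoiceReply(text=reply_text, voice_ogg=voice_ogg)  # step 4: caller sends both
```

Passing the Gemini and TTS steps in as callables keeps the sketch independent of any particular client library.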

Edge TTS Engine

The Edge TTS engine uses Microsoft’s edge-tts library, which streams audio from the same infrastructure that powers Microsoft Edge’s read-aloud feature.
| Setting | Value |
| --- | --- |
| Default voice | fa-IR-FaridNeural (Persian male) |
| Alternative voice | fa-IR-DilaraNeural (Persian female) |
| Pitch control | TTS_VOICE_PITCH (e.g. 0.85 for a deeper voice) |
| API cost | Free (no quota consumed) |
Edge TTS is the fastest engine and carries no API quota cost, making it the preferred fallback option.
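As a rough illustration of the Edge step, the sketch below calls the edge-tts library with one of the voice names listed above. `edge_tts.Communicate` and its `save` coroutine belong to the library; `pick_voice` and `synthesize_mp3` are hypothetical helpers, and pitch handling is omitted because this page does not say how TTS_VOICE_PITCH is applied.

```python
import asyncio

# Voice names as listed for the Edge TTS engine.
EDGE_VOICES = {
    "male": "fa-IR-FaridNeural",     # default
    "female": "fa-IR-DilaraNeural",  # alternative
}


def pick_voice(gender: str = "male") -> str:
    """Hypothetical helper: map a label to a voice, falling back to the default."""
    return EDGE_VOICES.get(gender.lower(), EDGE_VOICES["male"])


async def synthesize_mp3(text: str, out_path: str, gender: str = "male") -> str:
    """Stream speech from Edge TTS and write it to out_path as MP3."""
    # Imported lazily so this module still loads where edge-tts is not installed.
    import edge_tts

    communicate = edge_tts.Communicate(text, pick_voice(gender))
    await communicate.save(out_path)  # edge-tts writes MP3 audio
    return out_path


# Usage (needs network access and `pip install edge-tts`):
#   asyncio.run(synthesize_mp3("سلام", "reply.mp3"))
```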

Gemini TTS Engine

The Gemini TTS engine uses Google’s generative audio models to produce more expressive, prompt-steerable speech.
Available voices: Kore, Puck, Fenrir, Aoede, Charon.
Multiple models can be configured in TTS_GEMINI_MODEL as an ordered list. The bot tries each model in sequence: if the first model fails or hits a quota limit, it automatically moves to the next.
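One way to read such an ordered list is sketched below. The exact format of TTS_GEMINI_MODEL is not specified on this page, so the comma-separated form is an assumption, and `parse_model_list` and `configured_models` are hypothetical helpers.

```python
import os


def parse_model_list(raw: str) -> list[str]:
    """Split a comma-separated string into model names, preserving order.

    Assumption: the setting is a comma-separated string. Blank entries and
    repeated names are dropped so each model is tried at most once.
    """
    models: list[str] = []
    for item in raw.split(","):
        name = item.strip()
        if name and name not in models:
            models.append(name)
    return models


def configured_models() -> list[str]:
    """Read the ordered model list from the environment (empty if unset)."""
    return parse_model_list(os.environ.get("TTS_GEMINI_MODEL", ""))
```

With `TTS_GEMINI_MODEL="primary-tts-model, backup-tts-model"` (placeholder names), `configured_models()` would yield the two names in that order, which is the order in which the bot would attempt them.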

Audio Format

Both engines ultimately produce OGG/Opus files, which are natively compatible with Telegram voice notes. Telegram displays them with a waveform and playback controls.
Edge TTS generates an intermediate MP3 file via edge-tts, which is then converted to OGG/Opus by ffmpeg before being sent as a voice note.
Gemini’s raw audio output is PCM (audio/L16, 24 kHz, mono). The bot auto-detects this format and converts it to OGG/Opus via ffmpeg before sending. No manual conversion is needed.
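The conversion step might look roughly like the sketch below, which builds an ffmpeg command from the audio's MIME type. `build_ffmpeg_cmd` and `convert_to_ogg` are hypothetical names. The sketch assumes the PCM is signed 16-bit little-endian and that the MIME string may carry a `rate=` parameter (for example `audio/L16;codec=pcm;rate=24000`); neither detail is confirmed by this page.

```python
import subprocess


def build_ffmpeg_cmd(src: str, dst: str, mime_type: str) -> list[str]:
    """Return an ffmpeg command that converts src to an OGG/Opus file at dst."""
    cmd = ["ffmpeg", "-y"]  # -y: overwrite dst if it already exists
    if mime_type.lower().startswith("audio/l16"):
        # Raw PCM has no header, so ffmpeg must be told the sample layout.
        rate = 24000  # default from the text: 24 kHz mono
        for part in mime_type.split(";")[1:]:
            key, _, value = part.strip().partition("=")
            if key.lower() == "rate" and value.isdigit():
                rate = int(value)
        # Assumption: samples are signed 16-bit little-endian.
        cmd += ["-f", "s16le", "-ar", str(rate), "-ac", "1"]
    # Container formats such as MP3 are probed by ffmpeg automatically.
    cmd += ["-i", src, "-c:a", "libopus", dst]
    return cmd


def convert_to_ogg(src: str, dst: str, mime_type: str) -> None:
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_ffmpeg_cmd(src, dst, mime_type), check=True)
```

Keeping command construction separate from execution makes the format detection easy to check without having ffmpeg installed.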

Failover Chain

The TTS pipeline follows this ordered failover chain:
Gemini TTS model 1
  → Gemini TTS model 2
    → ... (additional configured models)
      → Edge TTS  (if TTS_FALLBACK_TO_EDGE=True)
        → No voice note sent (all engines failed)
If TTS_FALLBACK_TO_EDGE is False, the chain stops before the Edge TTS step and no voice note is sent if all Gemini models fail.
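The chain above can be expressed as a short loop over engines. This is a minimal sketch under stated assumptions: each engine is modelled as a callable that returns audio bytes or raises on failure, and `synthesize_with_failover` and its parameters are hypothetical names rather than the bot's real interface.

```python
from typing import Callable, Optional, Sequence

# An engine takes the reply text and returns audio bytes, raising on failure
# (for example when a quota limit is hit).
Engine = Callable[[str], bytes]


def synthesize_with_failover(
    text: str,
    gemini_engines: Sequence[Engine],  # one callable per configured Gemini model, in order
    edge_engine: Engine,
    fallback_to_edge: bool = True,     # mirrors TTS_FALLBACK_TO_EDGE
) -> Optional[bytes]:
    """Try each engine in order; return the first audio produced, else None."""
    chain = list(gemini_engines)
    if fallback_to_edge:
        chain.append(edge_engine)  # Edge TTS is always the last resort
    for engine in chain:
        try:
            return engine(text)
        except Exception:
            continue  # this engine failed; move on to the next one
    return None  # all engines failed: no voice note is sent
```

A caller would send the voice note only when the result is not `None`; the text reply is unaffected either way.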
Edge TTS is faster and has no quota cost. Use Gemini TTS when you need more expressive, emotionally nuanced audio or when the content benefits from prompt-steerable delivery.
