Speed control

Adjust the speed parameter to make speech faster or slower. The default is 1.0.
# Slower speech
audio_slow = model.generate("Hello!", voice="Luna", speed=0.8)

# Faster speech
audio_fast = model.generate("Hello!", voice="Luna", speed=1.5)
Values between 0.5 and 2.0 produce natural-sounding results. Going below 0.5 or above 2.0 may cause audio artifacts.
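When the speed value comes from user input or configuration, one option is to clamp it to the recommended range before calling generate. The sketch below is illustrative only: clamp_speed is a hypothetical helper, not part of KittenTTS, and its bounds simply mirror the 0.5 to 2.0 range stated above.

```python
def clamp_speed(speed: float, low: float = 0.5, high: float = 2.0) -> float:
    """Return speed limited to the [low, high] range.

    Hypothetical helper for this guide; the defaults mirror the range
    the documentation describes as natural-sounding.
    """
    return max(low, min(high, speed))


# Usage, assuming `model` is an already loaded KittenTTS instance:
# audio = model.generate("Hello!", voice="Luna", speed=clamp_speed(3.0))
```

Clamping silently changes the requested value, so an application that needs strict validation may prefer to raise an error instead.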

Custom cache directory

By default, models are cached in the Hugging Face cache directory (~/.cache/huggingface). To store models in a specific location, pass cache_dir:
model = KittenTTS("KittenML/kitten-tts-nano-0.8", cache_dir="./models")
This is useful for containerized deployments or when you need predictable model paths.
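For a predictable path in a container, it can help to resolve the directory once and create it before loading the model. The following is a minimal sketch under stated assumptions: resolve_cache_dir is a hypothetical helper, and KITTEN_TTS_CACHE is an environment variable name invented for this example, not one that KittenTTS reads.

```python
import os
from pathlib import Path


def resolve_cache_dir(default: str = "./models") -> str:
    """Pick a cache directory, create it if missing, return an absolute path.

    KITTEN_TTS_CACHE is a hypothetical variable name used only in this
    sketch; the library itself only sees the resulting cache_dir value.
    """
    raw = os.environ.get("KITTEN_TTS_CACHE", default)
    path = Path(raw).expanduser().resolve()
    path.mkdir(parents=True, exist_ok=True)
    return str(path)


# Usage, assuming kittentts is installed:
# model = KittenTTS("KittenML/kitten-tts-nano-0.8", cache_dir=resolve_cache_dir())
```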

Text preprocessing

Enable clean_text=True to automatically expand numbers, currencies, abbreviations, and other non-standard tokens before synthesis:
audio = model.generate(
    "The price is $99.99, a 20% discount from $124.99.",
    voice="Bella",
    clean_text=True
)
# Preprocessed: "The price is ninety-nine dollars and ninety-nine cents,
#                a twenty percent discount from one hundred twenty-four dollars
#                and ninety-nine cents."
Text preprocessing is disabled by default (clean_text=False) for performance. Enable it when your input contains numbers, currencies, dates, or other non-alphabetic tokens that should be spoken aloud.
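If inputs are mixed, one way to avoid paying the preprocessing cost on every call is to enable clean_text only when the text appears to need it. The heuristic below is a rough sketch, not library behavior: needs_clean_text is a hypothetical helper that checks only for digits and a few currency or percent symbols, so it will miss abbreviations and other tokens.

```python
import re

# Digits or common currency/percent symbols suggest the text needs expansion.
# This pattern is an assumption for illustration and is far from exhaustive.
_NEEDS_CLEANING = re.compile(r"[0-9$€£%]")


def needs_clean_text(text: str) -> bool:
    """Return True if the text contains a digit or a listed symbol."""
    return bool(_NEEDS_CLEANING.search(text))


# Usage, assuming `model` is an already loaded KittenTTS instance:
# audio = model.generate(text, voice="Bella", clean_text=needs_clean_text(text))
```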

Long text handling

KittenTTS automatically chunks long inputs at sentence boundaries (approximately every 400 characters). You do not need to split text manually. Pass the full string and the library handles segmentation internally.
long_text = """
Once upon a time, in a land far away, there lived a small kitten who dreamed
of speaking to the world. Every day the kitten practiced, whispering words
into the wind, hoping someone would hear. One morning, a traveler passed by
and paused to listen. "I can hear you," said the traveler. From that day on,
the kitten was never silent again.
"""

audio = model.generate(long_text, voice="Rosie")
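To give a feel for what sentence-boundary chunking means, here is a simplified sketch of the general technique. It is not the library's actual implementation: chunk_text is a hypothetical function, the sentence-splitting regex is a naive assumption, and the 400-character limit is taken from the approximate figure above.

```python
import re


def chunk_text(text: str, max_chars: int = 400) -> list[str]:
    """Group whole sentences into chunks of at most max_chars where possible.

    Illustrative only; KittenTTS performs its own segmentation internally.
    """
    # Collapse newlines and repeated spaces, then split after ., ! or ?
    normalized = " ".join(text.split())
    sentences = re.split(r"(?<=[.!?])\s+", normalized)

    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars or not current:
            # A single sentence longer than max_chars is kept whole.
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Because the library already does this for you, a function like this is only useful for reasoning about where breaks are likely to fall, not as a required preprocessing step.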

Batch processing

To generate audio for multiple texts, loop over your inputs. Each call to generate is independent:
from kittentts import KittenTTS
import soundfile as sf

model = KittenTTS("KittenML/kitten-tts-nano-0.8")

texts = [
    "Hello, welcome to our service.",
    "Your order has been confirmed.",
    "Thank you for using Kitten TTS.",
]

for i, text in enumerate(texts):
    audio = model.generate(text, voice="Jasper")
    sf.write(f"output_{i}.wav", audio, 24000)
Model loading is the most expensive step. Create the KittenTTS instance once and reuse it across all calls rather than re-initializing inside the loop.
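When generate is called from several places in an application, a small caching wrapper is one way to make sure the constructor runs only once per model name. This is a sketch using the standard library's functools.lru_cache; load_once is a hypothetical helper, and it assumes the constructor is called with hashable positional arguments such as the model name string.

```python
from functools import lru_cache
from typing import Any, Callable


def load_once(factory: Callable[[str], Any]) -> Callable[[str], Any]:
    """Wrap a model constructor so each model name is loaded only once.

    Hypothetical helper: repeated calls with the same name return the
    same cached instance instead of constructing a new one.
    """
    return lru_cache(maxsize=None)(factory)


# Usage, assuming kittentts is installed:
# from kittentts import KittenTTS
# get_model = load_once(KittenTTS)
# model = get_model("KittenML/kitten-tts-nano-0.8")  # loads the model
# model = get_model("KittenML/kitten-tts-nano-0.8")  # returns the cached instance
```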
