Advanced usage

Speed control

Pass the speed parameter to generate to make speech faster or slower. The default is 1.0; values below 1.0 slow speech down and values above 1.0 speed it up.

# Slower speech
audio_slow = model.generate("Hello!", voice="Luna", speed=0.8)

# Faster speech
audio_fast = model.generate("Hello!", voice="Luna", speed=1.5)

Values between 0.5 and 2.0 produce natural-sounding results. Going below 0.5 or above 2.0 may cause audio artifacts.
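If the speed value comes from user input, it may help to clamp it into the recommended range before calling generate. The helper below is a minimal sketch; clamp_speed is a hypothetical name and is not part of KittenTTS.

```python
def clamp_speed(speed: float, low: float = 0.5, high: float = 2.0) -> float:
    """Clamp a requested speed into the range this guide describes as natural-sounding."""
    return max(low, min(high, speed))


# Example (assumes `model` is an already loaded KittenTTS instance):
# audio = model.generate("Hello!", voice="Luna", speed=clamp_speed(requested_speed))
```

Clamping silently changes out-of-range requests. Raising an error instead may be preferable when callers should be told about invalid values.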

Custom cache directory

By default, models are cached in the Hugging Face cache directory (~/.cache/huggingface). To store models in a specific location, pass cache_dir:

from kittentts import KittenTTS

model = KittenTTS("KittenML/kitten-tts-nano-0.8", cache_dir="./models")

This is useful for containerized deployments or when you need predictable model paths.
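In a container, the cache location is often supplied through the environment. The sketch below resolves it to an absolute path and creates the directory if needed. KITTEN_TTS_CACHE and resolve_cache_dir are hypothetical names chosen for this example; KittenTTS itself is not known to read this variable.

```python
import os
from pathlib import Path


def resolve_cache_dir(default: str = "./models") -> str:
    """Return an absolute cache path, creating the directory if it is missing."""
    # KITTEN_TTS_CACHE is an assumed variable name, not a KittenTTS setting.
    raw = os.environ.get("KITTEN_TTS_CACHE", default)
    path = Path(raw).expanduser().resolve()
    path.mkdir(parents=True, exist_ok=True)
    return str(path)


# Example (requires kittentts to be installed):
# model = KittenTTS("KittenML/kitten-tts-nano-0.8", cache_dir=resolve_cache_dir())
```

An absolute path keeps the model location stable even if the process changes its working directory.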

Text preprocessing

Pass clean_text=True to automatically expand numbers, currencies, abbreviations, and other non-standard tokens before synthesis:

audio = model.generate(
    "The price is $99.99, a 20% discount from $124.99.",
    voice="Bella",
    clean_text=True
)
# Preprocessed: "The price is ninety-nine dollars and ninety-nine cents,
#                a twenty percent discount from one hundred twenty-four dollars
#                and ninety-nine cents."

Text preprocessing is disabled by default (clean_text=False) for performance. Enable it when your input contains numbers, currencies, dates, or other non-alphabetic tokens that should be spoken aloud.
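One way to leave preprocessing off for plain text is to turn it on only when the input seems to need it. The heuristic below is a rough sketch: needs_cleaning is a hypothetical helper, and it only looks for digits and a few currency or percent symbols. It does not detect abbreviations or dates written out in words.

```python
import re

# Digits plus a few common symbols; extend this set for your own inputs.
_NON_STANDARD = re.compile(r"[0-9$€£%]")


def needs_cleaning(text: str) -> bool:
    """Return True if the text contains digits or common currency/percent symbols."""
    return bool(_NON_STANDARD.search(text))


# Example (assumes `model` is an already loaded KittenTTS instance):
# audio = model.generate(text, voice="Bella", clean_text=needs_cleaning(text))
```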

Long text handling

KittenTTS automatically chunks long inputs at sentence boundaries (approximately every 400 characters). You do not need to split text manually: pass the full string and the library handles segmentation internally.

long_text = """
Once upon a time, in a land far away, there lived a small kitten who dreamed
of speaking to the world. Every day the kitten practiced, whispering words
into the wind, hoping someone would hear. One morning, a traveler passed by
and paused to listen. "I can hear you," said the traveler. From that day on,
the kitten was never silent again.
"""

audio = model.generate(long_text, voice="Rosie")
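To give a feel for what sentence-boundary chunking involves, here is an illustrative sketch. It is not the library's actual implementation; chunk_sentences is a hypothetical function, and it assumes sentences end in ".", "!", or "?" followed by whitespace.

```python
import re


def chunk_sentences(text: str, max_chars: int = 400) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars characters."""
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the limit, so close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

A single sentence longer than max_chars is kept whole in this sketch, so a chunk can exceed the limit in that case.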

Batch processing

To generate audio for multiple texts, loop over your inputs. Each call to generate is independent:

from kittentts import KittenTTS
import soundfile as sf

model = KittenTTS("KittenML/kitten-tts-nano-0.8")

texts = [
    "Hello, welcome to our service.",
    "Your order has been confirmed.",
    "Thank you for using Kitten TTS.",
]

for i, text in enumerate(texts):
    audio = model.generate(text, voice="Jasper")
    # The third argument to sf.write is the sample rate in Hz.
    sf.write(f"output_{i}.wav", audio, 24000)

Model loading is the most expensive step. Create the KittenTTS instance once and reuse it across all calls rather than re-initializing inside the loop.
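In a larger application, where several modules need the model, a memoized loader is one way to make sure it is created only once. This is a sketch using functools.lru_cache from the standard library; get_model is a hypothetical helper, not part of KittenTTS.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def get_model(model_id: str = "KittenML/kitten-tts-nano-0.8"):
    """Load the model on first use and return the same instance on later calls."""
    # Imported lazily so the module can be imported before the model is needed.
    from kittentts import KittenTTS

    return KittenTTS(model_id)


# Example (requires kittentts to be installed):
# audio = get_model().generate("Hello!", voice="Jasper")
```

The cache is keyed on the call arguments, so get_model() and get_model("KittenML/kitten-tts-nano-0.8") are stored as separate entries. Call it the same way everywhere to share one instance.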
