Text preprocessing

KittenTTS includes a TextPreprocessor that converts raw text into a form that the phonemizer can render accurately. It handles numbers, symbols, abbreviations, and formatting that would otherwise produce incorrect or garbled audio.

When to use preprocessing

clean_text=True

Pass raw, unformatted text; the preprocessor handles expansion automatically.

# The preprocessor expands $99.99 and 20% before phonemization
model.generate(
    "The price is $99.99, a 20% discount.",
    voice="Bella",
    clean_text=True,
)

Use this when your input comes from user-generated content, documents, or any source with mixed formatting.

clean_text=False

Pass text that is already fully written out in spoken form.

# Text is already normalized — no preprocessing needed
model.generate(
    "The price is ninety-nine dollars and ninety-nine cents.",
    voice="Bella",
    clean_text=False,
)

Use this when you control the input and want to avoid any unintended transformations.

clean_text defaults to False in generate(). The generate_to_file() method does not expose a clean_text parameter, so to preprocess text before saving to a file, either preprocess it manually or call generate() with clean_text=True and write the returned audio yourself.
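One way to combine preprocessing with file output is to call generate() with clean_text=True and write the samples yourself. The sketch below is only an illustration: the helper name generate_clean_to_wav is hypothetical, and it assumes that generate() returns a one-dimensional sequence of float samples in the range -1.0 to 1.0 and that the output sample rate is 24000 Hz. Check both assumptions against your installed version before relying on it.

```python
import array
import wave


def generate_clean_to_wav(model, text, path, voice="Bella", sample_rate=24000):
    """Generate speech with preprocessing enabled and save it as 16-bit mono WAV.

    Assumptions (not confirmed by the documentation above):
    - model.generate() returns a 1-D sequence of floats in [-1.0, 1.0]
    - sample_rate matches the rate the model actually produces
    """
    samples = model.generate(text, voice=voice, clean_text=True)

    # Clamp each sample to [-1.0, 1.0] and scale it to a signed 16-bit integer.
    pcm = array.array(
        "h",
        (int(max(-1.0, min(1.0, float(s))) * 32767) for s in samples),
    )

    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)      # mono
        wav_file.setsampwidth(2)      # 2 bytes per sample = 16-bit PCM
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm.tobytes())

    return len(pcm)
```

The helper only relies on the generate() keyword arguments shown earlier (voice and clean_text), so it takes the model as a parameter instead of constructing one.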

Pipeline steps

The preprocessor applies 23 transformations in a fixed order. Each step operates on the output of the previous one.
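Because each step consumes the previous step's output, the order changes the result. The sketch below shows the idea with three hypothetical stand-in steps; they are not the library's functions, only simplified regex versions used to show why a symbol expansion has to run before punctuation is stripped.

```python
import re
from typing import Callable, List

Step = Callable[[str], str]


def expand_percent_sign(text: str) -> str:
    # Hypothetical stand-in: rewrites only the "%" sign and leaves the digits alone.
    return re.sub(r"(\d)\s*%", r"\1 percent", text)


def strip_symbols(text: str) -> str:
    # Hypothetical stand-in: drops everything except word characters and whitespace.
    return re.sub(r"[^\w\s]", "", text)


def collapse_whitespace(text: str) -> str:
    return re.sub(r"\s+", " ", text).strip()


def run_pipeline(text: str, steps: List[Step]) -> str:
    # Each step receives the output of the one before it.
    for step in steps:
        text = step(text)
    return text


raw = "Save  20% today!"
in_order = run_pipeline(raw, [expand_percent_sign, strip_symbols, collapse_whitespace])
out_of_order = run_pipeline(raw, [strip_symbols, expand_percent_sign, collapse_whitespace])

print(in_order)      # Save 20 percent today
print(out_of_order)  # Save 20 today
```

In the second run the "%" sign is gone before the percentage step sees it, so the meaning is lost.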

normalize_unicode

Converts non-ASCII Unicode characters to their closest ASCII equivalents (e.g. curly quotes to straight quotes, em dashes to hyphens).
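As a rough illustration of this kind of mapping, the sketch below replaces a handful of typographic characters with ASCII stand-ins using str.translate. The table is an assumption chosen for the example; the characters the library actually covers may differ.

```python
# Illustrative mapping only; the real step may cover more (or different) characters.
PUNCTUATION_MAP = str.maketrans({
    "\u2018": "'",    # left single quote
    "\u2019": "'",    # right single quote / typographic apostrophe
    "\u201c": '"',    # left double quote
    "\u201d": '"',    # right double quote
    "\u2013": "-",    # en dash
    "\u2014": "-",    # em dash
    "\u2026": "...",  # horizontal ellipsis
})


def normalize_quotes_and_dashes(text: str) -> str:
    # Characters not listed in the map are passed through unchanged.
    return text.translate(PUNCTUATION_MAP)


print(normalize_quotes_and_dashes("\u201cHello\u201d \u2014 it\u2019s done\u2026"))
# "Hello" - it's done...
```

Normalizing apostrophes early matters for later steps: a contraction lookup keyed on `don't` with a straight apostrophe would not match the typographic form.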

remove_html_tags

Strips any HTML markup from the text. Useful when input comes from web scraping or rich-text editors.

remove_urls

Removes HTTP and HTTPS URLs, which rarely make sense when read aloud.

remove_emails

Removes email addresses from the text.
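The three removal steps above can be approximated with regular expressions. This is a simplified sketch: the patterns are assumptions made for illustration, they will not handle every valid URL or email address, and the final whitespace cleanup is added only to keep the example output tidy.

```python
import re

TAG_RE = re.compile(r"<[^>]+>")               # anything between angle brackets
URL_RE = re.compile(r"https?://\S+")          # http:// or https:// up to whitespace
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


def strip_unspeakable(text: str) -> str:
    text = TAG_RE.sub("", text)
    text = URL_RE.sub("", text)
    text = EMAIL_RE.sub("", text)
    # Tidy the gaps left behind by the removals.
    return re.sub(r"\s+", " ", text).strip()


print(strip_unspeakable("<p>Visit https://example.com or mail me@example.com today.</p>"))
# Visit or mail today.
```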

expand_contractions

Expands English contractions to their full forms.

Before	After
`don't`	`do not`
`it's`	`it is`
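A dictionary lookup is one simple way to implement this kind of expansion. The sketch below uses a four-entry table as an assumption; the library's own table is presumably larger, and an ambiguous form such as `it's` is mapped to `it is` here only because that is the expansion shown in the table above.

```python
import re

# Small illustrative table, not the library's full list.
CONTRACTIONS = {
    "don't": "do not",
    "it's": "it is",
    "can't": "cannot",
    "won't": "will not",
}

CONTRACTION_RE = re.compile(
    r"\b(" + "|".join(re.escape(key) for key in CONTRACTIONS) + r")\b",
    re.IGNORECASE,
)


def expand_contractions(text: str) -> str:
    def replace(match: re.Match) -> str:
        word = match.group(0)
        expanded = CONTRACTIONS[word.lower()]
        # Keep a leading capital so sentence starts still look right.
        return expanded.capitalize() if word[0].isupper() else expanded

    return CONTRACTION_RE.sub(replace, text)


print(expand_contractions("It's fine, don't worry."))  # It is fine, do not worry.
```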

expand_ip_addresses

Reads out IP address octets individually.

Before	After
`192.168.1.1`	`one nine two dot one six eight dot one dot one`
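The digit-by-digit reading in the table can be reproduced with a short regex substitution. This is a sketch under the assumption that any four dot-separated groups of one to three digits count as an IP address; it does not validate that each octet is at most 255.

```python
import re

DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]

IP_RE = re.compile(r"\b(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\b")


def spell_digits(number: str) -> str:
    return " ".join(DIGITS[int(ch)] for ch in number)


def expand_ip_addresses(text: str) -> str:
    # Spell each octet digit by digit and join the octets with "dot".
    return IP_RE.sub(
        lambda m: " dot ".join(spell_digits(octet) for octet in m.groups()),
        text,
    )


print(expand_ip_addresses("192.168.1.1"))
# one nine two dot one six eight dot one dot one
```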

normalize_leading_decimals

Adds a leading zero to bare decimal numbers.

Before	After
`.5`	`0.5`
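A lookbehind keeps this rewrite from touching decimals that already have an integer part. The pattern below is an assumption for illustration, not the library's own expression.

```python
import re

# Match a "." followed by a digit, unless it is preceded by a word character
# or another "." (so "1.5" and "done.5" are left alone).
LEADING_DECIMAL_RE = re.compile(r"(?<![\w.])\.(\d)")


def add_leading_zero(text: str) -> str:
    return LEADING_DECIMAL_RE.sub(r"0.\1", text)


print(add_leading_zero(".5"))          # 0.5
print(add_leading_zero("a .5 ratio"))  # a 0.5 ratio
print(add_leading_zero("1.5"))         # 1.5
```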

expand_currency

Converts currency symbols and amounts to spoken words.

Before	After
`$100`	`one hundred dollars`
`€1,200.50`	`twelve hundred euros and fifty cents`

expand_percentages

Replaces percentage expressions with words.

Before	After
`50% off`	`fifty percent off`

expand_scientific_notation

Converts scientific notation to a spoken form.

Before	After
`1e-4`	`one times ten to the negative four`

expand_time

Reads time expressions in 12-hour and 24-hour formats.

Before	After
`3:30pm`	`three thirty pm`
`14:00`	`fourteen hundred`
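The two documented readings can be reproduced with the sketch below. Only the `3:30pm` and `14:00` cases come from the table above; the handling of single-digit minutes ("oh five") and of on-the-hour times with an am/pm suffix are assumptions made to keep the example complete.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty"]

# The optional group only consumes whitespace when an am/pm suffix follows.
TIME_RE = re.compile(r"\b(\d{1,2}):(\d{2})(?:\s*(am|pm)\b)?", re.IGNORECASE)


def small_number_words(n: int) -> str:
    """Words for 0..59, which covers hours and minutes."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] + ("-" + ONES[ones] if ones else "")


def expand_time(text: str) -> str:
    def replace(match: re.Match) -> str:
        hour, minute = int(match.group(1)), int(match.group(2))
        suffix = match.group(3)
        if hour > 23 or minute > 59:
            return match.group(0)  # not a plausible time; leave untouched
        parts = [small_number_words(hour)]
        if minute == 0:
            if suffix is None:
                parts.append("hundred")  # 14:00 reads as "fourteen hundred"
        elif minute < 10:
            parts.append("oh " + small_number_words(minute))  # assumption
        else:
            parts.append(small_number_words(minute))
        if suffix:
            parts.append(suffix.lower())
        return " ".join(parts)

    return TIME_RE.sub(replace, text)


print(expand_time("3:30pm"))               # three thirty pm
print(expand_time("Meet at 14:00 sharp"))  # Meet at fourteen hundred sharp
```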

expand_ordinals

Converts ordinal numbers to words.

Before	After
`1st place`	`first place`
`21st century`	`twenty-first century`
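For small values, ordinals can be built from a lookup list plus a rule for the tens. The sketch below covers 0 through 99 only, which is an arbitrary limit chosen for the example; larger ordinals are left unchanged.

```python
import re

ORDINALS = ["zeroth", "first", "second", "third", "fourth", "fifth", "sixth",
            "seventh", "eighth", "ninth", "tenth", "eleventh", "twelfth",
            "thirteenth", "fourteenth", "fifteenth", "sixteenth",
            "seventeenth", "eighteenth", "nineteenth"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]

ORDINAL_RE = re.compile(r"\b(\d{1,2})(?:st|nd|rd|th)\b")


def ordinal_words(n: int) -> str:
    if n < 20:
        return ORDINALS[n]
    tens, ones = divmod(n, 10)
    if ones == 0:
        # "twenty" -> "twentieth", "forty" -> "fortieth", and so on.
        return TENS[tens][:-1] + "ieth"
    return TENS[tens] + "-" + ORDINALS[ones]


def expand_ordinals(text: str) -> str:
    return ORDINAL_RE.sub(lambda m: ordinal_words(int(m.group(1))), text)


print(expand_ordinals("1st place"))     # first place
print(expand_ordinals("21st century"))  # twenty-first century
```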

expand_units

Expands common measurement units attached to numbers.

Before	After
`100km`	`one hundred kilometers`
`5GB`	`five gigabytes`

expand_scale_suffixes

Expands large-number suffixes.

Before	After
`7B`	`seven billion`

expand_fractions

Converts simple fractions to words.

Before	After
`1/2`	`one half`
`3/4`	`three quarters`

expand_decades

Reads decade references in spoken form.

Before	After
`80s`	`eighties`
`1980s`	`nineteen eighties`

expand_phone_numbers

Reads phone numbers digit by digit.

Before	After
`555-1234`	`five five five one two three four`

expand_ranges

Converts numeric ranges expressed with a hyphen.

Before	After
`10-20 items`	`ten to twenty items`
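Phone numbers and ranges both use a hyphen between digit groups, which is a concrete case where the fixed step order matters. The sketch below uses simplified, assumed patterns (a 3-4 digit phone format, and a range step that leaves the digits for a later number step) to show what happens when the two steps are swapped.

```python
import re

DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]

PHONE_RE = re.compile(r"\b\d{3}-\d{4}\b")   # assumed format: 555-1234
RANGE_RE = re.compile(r"\b(\d+)-(\d+)\b")


def expand_phone_numbers(text: str) -> str:
    def spell(match: re.Match) -> str:
        return " ".join(DIGITS[int(ch)] for ch in match.group(0) if ch.isdigit())

    return PHONE_RE.sub(spell, text)


def expand_ranges(text: str) -> str:
    # Simplification: only the hyphen is rewritten; digits stay as digits.
    return RANGE_RE.sub(r"\1 to \2", text)


text = "Call 555-1234 about 10-20 items"
documented_order = expand_ranges(expand_phone_numbers(text))
swapped_order = expand_phone_numbers(expand_ranges(text))

print(documented_order)  # Call five five five one two three four about 10 to 20 items
print(swapped_order)     # Call 555 to 1234 about 10 to 20 items
```

With the steps swapped, the phone number is misread as a range before the phone step ever sees it.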

expand_model_names

Separates alphanumeric model names so they are read naturally.

Before	After
`GPT-3`	`GPT 3`

replace_numbers

Converts any remaining digit sequences to their word equivalents.

Before	After
`1200 students`	`twelve hundred students`
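The "twelve hundred" reading suggests that round hundreds are spoken as a count of hundreds instead of as thousands. The sketch below encodes that as an explicit assumption for values from 0 to 9999; the exact rule the library applies, and how it reads larger numbers, is not stated above, so larger values are left as digits here.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def under_hundred(n: int) -> str:
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] + ("-" + ONES[ones] if ones else "")


def under_thousand(n: int) -> str:
    hundreds, rest = divmod(n, 100)
    if hundreds == 0:
        return under_hundred(rest)
    return ONES[hundreds] + " hundred" + (" " + under_hundred(rest) if rest else "")


def number_to_words(n: int) -> str:
    if not 0 <= n <= 9999:
        raise ValueError("this sketch only covers 0..9999")
    # Assumption: round hundreds that are not round thousands (1100, 1200, ...)
    # are read as "<count> hundred", matching "twelve hundred" in the table.
    if n >= 1100 and n % 100 == 0 and n % 1000 != 0:
        return under_hundred(n // 100) + " hundred"
    thousands, rest = divmod(n, 1000)
    if thousands == 0:
        return under_thousand(rest)
    return ONES[thousands] + " thousand" + (" " + under_thousand(rest) if rest else "")


def replace_numbers(text: str) -> str:
    def replace(match: re.Match) -> str:
        value = int(match.group(0))
        # Values outside the sketch's range are left as digits.
        return number_to_words(value) if value <= 9999 else match.group(0)

    return re.sub(r"\b\d+\b", replace, text)


print(replace_numbers("1200 students"))  # twelve hundred students
```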

remove_punctuation (optional)

Removes punctuation characters. This step is optional and can be disabled if punctuation is needed for phrasing.

to_lowercase

Converts the entire text to lowercase.

remove_extra_whitespace

Collapses multiple spaces and trims leading/trailing whitespace.

Expansion reference

The table below summarizes the main expansion types with before and after examples.

Category	Before	After
Numbers	`1200 students`	`twelve hundred students`
Currency	`$100`	`one hundred dollars`
Currency (complex)	`€1,200.50`	`twelve hundred euros and fifty cents`
Percentages	`50% off`	`fifty percent off`
Ordinals	`1st place`	`first place`
Ordinals (complex)	`21st century`	`twenty-first century`
Time (12h)	`3:30pm`	`three thirty pm`
Time (24h)	`14:00`	`fourteen hundred`
Units	`100km`	`one hundred kilometers`
Units (digital)	`5GB`	`five gigabytes`
Scientific	`1e-4`	`one times ten to the negative four`
Fractions	`1/2`	`one half`
Fractions	`3/4`	`three quarters`
Decades	`80s`	`eighties`
Decades (full)	`1980s`	`nineteen eighties`
Model names	`GPT-3`	`GPT 3`
Contractions	`don't`	`do not`
Ranges	`10-20 items`	`ten to twenty items`
Phone numbers	`555-1234`	`five five five one two three four`

Preprocessing is applied before chunking and phonemization. Text that has already been expanded does not need clean_text=True — running it on already-normalized text is safe but redundant.
