
KittenTTS includes a TextPreprocessor that converts raw text into a form that the phonemizer can render accurately. It handles numbers, symbols, abbreviations, and formatting that would otherwise produce incorrect or garbled audio.

When to use preprocessing

Pass raw, unformatted text and set clean_text=True; the preprocessor then handles expansion automatically.
```python
# `model` is an existing KittenTTS instance.
# With clean_text=True, "$99.99" and "20%" are expanded before phonemization.
audio = model.generate(
    "The price is $99.99, a 20% discount.",
    voice="Bella",
    clean_text=True,
)
```
Use this when your input comes from user-generated content, documents, or any source with mixed formatting.
clean_text defaults to False in generate(). The generate_to_file() method does not expose a clean_text parameter, so if you need preprocessing before saving to a file, either preprocess the text yourself before passing it in, or call generate() with clean_text=True and write the returned audio to disk yourself.

Pipeline steps

The preprocessor applies 23 transformations in a fixed order. Each step operates on the output of the previous one.

Step 1: normalize_unicode

Converts non-ASCII Unicode characters to their closest ASCII equivalents (e.g. curly quotes to straight quotes, em dashes to hyphens).
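
The sketch below shows one way to approximate this step with the standard library's `unicodedata` module. The hand-written mapping, the `KEEP` set, and the name `normalize_unicode_sketch` are illustrative assumptions, not KittenTTS code. A blanket ASCII conversion would also delete symbols such as € that expand_currency (step 8) needs later, so the sketch keeps them explicitly.

```python
import unicodedata

# Typographic characters that NFKD leaves as non-ASCII (illustrative subset).
PUNCT_MAP = str.maketrans({
    "\u2018": "'", "\u2019": "'",  # curly single quotes
    "\u201c": '"', "\u201d": '"',  # curly double quotes
    "\u2013": "-", "\u2014": "-",  # en dash and em dash
})
# Non-ASCII symbols that a later currency step still needs (assumption).
KEEP = {"\u20ac", "\u00a3"}  # euro and pound signs


def normalize_unicode_sketch(text: str) -> str:
    text = text.translate(PUNCT_MAP)
    # NFKD splits "é" into "e" plus a combining accent, which is then dropped.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if ch.isascii() or ch in KEEP)


print(normalize_unicode_sketch("\u201cCaf\u00e9\u201d \u2014 don\u2019t"))  # "Cafe" - don't
```

Characters with no ASCII decomposition and no entry in `KEEP` (for example, most non-Latin scripts) are silently removed by this approach.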

Step 2: remove_html_tags

Strips any HTML markup from the text. Useful when input comes from web scraping or rich-text editors.
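
A minimal regex-based sketch, assuming tags never contain a literal `>` inside attribute values; a real parser such as Python's `html.parser` is more robust. `remove_html_tags_sketch` is a hypothetical name. Tags are replaced with spaces so that `<p>a</p><p>b</p>` does not merge into one word; the extra spaces are the kind of thing remove_extra_whitespace (step 23) collapses.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")


def remove_html_tags_sketch(text: str) -> str:
    # Replace each tag with a space so neighbouring words stay separate.
    without_tags = TAG_RE.sub(" ", text)
    # Decode entities such as &amp; so they are not spoken literally.
    return html.unescape(without_tags)


cleaned = remove_html_tags_sketch("<p>Tom &amp; <b>Jerry</b></p>")
print(" ".join(cleaned.split()))  # Tom & Jerry
```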

Step 3: remove_urls

Removes HTTP/HTTPS URLs. URLs are typically not speakable in a useful way.

Step 4: remove_emails

Removes email addresses from the text.
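
Steps 3 and 4 can be approximated with two regular expressions. This is a sketch under simple assumptions (URLs start with `http://` or `https://`, emails follow a loose `name@domain.tld` shape); the patterns and the name `remove_urls_and_emails` are not taken from KittenTTS. URLs are removed first so that an address embedded in a URL is not half-matched as an email.

```python
import re

# Stop before trailing sentence punctuation so "see https://x.org." keeps its period.
URL_RE = re.compile(r"https?://\S+?(?=[.,;:!?)]?(?:\s|$))")
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


def remove_urls_and_emails(text: str) -> str:
    # URLs first, then email addresses.
    return EMAIL_RE.sub("", URL_RE.sub("", text))


sample = "Contact help@example.com or see https://example.com/docs for details"
print(" ".join(remove_urls_and_emails(sample).split()))  # Contact or see for details
```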

Step 5: expand_contractions

Expands English contractions to their full forms.

| Before | After |
| --- | --- |
| don't | do not |
| it's | it is |

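
A dictionary-driven sketch of contraction expansion. The table is a small illustrative subset and `expand_contractions_sketch` is a hypothetical name; KittenTTS's own list is not documented here. Some contractions are ambiguous ("it's" can also mean "it has"), and a lookup table like this always picks one reading. Because normalize_unicode runs first, curly apostrophes are already straight by the time this step runs.

```python
import re

# Illustrative subset; a production table would be much larger.
CONTRACTIONS = {
    "don't": "do not",
    "it's": "it is",
    "can't": "cannot",
    "won't": "will not",
    "i'm": "i am",
    "they're": "they are",
}
CONTRACTION_RE = re.compile(
    r"\b(" + "|".join(re.escape(c) for c in CONTRACTIONS) + r")\b", re.IGNORECASE
)


def expand_contractions_sketch(text: str) -> str:
    def _expand(match: re.Match) -> str:
        word = match.group(0)
        expanded = CONTRACTIONS[word.lower()]
        # Keep a leading capital: "Don't" -> "Do not", "I'm" -> "I am".
        return expanded[0].upper() + expanded[1:] if word[0].isupper() else expanded

    return CONTRACTION_RE.sub(_expand, text)


print(expand_contractions_sketch("Don't worry, it's fine"))  # Do not worry, it is fine
```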

Step 6: expand_ip_addresses

Reads IP addresses digit by digit, saying "dot" between octets.

| Before | After |
| --- | --- |
| 192.168.1.1 | one nine two dot one six eight dot one dot one |

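
A sketch of the digit-by-digit reading shown in the table, assuming dotted-quad IPv4 addresses only. The pattern does not check that each octet is at most 255, so a four-part version string such as `1.2.3.4` would also match; `expand_ip_addresses_sketch` is a hypothetical name.

```python
import re

DIGITS = "zero one two three four five six seven eight nine".split()
IP_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def expand_ip_addresses_sketch(text: str) -> str:
    def _speak(match: re.Match) -> str:
        octets = match.group(0).split(".")
        # Each octet is read digit by digit; octets are joined with "dot".
        return " dot ".join(" ".join(DIGITS[int(d)] for d in octet) for octet in octets)

    return IP_RE.sub(_speak, text)


print(expand_ip_addresses_sketch("192.168.1.1"))
# one nine two dot one six eight dot one dot one
```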

Step 7: normalize_leading_decimals

Adds a leading zero to bare decimal numbers.

| Before | After |
| --- | --- |
| .5 | 0.5 |

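
A one-line regex is enough to sketch this step. The lookbehind encodes an assumption about what counts as "bare": a dot preceded by a letter, digit, underscore, or another dot is left alone, so `3.14` and `v2.5` are unchanged. `normalize_leading_decimals_sketch` is a hypothetical name.

```python
import re

# A "." that starts a number: not preceded by a word character or another dot.
LEADING_DECIMAL_RE = re.compile(r"(?<![\w.])\.(\d)")


def normalize_leading_decimals_sketch(text: str) -> str:
    return LEADING_DECIMAL_RE.sub(r"0.\1", text)


print(normalize_leading_decimals_sketch("costs .75 today"))  # costs 0.75 today
```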

Step 8: expand_currency

Converts currency symbols and amounts to spoken words.

| Before | After |
| --- | --- |
| $100 | one hundred dollars |
| €1,200.50 | twelve hundred euros and fifty cents |

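
A sketch of symbol-first currency expansion for `$`, `€`, and `£`, assuming English-style grouping commas and exactly two decimal places. The After column above shows fully spoken words; this sketch instead emits digits plus currency names and leaves the digits for replace_numbers (step 20). That split is one possible design, not a description of KittenTTS internals, and all names here are hypothetical.

```python
import re

# symbol -> ((major singular, plural), (minor singular, plural)); illustrative subset.
CURRENCIES = {
    "$": (("dollar", "dollars"), ("cent", "cents")),
    "€": (("euro", "euros"), ("cent", "cents")),
    "£": (("pound", "pounds"), ("penny", "pence")),
}
CURRENCY_RE = re.compile(r"([$€£])(\d{1,3}(?:,\d{3})+|\d+)(?:\.(\d{2}))?")


def _plural(count: int, forms: tuple) -> str:
    return forms[0] if count == 1 else forms[1]


def expand_currency_sketch(text: str) -> str:
    def _speak(match: re.Match) -> str:
        symbol, whole, minor = match.groups()
        major_forms, minor_forms = CURRENCIES[symbol]
        amount = int(whole.replace(",", ""))
        words = f"{amount} {_plural(amount, major_forms)}"
        if minor and int(minor) > 0:
            words += f" and {int(minor)} {_plural(int(minor), minor_forms)}"
        return words

    return CURRENCY_RE.sub(_speak, text)


print(expand_currency_sketch("€1,200.50"))  # 1200 euros and 50 cents
```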

Step 9: expand_percentages

Replaces percentage expressions with words.

| Before | After |
| --- | --- |
| 50% off | fifty percent off |

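
If, as assumed here, the number itself is left for replace_numbers (step 20), a percentage only needs its symbol replaced. `expand_percentages_sketch` is a hypothetical name.

```python
import re

PERCENT_RE = re.compile(r"(\d+(?:\.\d+)?)\s?%")


def expand_percentages_sketch(text: str) -> str:
    # Only the "%" sign is rewritten; the digits stay for a later number step.
    return PERCENT_RE.sub(r"\1 percent", text)


print(expand_percentages_sketch("50% off"))  # 50 percent off
```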

Step 10: expand_scientific_notation

Converts scientific notation to a spoken form.

| Before | After |
| --- | --- |
| 1e-4 | one times ten to the negative four |

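
A sketch that rewrites `e` notation into the phrasing from the table, again leaving the mantissa and exponent as digits for replace_numbers (step 20). The pattern and the name `expand_scientific_notation_sketch` are assumptions; positive exponents are read as "to the 3", mirroring the documented negative case rather than any confirmed behavior.

```python
import re

SCI_RE = re.compile(r"\b(\d+(?:\.\d+)?)[eE]([-+]?)(\d+)\b")


def expand_scientific_notation_sketch(text: str) -> str:
    def _speak(match: re.Match) -> str:
        mantissa, sign, exponent = match.groups()
        negative = "negative " if sign == "-" else ""
        # Digits stay as digits; a later number step turns them into words.
        return f"{mantissa} times 10 to the {negative}{exponent}"

    return SCI_RE.sub(_speak, text)


print(expand_scientific_notation_sketch("a rate of 1e-4"))
# a rate of 1 times 10 to the negative 4
```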

Step 11: expand_time

Reads time expressions in 12-hour and 24-hour formats.

| Before | After |
| --- | --- |
| 3:30pm | three thirty pm |
| 14:00 | fourteen hundred |

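
A sketch covering the two documented patterns. The handling of `:00` with an am/pm suffix ("3 pm") and of minutes with a leading zero ("9 oh 5") are assumptions beyond the table. Hours and minutes stay as digits for replace_numbers (step 20); `expand_time_sketch` is a hypothetical name.

```python
import re

# The am/pm group is optional and must be followed by a word boundary, so
# "14:00 today" does not swallow the following space.
TIME_RE = re.compile(r"\b(\d{1,2}):(\d{2})(?:\s*([ap]m)\b)?", re.IGNORECASE)


def expand_time_sketch(text: str) -> str:
    def _speak(match: re.Match) -> str:
        hour = str(int(match.group(1)))  # "09" -> "9"
        minute, suffix = match.group(2), match.group(3)
        if minute == "00":
            # "3:00pm" -> "3 pm"; "14:00" -> "14 hundred"
            words = [hour] if suffix else [hour, "hundred"]
        elif minute.startswith("0"):
            words = [hour, "oh", minute[1]]  # "9:05" -> "9 oh 5"
        else:
            words = [hour, minute]
        if suffix:
            words.append(suffix.lower())
        return " ".join(words)

    return TIME_RE.sub(_speak, text)


print(expand_time_sketch("at 14:00 today"))  # at 14 hundred today
```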

Step 12: expand_ordinals

Converts ordinal numbers to words.

| Before | After |
| --- | --- |
| 1st place | first place |
| 21st century | twenty-first century |

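
Ordinals cannot simply be left for a later number step, because "21st" has to become "twenty-first" rather than "twenty-one" plus a suffix. The sketch below builds the cardinal form and then inflects only its last word. It is limited to one- and two-digit ordinals for brevity, and all names are hypothetical.

```python
import re

ONES = ("zero one two three four five six seven eight nine ten eleven twelve "
        "thirteen fourteen fifteen sixteen seventeen eighteen nineteen").split()
TENS = "_ _ twenty thirty forty fifty sixty seventy eighty ninety".split()
IRREGULAR = {"one": "first", "two": "second", "three": "third", "five": "fifth",
             "eight": "eighth", "nine": "ninth", "twelve": "twelfth"}
# One- or two-digit ordinals only, to keep the sketch short.
ORDINAL_RE = re.compile(r"\b(\d{1,2})(?:st|nd|rd|th)\b", re.IGNORECASE)


def _cardinal(n: int) -> str:
    if n < 20:
        return ONES[n]
    tens, rest = divmod(n, 10)
    return TENS[tens] + (f"-{ONES[rest]}" if rest else "")


def ordinal_word(n: int) -> str:
    # Only the last word changes: "twenty-one" -> "twenty-first".
    head, _, last = _cardinal(n).rpartition("-")
    if last in IRREGULAR:
        last = IRREGULAR[last]
    elif last.endswith("y"):
        last = last[:-1] + "ieth"  # "twenty" -> "twentieth"
    else:
        last += "th"
    return f"{head}-{last}" if head else last


def expand_ordinals_sketch(text: str) -> str:
    return ORDINAL_RE.sub(lambda m: ordinal_word(int(m.group(1))), text)


print(expand_ordinals_sketch("21st century"))  # twenty-first century
```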

Step 13: expand_units

Expands common measurement units attached to numbers.

| Before | After |
| --- | --- |
| 100km | one hundred kilometers |
| 5GB | five gigabytes |


Step 14: expand_scale_suffixes

Expands large-number suffixes.

| Before | After |
| --- | --- |
| 7B | seven billion |

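
Steps 13 and 14 both attach a word to a number, so one sketch covers them. The unit and suffix tables are small illustrative subsets (the full lists are not documented here), plural forms are always used (so "1km" would become "1 kilometers"), and the digits are left for replace_numbers (step 20). Function names are hypothetical.

```python
import re

# Illustrative subsets only.
UNITS = {"km": "kilometers", "kg": "kilograms", "cm": "centimeters",
         "ms": "milliseconds", "GB": "gigabytes", "MB": "megabytes"}
SCALE_SUFFIXES = {"K": "thousand", "M": "million", "B": "billion", "T": "trillion"}

UNIT_RE = re.compile(r"\b(\d+(?:\.\d+)?)\s?(" + "|".join(UNITS) + r")\b")
SCALE_RE = re.compile(r"\b(\d+(?:\.\d+)?)([KMBT])\b")


def expand_units_sketch(text: str) -> str:
    return UNIT_RE.sub(lambda m: f"{m.group(1)} {UNITS[m.group(2)]}", text)


def expand_scale_suffixes_sketch(text: str) -> str:
    return SCALE_RE.sub(lambda m: f"{m.group(1)} {SCALE_SUFFIXES[m.group(2)]}", text)


text = expand_scale_suffixes_sketch(expand_units_sketch("100km and 5GB, 7B params"))
print(text)  # 100 kilometers and 5 gigabytes, 7 billion params
```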

Step 15: expand_fractions

Converts simple fractions to words.

| Before | After |
| --- | --- |
| 1/2 | one half |
| 3/4 | three quarters |

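
A sketch that handles a few common denominators and leaves everything else untouched, which also avoids misreading dates or ratios such as `24/7`. The denominator table, the digit-preserving output (numerators are left for replace_numbers, step 20), and `expand_fractions_sketch` are assumptions for illustration.

```python
import re

# Denominator -> (singular, plural); anything else is left alone.
DENOMINATORS = {2: ("half", "halves"), 3: ("third", "thirds"), 4: ("quarter", "quarters"),
                5: ("fifth", "fifths"), 8: ("eighth", "eighths"), 10: ("tenth", "tenths")}
# The lookarounds skip dates such as 1/2/2024.
FRACTION_RE = re.compile(r"(?<![\d/])(\d+)/(\d+)(?![\d/])")


def expand_fractions_sketch(text: str) -> str:
    def _speak(match: re.Match) -> str:
        numerator, denominator = int(match.group(1)), int(match.group(2))
        if denominator not in DENOMINATORS:
            return match.group(0)
        singular, plural = DENOMINATORS[denominator]
        return f"{numerator} {singular if numerator == 1 else plural}"

    return FRACTION_RE.sub(_speak, text)


print(expand_fractions_sketch("3/4 cup"))  # 3 quarters cup
```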

Step 16: expand_decades

Reads decade references in spoken form.

| Before | After |
| --- | --- |
| 80s | eighties |
| 1980s | nineteen eighties |

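
A sketch for decade references, assuming the form `[century]D0s` with an optional leading apostrophe. The teens and the `00s` decades (1910s, 2000s) need special wording and are deliberately left out; the century digits are left for replace_numbers (step 20). `expand_decades_sketch` is a hypothetical name.

```python
import re

DECADES = {"2": "twenties", "3": "thirties", "4": "forties", "5": "fifties",
           "6": "sixties", "7": "seventies", "8": "eighties", "9": "nineties"}
# Optional apostrophe ('80s) and optional two-digit century (1980s).
DECADE_RE = re.compile(r"'?\b(\d{2})?([2-9])0s\b")


def expand_decades_sketch(text: str) -> str:
    def _speak(match: re.Match) -> str:
        century, tens = match.groups()
        word = DECADES[tens]
        # The century stays as digits ("19 eighties") for a later number step.
        return f"{century} {word}" if century else word

    return DECADE_RE.sub(_speak, text)


print(expand_decades_sketch("the 1980s"))  # the 19 eighties
```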

Step 17: expand_phone_numbers

Reads phone numbers digit by digit.

| Before | After |
| --- | --- |
| 555-1234 | five five five one two three four |

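
A sketch limited to the `555-1234` and `555-123-4567` shapes; international formats, parentheses, and extensions are out of scope. Running it before expand_ranges (step 18), as in the documented order, keeps `555-1234` from being read as a range. `expand_phone_numbers_sketch` is a hypothetical name.

```python
import re

DIGITS = "zero one two three four five six seven eight nine".split()
# Matches 555-1234 and 555-123-4567 (illustrative North American shapes only).
PHONE_RE = re.compile(r"\b(?:\d{3}-)?\d{3}-\d{4}\b")


def expand_phone_numbers_sketch(text: str) -> str:
    # Every digit is spoken individually; the hyphens are dropped.
    return PHONE_RE.sub(
        lambda m: " ".join(DIGITS[int(ch)] for ch in m.group(0) if ch.isdigit()), text
    )


print(expand_phone_numbers_sketch("Call 555-1234 now"))
# Call five five five one two three four now
```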

Step 18: expand_ranges

Converts numeric ranges expressed with a hyphen.

| Before | After |
| --- | --- |
| 10-20 items | ten to twenty items |

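
A sketch that reads any `N-M` between digits as "N to M", leaving the numbers for replace_numbers (step 20). A hyphen between numbers is ambiguous (scores, subtraction, ID codes), and the lookarounds here only skip longer hyphenated chains such as ISO dates. Run on its own, this rule would also turn `555-1234` into a range, so phone numbers have to be handled first. `expand_ranges_sketch` is a hypothetical name.

```python
import re

# The lookarounds skip dates such as 2024-01-15 and other hyphenated chains.
RANGE_RE = re.compile(r"(?<![\d-])(\d+)-(\d+)(?![\d-])")


def expand_ranges_sketch(text: str) -> str:
    return RANGE_RE.sub(r"\1 to \2", text)


print(expand_ranges_sketch("10-20 items"))  # 10 to 20 items
```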

Step 19: expand_model_names

Separates alphanumeric model names so they are read naturally.

| Before | After |
| --- | --- |
| GPT-3 | GPT 3 |

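
A sketch of the documented `GPT-3` case, assuming a model name is an all-caps token of two or more letters followed by a hyphen and a digit. The pattern and `expand_model_names_sketch` are assumptions; mixed-case names such as `Llama-2` would need a broader rule.

```python
import re

# An all-caps name followed by a hyphen and a digit, e.g. "GPT-3" or "BERT-2".
MODEL_NAME_RE = re.compile(r"\b([A-Z]{2,})-(?=\d)")


def expand_model_names_sketch(text: str) -> str:
    # Replace only the hyphen; the digit after it is kept via the lookahead.
    return MODEL_NAME_RE.sub(r"\1 ", text)


print(expand_model_names_sketch("GPT-3 beat BERT-2"))  # GPT 3 beat BERT 2
```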

Step 20: replace_numbers

Converts any remaining digit sequences to their word equivalents.

| Before | After |
| --- | --- |
| 1200 students | twelve hundred students |

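
A sketch of the final number pass. `number_to_words` covers non-negative integers with standard American wording, plus one rule inferred from the documented examples: whole hundreds from 1,100 to 9,900 (other than exact thousands) are read as "twelve hundred" rather than "one thousand two hundred". Digits after a decimal point are read individually. All names are hypothetical. Because this step converts whatever digits remain, earlier steps can leave numbers in place for it, as the usage line shows.

```python
import re

ONES = ("zero one two three four five six seven eight nine ten eleven twelve "
        "thirteen fourteen fifteen sixteen seventeen eighteen nineteen").split()
TENS = "_ _ twenty thirty forty fifty sixty seventy eighty ninety".split()
SCALES = ((10**9, "billion"), (10**6, "million"), (1000, "thousand"), (100, "hundred"))
NUMBER_RE = re.compile(r"\d+(?:,\d{3})*(?:\.\d+)?")


def number_to_words(n: int) -> str:
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, rest = divmod(n, 10)
        return TENS[tens] + (f"-{ONES[rest]}" if rest else "")
    # Assumed style, matching the "1200 -> twelve hundred" example above.
    if 1000 < n < 10000 and n % 100 == 0 and n % 1000 != 0:
        return f"{number_to_words(n // 100)} hundred"
    for value, name in SCALES:
        if n >= value:
            head, rest = divmod(n, value)
            words = f"{number_to_words(head)} {name}"
            return f"{words} {number_to_words(rest)}" if rest else words
    raise ValueError(n)  # unreachable for n >= 100


def replace_numbers_sketch(text: str) -> str:
    def _spell(match: re.Match) -> str:
        integer, _, fraction = match.group(0).replace(",", "").partition(".")
        words = number_to_words(int(integer))
        if fraction:
            # Digits after the point are read one at a time.
            words += " point " + " ".join(ONES[int(d)] for d in fraction)
        return words

    return NUMBER_RE.sub(_spell, text)


print(replace_numbers_sketch("1200 euros and 50 cents"))
# twelve hundred euros and fifty cents
```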

Step 21: remove_punctuation (optional)

Removes punctuation characters. This step is optional and can be disabled if punctuation is needed for phrasing.

Step 22: to_lowercase

Converts the entire text to lowercase.

Step 23: remove_extra_whitespace

Collapses multiple spaces and trims leading/trailing whitespace.
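
The fixed-order design described under Pipeline steps amounts to function composition: each step is a `str -> str` function, and the output of one is the input of the next. Below is a minimal sketch of the last three steps, with the optional punctuation step controlled by a flag. The flag name, the choice to turn punctuation into spaces, and the function names are assumptions, not KittenTTS's API.

```python
import string
from functools import reduce
from typing import Callable, List

Step = Callable[[str], str]

# Punctuation becomes a space so "twenty-first" stays two separate words.
PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def remove_punctuation(text: str) -> str:
    return text.translate(PUNCT_TO_SPACE)


def to_lowercase(text: str) -> str:
    return text.lower()


def remove_extra_whitespace(text: str) -> str:
    return " ".join(text.split())


def build_final_steps(keep_punctuation: bool = False) -> List[Step]:
    # Step 21 is optional; steps 22 and 23 always run, in this order.
    steps: List[Step] = [] if keep_punctuation else [remove_punctuation]
    return steps + [to_lowercase, remove_extra_whitespace]


def run_pipeline(text: str, steps: List[Step]) -> str:
    # Each step receives the previous step's output.
    return reduce(lambda current, step: step(current), steps, text)


print(run_pipeline("  Twenty-First   Century!  ", build_final_steps()))  # twenty first century
```

Passing `keep_punctuation=True` skips step 21, so commas and periods survive for phrasing while lowercasing and whitespace cleanup still run.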

Expansion reference

The table below summarizes the main expansion types with before and after examples.
| Category | Before | After |
| --- | --- | --- |
| Numbers | 1200 students | twelve hundred students |
| Currency | $100 | one hundred dollars |
| Currency (complex) | €1,200.50 | twelve hundred euros and fifty cents |
| Percentages | 50% off | fifty percent off |
| Ordinals | 1st place | first place |
| Ordinals (complex) | 21st century | twenty-first century |
| Time (12h) | 3:30pm | three thirty pm |
| Time (24h) | 14:00 | fourteen hundred |
| Units | 100km | one hundred kilometers |
| Units (digital) | 5GB | five gigabytes |
| Scientific | 1e-4 | one times ten to the negative four |
| Fractions | 1/2 | one half |
| Fractions | 3/4 | three quarters |
| Decades | 80s | eighties |
| Decades (full) | 1980s | nineteen eighties |
| Model names | GPT-3 | GPT 3 |
| Contractions | don't | do not |
| Ranges | 10-20 items | ten to twenty items |
| Phone numbers | 555-1234 | five five five one two three four |

Preprocessing is applied before chunking and phonemization. Text that has already been expanded does not need clean_text=True; running the preprocessor on already-normalized text is safe but redundant.
