TTML (Timed Text Markup Language) is the lyrics format used by Apple Music. The Apple Syllable variant wraps each word in a <span> element with begin and end attributes, enabling precise word-by-word karaoke highlighting. TTMLParser also handles transliterations (phonetic romanization), translations, background vocal tracks, and multi-agent voice assignment, all of which are features of the Apple Music TTML specification.
Detection
TTMLParser identifies TTML content by checking for the W3C TTML namespace string (http://www.w3.org/ns/ttml).
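A minimal sketch of this kind of namespace check follows. The helper name looksLikeTtml and the constant are hypothetical and not part of the library; the parser's actual detection logic may differ in detail.

```kotlin
// Hypothetical helper for illustration only; not the library's implementation.
// The W3C TTML namespace URI, as declared on the root <tt> element.
const val TTML_NAMESPACE = "http://www.w3.org/ns/ttml"

// Returns true when the raw content mentions the TTML namespace anywhere.
fun looksLikeTtml(content: String): Boolean =
    content.contains(TTML_NAMESPACE)
```

A substring check like this is cheap and avoids parsing the XML before the format is known.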
Constructor
Unlike the singleton parsers, TTMLParser is a class because it accepts a constructor parameter.
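The difference between a singleton parser and a class with an optional constructor parameter can be sketched with stand-in types. Every name ending in Sketch below is hypothetical; the library's real declarations may look different.

```kotlin
// Stand-in types for illustration only; the real library's declarations may differ.
fun interface PhoneticProviderSketch {
    fun getPhonetic(text: String): String
}

// A singleton parser: declared with `object`, used without construction.
object SingletonParserSketch {
    fun describe(): String = "singleton"
}

// A class parser: each instance can carry its own optional fallback provider.
class ClassParserSketch(val fallbackPhoneticProvider: PhoneticProviderSketch? = null) {
    fun describe(): String =
        if (fallbackPhoneticProvider == null) "no fallback" else "with fallback"
}
```

Because the provider is per-instance state, two parsers with different providers can coexist, which a singleton could not offer.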
fallbackPhoneticProvider: an optional PhoneticProvider applied to lines that contain no existing phonetic annotation (neither a line-level x-roman span nor syllable-level transliterations from iTunes metadata). If a line already has phonetics embedded, the fallback provider is skipped for that line.
TTML format overview
Agents and alignment
The <metadata> block declares named agents (ttm:agent). The parser maps the first declared agent to KaraokeAlignment.Start and every subsequent agent to KaraokeAlignment.End. Each <p> element references an agent via ttm:agent, and the corresponding alignment is applied to the resulting KaraokeLine:
| Agent position | KaraokeAlignment |
|---|---|
| First agent (v1) | KaraokeAlignment.Start |
| Second agent (v2) | KaraokeAlignment.End |
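The mapping rule can be sketched as a small function. AlignmentSketch and alignmentFor are hypothetical stand-ins, and returning null for an undeclared agent is an assumption, since the handling of unknown agents is not described here.

```kotlin
// Stand-in for KaraokeAlignment; illustration only.
enum class AlignmentSketch { Start, End }

// declaredAgents: agent ids in the order they appear in <metadata>, e.g. ["v1", "v2", "v3"].
fun alignmentFor(declaredAgents: List<String>, agentId: String): AlignmentSketch? =
    when (declaredAgents.indexOf(agentId)) {
        -1 -> null                    // assumption: undeclared agents get no alignment
        0 -> AlignmentSketch.Start    // first declared agent
        else -> AlignmentSketch.End   // every subsequent agent
    }
```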
Special <span> roles
| ttm:role value | Meaning |
|---|---|
| x-translation | Inline translation text for the parent <p> |
| x-bg | Background or accompaniment vocal with its own syllable <span> children |
| x-roman | Line-level phonetic / romanization string |
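To make the roles concrete, the sketch below reads role-tagged spans from a single <p> element using the JDK's DOM parser. The sample markup, SAMPLE_P, and roleSpans are illustrative assumptions, not taken from the library, and TTMLParser itself may parse differently.

```kotlin
import java.io.StringReader
import javax.xml.parsers.DocumentBuilderFactory
import org.w3c.dom.Element
import org.xml.sax.InputSource

// W3C TTML metadata namespace, conventionally bound to the ttm prefix.
const val TTM_NS = "http://www.w3.org/ns/ttml#metadata"

// Illustrative <p> element: one timed syllable span plus two role-tagged spans.
val SAMPLE_P = """
<p xmlns="http://www.w3.org/ns/ttml" xmlns:ttm="http://www.w3.org/ns/ttml#metadata"
   begin="00:01.000" end="00:03.000" ttm:agent="v1">
  <span begin="00:01.000" end="00:02.000">Hello</span>
  <span ttm:role="x-translation">Bonjour</span>
  <span ttm:role="x-roman">heh-loh</span>
</p>
""".trimIndent()

// Returns the text of each role-tagged <span> that is a direct child of <p>, keyed by role.
fun roleSpans(pXml: String): Map<String, String> {
    val factory = DocumentBuilderFactory.newInstance().apply { isNamespaceAware = true }
    val doc = factory.newDocumentBuilder().parse(InputSource(StringReader(pXml)))
    val result = mutableMapOf<String, String>()
    val children = doc.documentElement.childNodes
    for (i in 0 until children.length) {
        val child = children.item(i) as? Element ?: continue
        if (child.localName != "span") continue
        // getAttributeNS returns an empty string when the attribute is absent.
        val role = child.getAttributeNS(TTM_NS, "role")
        if (role.isNotEmpty()) result[role] = child.textContent.trim()
    }
    return result
}
```

Spans without a ttm:role attribute, such as the timed syllable span in the sample, are skipped by this helper.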
iTunes transliterations
Apple Music TTML files may embed a <transliterations> block inside <iTunesMetadata>. Each <text for="Lx"> element contains <span> children, one per syllable, that hold phonetic readings. The parser aligns these phonetics to the matching syllables by itunes:key.
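One way to picture the alignment step is shown below. SyllableSketch and applyTransliterations are hypothetical, and pairing readings with syllables by position within a keyed line is an assumption; the parser's real matching logic may be more involved.

```kotlin
// Stand-in for a karaoke syllable; illustration only.
data class SyllableSketch(val text: String, val phonetic: String? = null)

// transliterations: itunes:key (e.g. "L1") -> phonetic readings, one per syllable.
fun applyTransliterations(
    key: String,
    syllables: List<SyllableSketch>,
    transliterations: Map<String, List<String>>
): List<SyllableSketch> {
    // No entry for this line's key: leave the syllables unchanged.
    val readings = transliterations[key] ?: return syllables
    // Assumption: readings pair with syllables by position; unmatched syllables keep their value.
    return syllables.mapIndexed { i, s -> s.copy(phonetic = readings.getOrNull(i) ?: s.phonetic) }
}
```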
iTunes translations
A <translation> block inside <head> provides out-of-band translations keyed by itunes:key. These are combined with inline x-translation spans, with inline taking precedence.
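The precedence between inline and out-of-band translations can be sketched as a one-line lookup. The function resolveTranslation and its parameters are hypothetical names used only for illustration.

```kotlin
// key: the line's itunes:key, if any; inline: text of an inline x-translation span, if any;
// outOfBand: translations from the metadata block, keyed by itunes:key.
fun resolveTranslation(key: String?, inline: String?, outOfBand: Map<String, String>): String? =
    // Inline wins; otherwise fall back to the keyed out-of-band entry, if one exists.
    inline ?: key?.let { outOfBand[it] }
```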
Usage
Working with the result
TTMLParser produces KaraokeLine.MainKaraokeLine for every <p> element that contains timed <span> children, and SyncedLine for <p> elements with no timed spans (plain line-level timing only).
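Consumers typically branch on the line type. The sketch below uses stand-in types with hypothetical fields (syllables, content) to show the pattern; the library's actual class hierarchy and property names are not reproduced here.

```kotlin
// Stand-in hierarchy for illustration; field names are assumptions.
sealed interface LineSketch
data class KaraokeLineSketch(val syllables: List<String>) : LineSketch
data class SyncedLineSketch(val content: String) : LineSketch

// Exhaustive `when` over the sealed hierarchy: one branch per line kind.
fun describe(line: LineSketch): String = when (line) {
    is KaraokeLineSketch -> "karaoke: ${line.syllables.joinToString("")}"
    is SyncedLineSketch -> "synced: ${line.content}"
}
```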
Phonetic provider behaviour
The fallbackPhoneticProvider is only invoked when a KaraokeLine has no existing phonetic data. The check considers both line-level phonetics (x-roman span) and syllable-level phonetics (iTunes transliterations):
- If line.phonetic is non-null, the fallback is skipped for the whole line.
- If any syllable.phonetic is non-null on that line, the fallback is also skipped.
PhoneticProvider.phoneticLevel controls the granularity:
| PhoneticLevel | Behaviour |
|---|---|
| LINE | getPhonetic is called once with the full line text; result stored on KaraokeLine.phonetic |
| SYLLABLE | getPhonetic is called once per syllable; results stored on KaraokeSyllable.phonetic |
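The skip rule and the two granularities can be combined in one sketch. All types below are hypothetical stand-ins, and building the full line text by concatenating syllable texts is an assumption made for the example.

```kotlin
// Stand-in types for illustration only.
enum class LevelSketch { LINE, SYLLABLE }
data class SylSketch(val text: String, val phonetic: String? = null)
data class KLineSketch(val syllables: List<SylSketch>, val phonetic: String? = null)
class ProviderSketch(val phoneticLevel: LevelSketch, val getPhonetic: (String) -> String)

fun applyFallback(line: KLineSketch, provider: ProviderSketch): KLineSketch {
    // Skip when the line already carries phonetics at either level.
    if (line.phonetic != null || line.syllables.any { it.phonetic != null }) return line
    return when (provider.phoneticLevel) {
        LevelSketch.LINE -> {
            // Assumption: the full line text is the concatenation of syllable texts.
            val fullText = line.syllables.joinToString("") { it.text }
            line.copy(phonetic = provider.getPhonetic(fullText))
        }
        LevelSketch.SYLLABLE ->
            // One provider call per syllable; results are stored on each syllable.
            line.copy(syllables = line.syllables.map { it.copy(phonetic = provider.getPhonetic(it.text)) })
    }
}
```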
Lines without syllable spans
If a <p> element contains no timed <span> children, TTMLParser extracts its raw text and returns a SyncedLine. An inline x-translation span is still parsed and stored in SyncedLine.translation.
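The distinction between the two kinds of <p> element can be illustrated with the JDK's DOM parser. The helper hasTimedSpans is hypothetical, and treating a span as timed when it carries both begin and end attributes is an assumption about how the check works.

```kotlin
import java.io.StringReader
import javax.xml.parsers.DocumentBuilderFactory
import org.w3c.dom.Element
import org.xml.sax.InputSource

// True when the <p> element has at least one direct <span> child with begin and end attributes.
fun hasTimedSpans(pXml: String): Boolean {
    val factory = DocumentBuilderFactory.newInstance().apply { isNamespaceAware = true }
    val p = factory.newDocumentBuilder().parse(InputSource(StringReader(pXml))).documentElement
    val children = p.childNodes
    return (0 until children.length)
        .mapNotNull { children.item(it) as? Element }
        .any { it.localName == "span" && it.hasAttribute("begin") && it.hasAttribute("end") }
}
```

A <p> whose only span is an untimed x-translation span returns false here, which matches the case that yields a SyncedLine with a translation.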
TTMLParser is a class, not an object. You must call TTMLParser() (or TTMLParser(provider)) to get an instance. This is in contrast to EnhancedLrcParser, LyricifySyllableParser, and KugouKrcParser, which are singletons.