Qwen3-ASR-1.7B and Qwen3-ASR-0.6B recognize 30 languages and 22 Chinese dialects in a single unified model — no separate language-specific weights are required. Language identification runs automatically whenever you do not specify a language. Qwen3-ForcedAligner-0.6B supports a subset of 11 languages for timestamp alignment. This page lists every supported language and dialect, explains how to pass them to the API, and describes auto-detection behavior.

Supported Languages (30)

Both ASR models support the following 30 languages. Pass the Language Name string (first letter uppercase, rest lowercase) as the language parameter.

| Language Name | ISO 639 Code |
| --- | --- |
| Chinese | zh |
| English | en |
| Cantonese | yue |
| Arabic | ar |
| German | de |
| French | fr |
| Spanish | es |
| Portuguese | pt |
| Indonesian | id |
| Italian | it |
| Korean | ko |
| Russian | ru |
| Thai | th |
| Vietnamese | vi |
| Japanese | ja |
| Turkish | tr |
| Hindi | hi |
| Malay | ms |
| Dutch | nl |
| Swedish | sv |
| Danish | da |
| Finnish | fi |
| Polish | pl |
| Czech | cs |
| Filipino | fil |
| Persian | fa |
| Greek | el |
| Romanian | ro |
| Hungarian | hu |
| Macedonian | mk |
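
Because the language parameter expects the Language Name rather than the ISO code, callers that store codes such as "zh" may want a small lookup before calling the API. The following is a minimal sketch built only from the table above; ISO_TO_LANGUAGE_NAME and language_name_from_code() are hypothetical helpers for illustration and are not part of the qwen-asr package.

```python
# Hypothetical helper, NOT part of qwen_asr: maps the codes from the table
# above to the Language Name strings that the `language` parameter expects.
ISO_TO_LANGUAGE_NAME = {
    "zh": "Chinese", "en": "English", "yue": "Cantonese", "ar": "Arabic",
    "de": "German", "fr": "French", "es": "Spanish", "pt": "Portuguese",
    "id": "Indonesian", "it": "Italian", "ko": "Korean", "ru": "Russian",
    "th": "Thai", "vi": "Vietnamese", "ja": "Japanese", "tr": "Turkish",
    "hi": "Hindi", "ms": "Malay", "nl": "Dutch", "sv": "Swedish",
    "da": "Danish", "fi": "Finnish", "pl": "Polish", "cs": "Czech",
    "fil": "Filipino", "fa": "Persian", "el": "Greek", "ro": "Romanian",
    "hu": "Hungarian", "mk": "Macedonian",
}


def language_name_from_code(code: str) -> str:
    """Translate a code from the table into its Language Name."""
    key = code.strip().lower()
    if key not in ISO_TO_LANGUAGE_NAME:
        raise ValueError(f"No supported language for code {code!r}")
    return ISO_TO_LANGUAGE_NAME[key]


print(language_name_from_code("zh"))
print(language_name_from_code("FIL"))
```

The returned string can then be passed as the language parameter in place of the raw code.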

Supported Chinese Dialects (22)

Both ASR models recognize the following 22 Chinese dialects. Dialects are treated as part of the "Chinese" language, so you do not pass a dialect name via the language parameter. Instead, set language="Chinese" (or language=None to let the model detect the language automatically) and the model handles dialectal variation internally.

| Dialect | Region / Notes |
| --- | --- |
| Anhui | Anhui province |
| Dongbei | Northeast China |
| Fujian | Fujian province |
| Gansu | Gansu province |
| Guizhou | Guizhou province |
| Hebei | Hebei province |
| Henan | Henan province |
| Hubei | Hubei province |
| Hunan | Hunan province |
| Jiangxi | Jiangxi province |
| Ningxia | Ningxia region |
| Shandong | Shandong province |
| Shaanxi | Shaanxi province |
| Shanxi | Shanxi province |
| Sichuan | Sichuan province (Chuan dialect) |
| Tianjin | Tianjin municipality |
| Yunnan | Yunnan province |
| Zhejiang | Zhejiang province |
| Cantonese (Hong Kong accent) | Hong Kong |
| Cantonese (Guangdong accent) | Guangdong province |
| Wu language | Shanghai / Jiangsu / Zhejiang |
| Minnan language | Southern Fujian / Taiwan |

Languages Supported by ForcedAligner (11)

Qwen3-ForcedAligner-0.6B supports timestamp prediction for the following 11 languages:
Language
Chinese
English
Cantonese
French
German
Italian
Japanese
Korean
Portuguese
Russian
Spanish
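
Since the ASR models cover 30 languages and the aligner only 11, a language accepted or detected by ASR may not be usable for alignment. A guard along the following lines can check this before alignment is attempted. It is a sketch based only on the list above; ALIGNER_LANGUAGES and can_align() are hypothetical names, not part of the qwen-asr package.

```python
# Hypothetical guard, NOT part of qwen_asr: the 11 aligner languages
# copied from the list above.
ALIGNER_LANGUAGES = frozenset({
    "Chinese", "English", "Cantonese", "French", "German", "Italian",
    "Japanese", "Korean", "Portuguese", "Russian", "Spanish",
})


def can_align(language):
    """Return True if `language` is one of the 11 aligner languages.

    None is rejected because the aligner needs an explicit language.
    """
    if language is None:
        return False
    # str.capitalize() gives first letter uppercase, the rest lowercase.
    return language.strip().capitalize() in ALIGNER_LANGUAGES


for name in ("Japanese", "Thai", None):
    print(name, can_align(name))
```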

Specifying a Language in Code

Pass the canonical language name as the language parameter to model.transcribe() or model.align(). Language names must be formatted with the first letter uppercase and the rest lowercase (e.g., "Chinese", "English", "Cantonese").
If you pass a non-canonical casing (e.g., "cHINese" or "ENGLISH"), the normalize_language_name() function in qwen_asr normalizes it automatically before validation. The safest practice is still to use the canonical form shown in the tables above.

Forcing a specific language

import torch
from qwen_asr import Qwen3ASRModel

model = Qwen3ASRModel.from_pretrained(
    "Qwen/Qwen3-ASR-1.7B",
    dtype=torch.bfloat16,
    device_map="cuda:0",
)

# Single audio — force English
results = model.transcribe(
    audio="path/to/english_audio.wav",
    language="English",
)
print(results[0].text)

# Batch — specify a language per item
results = model.transcribe(
    audio=[
        "path/to/chinese_audio.wav",
        "path/to/french_audio.wav",
    ],
    language=["Chinese", "French"],
)
for r in results:
    print(r.language, r.text)

Automatic language detection

Set language=None (the default) to let the model identify the language from the audio itself. The detected language is returned as the .language attribute of each result.

import torch
from qwen_asr import Qwen3ASRModel

model = Qwen3ASRModel.from_pretrained(
    "Qwen/Qwen3-ASR-1.7B",
    dtype=torch.bfloat16,
    device_map="cuda:0",
)

# Auto-detect language
results = model.transcribe(
    audio="path/to/unknown_language.wav",
    language=None,  # None triggers automatic language detection
)

print(results[0].language)   # e.g., "Japanese"
print(results[0].text)
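
When a batch is transcribed with auto-detection, the .language attribute makes it easy to sort transcripts by detected language afterwards. The sketch below assumes only that each result exposes .language and .text, as in the example above. group_texts_by_language() is a hypothetical helper, and the SimpleNamespace objects are stand-ins so the snippet runs without loading a model.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_texts_by_language(results):
    """Group transcripts by the language reported on each result."""
    grouped = defaultdict(list)
    for r in results:
        grouped[r.language].append(r.text)
    return dict(grouped)


# Stand-in objects that mimic only the .language and .text attributes;
# real results would come from model.transcribe(..., language=None).
demo_results = [
    SimpleNamespace(language="Japanese", text="transcript A"),
    SimpleNamespace(language="English", text="transcript B"),
    SimpleNamespace(language="Japanese", text="transcript C"),
]

for language, texts in group_texts_by_language(demo_results).items():
    print(language, len(texts))
```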

Using the ForcedAligner with a language

When calling Qwen3ForcedAligner.align() directly, the language parameter is required and must be one of the 11 supported aligner languages:

import torch
from qwen_asr import Qwen3ForcedAligner

aligner = Qwen3ForcedAligner.from_pretrained(
    "Qwen/Qwen3-ForcedAligner-0.6B",
    dtype=torch.bfloat16,
    device_map="cuda:0",
)

results = aligner.align(
    audio="path/to/audio.wav",
    text="甚至出现交易几乎停滞的情况。",
    language="Chinese",   # Must be one of the 11 aligner languages
)

for token in results[0]:
    print(f"{token.text}  {token.start_time:.3f}s – {token.end_time:.3f}s")

Language Name Format

The canonical format used throughout the qwen-asr package is:
  • First letter uppercase
  • All remaining letters lowercase
Examples: "Chinese", "English", "Cantonese", "Filipino", "Macedonian". This normalization is applied automatically by normalize_language_name() before any language is passed to the model, so minor casing variations will not cause errors. However, completely unrecognized strings (e.g., "zh" or "mandarin") will cause validate_language() to raise a ValueError because the normalized name is not present in SUPPORTED_LANGUAGES.
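
The normalize-then-validate behavior described above can be illustrated with a short standalone sketch. This is not the package's source: sketch_normalize(), sketch_validate(), and SKETCH_SUPPORTED_LANGUAGES are hypothetical stand-ins that mirror the documented behavior of normalize_language_name(), validate_language(), and SUPPORTED_LANGUAGES, using the 30 names from the language table. Stripping surrounding whitespace is an extra assumption of this sketch.

```python
# Local copy of the 30 documented Language Names (an illustration,
# not the package's own SUPPORTED_LANGUAGES object).
SKETCH_SUPPORTED_LANGUAGES = frozenset({
    "Chinese", "English", "Cantonese", "Arabic", "German", "French",
    "Spanish", "Portuguese", "Indonesian", "Italian", "Korean", "Russian",
    "Thai", "Vietnamese", "Japanese", "Turkish", "Hindi", "Malay", "Dutch",
    "Swedish", "Danish", "Finnish", "Polish", "Czech", "Filipino",
    "Persian", "Greek", "Romanian", "Hungarian", "Macedonian",
})


def sketch_normalize(name: str) -> str:
    """First letter uppercase, all remaining letters lowercase."""
    return name.strip().capitalize()


def sketch_validate(name: str) -> str:
    """Normalize `name`, then reject it if it is not a supported language."""
    normalized = sketch_normalize(name)
    if normalized not in SKETCH_SUPPORTED_LANGUAGES:
        raise ValueError(f"Unsupported language: {name!r}")
    return normalized


for candidate in ("cHINese", "ENGLISH", "zh", "mandarin"):
    try:
        print(candidate, "->", sketch_validate(candidate))
    except ValueError as exc:
        print(candidate, "->", exc)
```

In this sketch, casing variants resolve to a supported name, while "zh" and "mandarin" normalize to "Zh" and "Mandarin", which are not in the supported set and therefore raise ValueError.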
