Supported Languages and Dialects in Qwen3-ASR

Qwen3-ASR-1.7B and Qwen3-ASR-0.6B each recognize 30 languages and 22 Chinese dialects with a single unified model, so no separate language-specific weights are required. When you do not specify a language, the model identifies it automatically. Qwen3-ForcedAligner-0.6B supports a subset of 11 languages for timestamp alignment. This page lists every supported language and dialect, explains how to pass them to the API, and describes auto-detection behavior.

Supported Languages (30)

Both ASR models support the following 30 languages. Pass the Language Name string (first letter uppercase, rest lowercase) as the language parameter.

Language Name	ISO 639 Code
Chinese	zh
English	en
Cantonese	yue
Arabic	ar
German	de
French	fr
Spanish	es
Portuguese	pt
Indonesian	id
Italian	it
Korean	ko
Russian	ru
Thai	th
Vietnamese	vi
Japanese	ja
Turkish	tr
Hindi	hi
Malay	ms
Dutch	nl
Swedish	sv
Danish	da
Finnish	fi
Polish	pl
Czech	cs
Filipino	fil
Persian	fa
Greek	el
Romanian	ro
Hungarian	hu
Macedonian	mk
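
Because the language parameter expects the Language Name rather than the ISO 639 code, callers that store ISO codes need a lookup step. The following is a minimal sketch built only from the table above; the names ISO_TO_LANGUAGE_NAME and language_name_from_iso are hypothetical and are not part of the qwen_asr package.

```python
# ISO 639 code -> Language Name, copied from the table above.
# Hypothetical lookup table; qwen_asr itself does not expose this mapping.
ISO_TO_LANGUAGE_NAME = {
    "zh": "Chinese", "en": "English", "yue": "Cantonese", "ar": "Arabic",
    "de": "German", "fr": "French", "es": "Spanish", "pt": "Portuguese",
    "id": "Indonesian", "it": "Italian", "ko": "Korean", "ru": "Russian",
    "th": "Thai", "vi": "Vietnamese", "ja": "Japanese", "tr": "Turkish",
    "hi": "Hindi", "ms": "Malay", "nl": "Dutch", "sv": "Swedish",
    "da": "Danish", "fi": "Finnish", "pl": "Polish", "cs": "Czech",
    "fil": "Filipino", "fa": "Persian", "el": "Greek", "ro": "Romanian",
    "hu": "Hungarian", "mk": "Macedonian",
}


def language_name_from_iso(code: str) -> str:
    """Hypothetical helper: return the Language Name for an ISO 639 code."""
    key = code.strip().lower()  # tolerate "ZH" or " en "
    try:
        return ISO_TO_LANGUAGE_NAME[key]
    except KeyError:
        raise ValueError(f"No supported language for ISO code {code!r}") from None
```

The returned string can then be passed as the language parameter, for example language=language_name_from_iso("ja").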

Supported Chinese Dialects (22)

Both ASR models recognize the following 22 Chinese dialects. The dialects are treated as part of the "Chinese" language, so you do not pass a dialect name as the language parameter. Instead, set language="Chinese" (or language=None to let the model detect the language automatically) and the model handles dialectal variation internally.

Dialect	Region / Notes
Anhui	Anhui province
Dongbei	Northeast China
Fujian	Fujian province
Gansu	Gansu province
Guizhou	Guizhou province
Hebei	Hebei province
Henan	Henan province
Hubei	Hubei province
Hunan	Hunan province
Jiangxi	Jiangxi province
Ningxia	Ningxia region
Shandong	Shandong province
Shaanxi	Shaanxi province
Shanxi	Shanxi province
Sichuan	Sichuan province (Chuan dialect)
Tianjin	Tianjin municipality
Yunnan	Yunnan province
Zhejiang	Zhejiang province
Cantonese (Hong Kong accent)	Hong Kong
Cantonese (Guangdong accent)	Guangdong province
Wu language	Shanghai / Jiangsu / Zhejiang
Minnan language	Southern Fujian / Taiwan

Languages Supported by ForcedAligner (11)

Qwen3-ForcedAligner-0.6B supports timestamp prediction for the following 11 languages:

Language
Chinese
English
Cantonese
French
German
Italian
Japanese
Korean
Portuguese
Russian
Spanish
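
Since the aligner covers only 11 of the 30 ASR languages, it can help to check a language name before requesting timestamps. The sketch below hard-codes the list above; ALIGNER_LANGUAGES and aligner_supports are hypothetical names introduced for illustration, not identifiers from qwen_asr.

```python
from typing import Optional

# The 11 aligner languages, copied from the list above.
# Hypothetical constant; check the package for its own authoritative list.
ALIGNER_LANGUAGES = frozenset({
    "Chinese", "English", "Cantonese", "French", "German", "Italian",
    "Japanese", "Korean", "Portuguese", "Russian", "Spanish",
})


def aligner_supports(language: Optional[str]) -> bool:
    """Hypothetical helper: True if the canonical name is an aligner language."""
    # None (no language known yet) is simply reported as unsupported.
    return language in ALIGNER_LANGUAGES
```

One possible use is to gate alignment on the language an ASR result reports: call the aligner only when aligner_supports(result.language) is true, and fall back to text-only output otherwise.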

Specifying a Language in Code

Pass the canonical language name as the language parameter to Qwen3ASRModel.transcribe() or Qwen3ForcedAligner.align(). Language names must be formatted with the first letter uppercase and the rest lowercase (e.g., "Chinese", "English", "Cantonese").

If you pass a non-canonical casing (e.g., "cHINese" or "ENGLISH"), the normalize_language_name() function in qwen_asr normalizes it automatically before validation. The safest practice is still to use the canonical form shown in the tables above.

Forcing a specific language

import torch
from qwen_asr import Qwen3ASRModel

model = Qwen3ASRModel.from_pretrained(
    "Qwen/Qwen3-ASR-1.7B",
    dtype=torch.bfloat16,
    device_map="cuda:0",
)

# Single audio — force English
results = model.transcribe(
    audio="path/to/english_audio.wav",
    language="English",
)
print(results[0].text)

# Batch — specify a language per item
results = model.transcribe(
    audio=[
        "path/to/chinese_audio.wav",
        "path/to/french_audio.wav",
    ],
    language=["Chinese", "French"],
)
for r in results:
    print(r.language, r.text)

Automatic language detection

Set language=None (the default) to let the model identify the language from the audio itself. The detected language is returned as the .language attribute of each result.

import torch
from qwen_asr import Qwen3ASRModel

model = Qwen3ASRModel.from_pretrained(
    "Qwen/Qwen3-ASR-1.7B",
    dtype=torch.bfloat16,
    device_map="cuda:0",
)

# Auto-detect language
results = model.transcribe(
    audio="path/to/unknown_language.wav",
    language=None,  # None triggers automatic language detection
)

print(results[0].language)   # e.g., "Japanese"
print(results[0].text)

Using the ForcedAligner with a language

When calling Qwen3ForcedAligner.align() directly, the language parameter is required and must be one of the 11 supported aligner languages:

import torch
from qwen_asr import Qwen3ForcedAligner

aligner = Qwen3ForcedAligner.from_pretrained(
    "Qwen/Qwen3-ForcedAligner-0.6B",
    dtype=torch.bfloat16,
    device_map="cuda:0",
)

results = aligner.align(
    audio="path/to/audio.wav",
    text="甚至出现交易几乎停滞的情况。",
    language="Chinese",   # Must be one of the 11 aligner languages
)

for token in results[0]:
    print(f"{token.text}  {token.start_time:.3f}s – {token.end_time:.3f}s")

Language Name Format

The canonical format used throughout the qwen-asr package is:

First letter uppercase
All remaining letters lowercase

Examples: "Chinese", "English", "Cantonese", "Filipino", "Macedonian". This normalization is applied automatically by normalize_language_name() before any language is passed to the model, so minor casing variations will not cause errors. However, completely unrecognized strings (e.g., "zh" or "mandarin") will cause validate_language() to raise a ValueError because the normalized name is not present in SUPPORTED_LANGUAGES.
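
The behavior described above can be illustrated with a standalone sketch. It does not call qwen_asr and is not the package's implementation: sketch_normalize and sketch_validate are hypothetical stand-ins that follow the stated rule (first letter uppercase, rest lowercase, then a membership check), and SKETCH_SUPPORTED is an assumed copy of the 30 names from the table on this page.

```python
# Assumed copy of the 30 Language Names listed on this page.
SKETCH_SUPPORTED = frozenset({
    "Chinese", "English", "Cantonese", "Arabic", "German", "French",
    "Spanish", "Portuguese", "Indonesian", "Italian", "Korean", "Russian",
    "Thai", "Vietnamese", "Japanese", "Turkish", "Hindi", "Malay", "Dutch",
    "Swedish", "Danish", "Finnish", "Polish", "Czech", "Filipino",
    "Persian", "Greek", "Romanian", "Hungarian", "Macedonian",
})


def sketch_normalize(name: str) -> str:
    """Stand-in for the described rule: first letter upper, rest lower."""
    return name[:1].upper() + name[1:].lower()


def sketch_validate(name: str) -> str:
    """Normalize, then raise ValueError if the name is not in the list."""
    normalized = sketch_normalize(name)
    if normalized not in SKETCH_SUPPORTED:
        raise ValueError(f"Unsupported language: {name!r}")
    return normalized


for candidate in ["cHINese", "ENGLISH", "zh", "mandarin"]:
    try:
        print(candidate, "->", sketch_validate(candidate))
    except ValueError as exc:
        print(candidate, "-> rejected:", exc)
```

In this sketch the first two inputs normalize to "Chinese" and "English", while "zh" and "mandarin" become "Zh" and "Mandarin", which are not in the list and are therefore rejected.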
