Meta Omnilingual ASR is a multilingual speech recognition engine that uses Meta’s ASRInferencePipeline. It supports a wide range of languages through both CTC and LLM-family model cards. The adapter is lazy-loaded, so normal RealtimeSTT imports and installs do not require the Omnilingual runtime until this engine is selected.
Install
Use a Linux or WSL2 environment with Python 3.11.x, then run pip install "RealtimeSTT[omnilingual]". The extra names omnilingual-asr and meta-omnilingual-asr are equivalent if you prefer the explicit package-family name.
When working from a source checkout:
The extra constrains torch==2.8.0 and torchaudio==2.8.0 on Linux/WSL2. If you install a CUDA-enabled PyTorch stack separately, keep torch and torchaudio on matching releases. A mismatched pair can pass pip check but fail at import time with a missing libcudart.so shared-library error.
Engine Names
All of the following names are accepted as the transcription_engine value:
- omnilingual_asr
- omnilingual
- meta_omnilingual_asr
- omni_asr
omnilingual-asr, meta-omnilingual-asr, and omni-asr are also accepted through the generic engine-name normalization.
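The accepted names can be pictured as a normalization step followed by a set lookup. The sketch below is illustrative only: normalize_engine_name, is_omnilingual_engine, and OMNILINGUAL_ENGINE_NAMES are hypothetical names, and the lowercase-plus-hyphen-to-underscore rule is an assumption about how the generic normalization behaves, not RealtimeSTT's actual code.

```python
# Hypothetical illustration; not RealtimeSTT's implementation.
OMNILINGUAL_ENGINE_NAMES = {
    "omnilingual_asr",
    "omnilingual",
    "meta_omnilingual_asr",
    "omni_asr",
}


def normalize_engine_name(name: str) -> str:
    """Trim, lowercase, and map hyphens to underscores (assumed rule)."""
    return name.strip().lower().replace("-", "_")


def is_omnilingual_engine(name: str) -> bool:
    """Return True if the normalized name is one of the documented aliases."""
    return normalize_engine_name(name) in OMNILINGUAL_ENGINE_NAMES
```

Under this assumed rule, is_omnilingual_engine("omni-asr") and is_omnilingual_engine("meta-omnilingual-asr") both resolve to documented names, while an unrelated engine name does not.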
Basic Usage
If model is left at RealtimeSTT’s default Whisper value, the adapter automatically selects omniASR_CTC_1B_v2.
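A minimal usage sketch, assuming AudioToTextRecorder accepts the transcription_engine and model keyword arguments described on this page. build_recorder_kwargs and main are hypothetical helper names introduced for illustration.

```python
def build_recorder_kwargs(model=None):
    """Collect keyword arguments that select the Omnilingual engine.

    Omitting `model` relies on the adapter choosing omniASR_CTC_1B_v2,
    as described above.
    """
    kwargs = {"transcription_engine": "omnilingual_asr"}
    if model is not None:
        kwargs["model"] = model
    return kwargs


def main():
    # Imported here so the helper above works without the runtime installed.
    from RealtimeSTT import AudioToTextRecorder

    recorder = AudioToTextRecorder(**build_recorder_kwargs())
    print("Speak now...")
    print(recorder.text())  # blocks until one utterance has been transcribed
    recorder.shutdown()
```

Call main() from a Linux or WSL2 environment where the Omnilingual extra is installed; the helper alone can be used to inspect the arguments without loading any model.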
Default Model
The recommended default model is omniASR_CTC_1B_v2. It requires more VRAM and startup time than the 300M card but is the validated default for this integration. The RealtimeSTT extra requires omnilingual-asr>=0.2.0 because omnilingual-asr==0.1.0 only included older non-v2 cards.
Available v2 Model Cards
| Model Card | Type |
|---|---|
| omniASR_CTC_300M_v2 | CTC |
| omniASR_CTC_1B_v2 | CTC (recommended default) |
| omniASR_CTC_3B_v2 | CTC |
| omniASR_CTC_7B_v2 | CTC |
| omniASR_LLM_300M_v2 | LLM |
| omniASR_LLM_1B_v2 | LLM |
| omniASR_LLM_3B_v2 | LLM |
| omniASR_LLM_7B_v2 | LLM |
| omniASR_LLM_7B_ZS_v2 | LLM |
| omniASR_LLM_Unlimited_300M_v2 | LLM Unlimited |
| omniASR_LLM_Unlimited_1B_v2 | LLM Unlimited |
| omniASR_LLM_Unlimited_3B_v2 | LLM Unlimited |
| omniASR_LLM_Unlimited_7B_v2 | LLM Unlimited |
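One way to keep a configuration honest about card names is to validate them against the table above and fail loudly instead of substituting another card. This is a sketch under assumptions: V2_MODEL_CARDS and require_v2_card are hypothetical names, and the plain ValueError stands in for whatever error the installed package raises.

```python
# Card names and types copied from the table above.
V2_MODEL_CARDS = {
    "omniASR_CTC_300M_v2": "CTC",
    "omniASR_CTC_1B_v2": "CTC",
    "omniASR_CTC_3B_v2": "CTC",
    "omniASR_CTC_7B_v2": "CTC",
    "omniASR_LLM_300M_v2": "LLM",
    "omniASR_LLM_1B_v2": "LLM",
    "omniASR_LLM_3B_v2": "LLM",
    "omniASR_LLM_7B_v2": "LLM",
    "omniASR_LLM_7B_ZS_v2": "LLM",
    "omniASR_LLM_Unlimited_300M_v2": "LLM Unlimited",
    "omniASR_LLM_Unlimited_1B_v2": "LLM Unlimited",
    "omniASR_LLM_Unlimited_3B_v2": "LLM Unlimited",
    "omniASR_LLM_Unlimited_7B_v2": "LLM Unlimited",
}


def require_v2_card(card: str) -> str:
    """Return the card's type, or raise instead of falling back silently."""
    try:
        return V2_MODEL_CARDS[card]
    except KeyError:
        raise ValueError(
            f"Unknown v2 model card {card!r}; refusing to fall back "
            "to an older non-v2 card."
        ) from None
```

A check like this catches typos in configuration before any model download starts; it does not replace the package's own lookup.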
Do not silently fall back to older non-v2 cards when a v2 card is unknown. A ModelNotKnownError for a v2 card means the installed omnilingual-asr package is older than expected; upgrade to omnilingual-asr>=0.2.0.
Language Support
CTC model cards ignore the language parameter entirely: the adapter removes lang before calling the CTC pipeline.
LLM model cards accept Omnilingual language IDs such as eng_Latn. The following short codes map to Omnilingual IDs:
| Short code | Omnilingual ID |
|---|---|
| ar | arb_Arab |
| de | deu_Latn |
| en | eng_Latn |
| es | spa_Latn |
| fr | fra_Latn |
| hi | hin_Deva |
| it | ita_Latn |
| ja | jpn_Jpan |
| ko | kor_Hang |
| nl | nld_Latn |
| pl | pol_Latn |
| pt | por_Latn |
| ru | rus_Cyrl |
| tr | tur_Latn |
| uk | ukr_Cyrl |
| zh | zho_Hans |
Additional short-code mappings can be supplied through transcription_engine_options["language_aliases"].
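The language handling described in this section can be sketched as a single lookup. The function resolve_language and the constant DEFAULT_LANGUAGE_ALIASES are hypothetical names; detecting CTC cards by the "_CTC_" substring and passing unknown values through unchanged are assumptions, and the swe_Latn alias in the usage note is only an illustrative extra mapping.

```python
# Short codes copied from the table above.
DEFAULT_LANGUAGE_ALIASES = {
    "ar": "arb_Arab", "de": "deu_Latn", "en": "eng_Latn", "es": "spa_Latn",
    "fr": "fra_Latn", "hi": "hin_Deva", "it": "ita_Latn", "ja": "jpn_Jpan",
    "ko": "kor_Hang", "nl": "nld_Latn", "pl": "pol_Latn", "pt": "por_Latn",
    "ru": "rus_Cyrl", "tr": "tur_Latn", "uk": "ukr_Cyrl", "zh": "zho_Hans",
}


def resolve_language(language, model_card, extra_aliases=None):
    """Return the Omnilingual ID to pass on, or None when no lang is sent."""
    if "_CTC_" in model_card:
        return None  # CTC cards ignore language, so lang is dropped
    if not language:
        return None
    # User-supplied aliases take precedence over the built-in table (assumed).
    aliases = {**DEFAULT_LANGUAGE_ALIASES, **(extra_aliases or {})}
    # Values that are not short codes are passed through unchanged (assumed).
    return aliases.get(language, language)
```

For example, resolve_language("en", "omniASR_LLM_1B_v2") yields eng_Latn, the same call with a CTC card yields None, and passing extra_aliases={"sv": "swe_Latn"} lets "sv" resolve as well.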
Realtime Transcription
For realtime previews, use one shared model lane with the validated CTC model. Setting use_main_model_for_realtime=True keeps one shared model in memory, which is the safest first test for VRAM. If you use separate final and realtime models, validate VRAM headroom before increasing model size.
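A shared-model realtime configuration might look like the sketch below. It assumes AudioToTextRecorder accepts the keyword arguments shown (transcription_engine, model, enable_realtime_transcription, use_main_model_for_realtime, on_realtime_transcription_update); build_realtime_kwargs and main are hypothetical helper names.

```python
def build_realtime_kwargs(on_update=None):
    """Keyword arguments for one shared Omnilingual model lane."""
    kwargs = {
        "transcription_engine": "omnilingual_asr",
        "model": "omniASR_CTC_1B_v2",
        "enable_realtime_transcription": True,
        # Reuse the main model for previews instead of loading a second one.
        "use_main_model_for_realtime": True,
    }
    if on_update is not None:
        kwargs["on_realtime_transcription_update"] = on_update
    return kwargs


def main():
    # Imported here so the helper above works without the runtime installed.
    from RealtimeSTT import AudioToTextRecorder

    recorder = AudioToTextRecorder(
        **build_realtime_kwargs(on_update=lambda text: print("preview:", text))
    )
    print("final:", recorder.text())
    recorder.shutdown()
```

Keeping the arguments in one helper makes it easy to compare VRAM use before and after switching to separate final and realtime models.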
Configuration Options
| Option | Meaning |
|---|---|
| model | Omnilingual model card. |
| transcription_engine_options["model_card"] | Overrides model. |
| device | Passed to ASRInferencePipeline; cuda becomes cuda:<gpu_device_index>. |
| compute_type | Mapped to torch dtype. Defaults to FP16 on CUDA and FP32 on CPU. |
| transcription_engine_options["dtype"] / ["torch_dtype"] | Explicit torch dtype string: float16, bfloat16, or float32. |
| transcription_engine_options["sample_rate"] | Sample rate for in-memory audio. Defaults to 16000. |
| batch_size | Used when greater than 0; otherwise the adapter defaults to batch size 1. |
| transcription_engine_options["batch_size"] | Overrides the RealtimeSTT batch_size value. |
| transcription_engine_options["pipeline"] | Extra keyword arguments for ASRInferencePipeline. |
| transcription_engine_options["transcribe"] | Extra keyword arguments for pipeline.transcribe(...). |
| transcription_engine_options["max_audio_seconds"] | In-memory audio duration guard. Defaults to 39.9 seconds. Set to false to disable. |
| transcription_engine_options["language"] / ["lang"] | Language used when no language parameter is set. |
| transcription_engine_options["language_aliases"] | Extra short-code to Omnilingual language ID mappings. |
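The defaults in the table can be read as a few small resolution rules. The sketch below models them with plain strings rather than torch objects; the function names are hypothetical, and the precedence of "dtype" over "torch_dtype" is an assumption, since the table lists them only as alternatives.

```python
ALLOWED_DTYPES = ("float16", "bfloat16", "float32")


def resolve_device(device, gpu_device_index=0):
    """Bare 'cuda' becomes 'cuda:<gpu_device_index>'; others pass through."""
    return f"cuda:{gpu_device_index}" if device == "cuda" else device


def resolve_dtype_name(device, options=None):
    """An explicit dtype string wins; else FP16 on CUDA and FP32 on CPU."""
    options = options or {}
    explicit = options.get("dtype") or options.get("torch_dtype")
    if explicit is not None:
        if explicit not in ALLOWED_DTYPES:
            raise ValueError(f"Unsupported dtype string: {explicit!r}")
        return explicit
    return "float16" if device.startswith("cuda") else "float32"


def resolve_batch_size(batch_size, options=None):
    """The engine option overrides batch_size; non-positive values give 1."""
    options = options or {}
    value = options.get("batch_size", batch_size)
    return value if value and value > 0 else 1
```

For instance, resolve_device("cuda", 1) gives cuda:1, resolve_dtype_name("cpu") gives float32, and resolve_batch_size(0) falls back to 1.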
File Smoke Test
To test the engine without a source checkout, download the standalone smoke script from the release branch and run it from Linux or WSL2 with Python 3.11.x. From a source checkout you can run the script directly.
Omnilingual model assets are large. The 1B-family smoke test can download several GiB of model and cache files on first run.
Model Cache Behavior
The Omnilingual package downloads and caches model assets through its underlying fairseq2/Hugging Face tooling in the Linux user’s normal cache locations. RealtimeSTT does not move or delete those files. The adapter passes in-memory audio as a predecoded object so that the Omnilingual package does not treat a raw NumPy float array as encoded audio bytes.
FastAPI Recipe
Run the FastAPI server from a source checkout in WSL2/Linux when using Omnilingual, then open http://localhost:8010/ from a Windows browser. WSL2 forwards localhost for the default setup. For a browser on another device, connect to the Windows host’s LAN address and ensure the firewall and WSL networking allow the selected port.
Troubleshooting
Missing dependency errors on import
Missing dependency errors mean omnilingual_asr, PyTorch, fairseq2, or fairseq2n is not importable in the active Linux/WSL environment. Reinstall with pip install "RealtimeSTT[omnilingual]" and confirm you are in a Python 3.11.x environment.
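A quick environment check can narrow down which dependency is missing before reinstalling. This sketch uses only the standard library; the module name torch for PyTorch is the usual import name, and missing_modules and is_python_311 are hypothetical helper names.

```python
import importlib.util
import sys

# Import names for the packages listed above.
REQUIRED_MODULES = ("omnilingual_asr", "torch", "fairseq2", "fairseq2n")


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level modules that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def is_python_311(version_info=None):
    """True when the interpreter (or the given version tuple) is 3.11.x."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) == (3, 11)
```

Running print(missing_modules(), is_python_311()) inside the active Linux or WSL environment shows whether the problem is a missing package or the wrong interpreter. Note that find_spec only locates a module; a package that is present but fails while loading native libraries will still raise at import time.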
Native Windows install fails
Native Windows installs intentionally skip the Omnilingual runtime because fairseq2n currently has no Windows wheel. Run the Omnilingual runtime inside WSL2 or on a Linux machine.
Python 3.12 dependency resolution fails
omnilingual-asr==0.2.0 declares Requires-Python: <=3.12,>=3.10, which makes normal Python 3.12 patch releases fail pip’s dependency resolver. Use Python 3.11.x until upstream package metadata changes.
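The reason patch releases fail is that an inclusive bound of 3.12 is compared as 3.12.0, so 3.12.1 and later sit above it. The sketch below is a simplified model of that comparison using release segments only; real resolution follows the full version-specifier rules, and release_tuple and satisfies_requires_python are hypothetical names.

```python
def release_tuple(version, width=3):
    """Parse 'X.Y[.Z]' into a zero-padded tuple, e.g. '3.12' -> (3, 12, 0)."""
    parts = [int(part) for part in version.split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def satisfies_requires_python(version):
    """Model of '<=3.12,>=3.10' on release segments only."""
    candidate = release_tuple(version)
    return release_tuple("3.10") <= candidate <= release_tuple("3.12")
```

With this model, 3.11.9 and 3.12.0 satisfy the constraint, while a patch release such as 3.12.4 does not, which matches the resolver failure described above.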
CUDA shared-library errors (libcudart.so)
A torch and torchaudio pair on mismatched releases can pass pip check but fail at import time with a missing libcudart.so shared-library error. Keep both packages on matching releases; the extra constrains torch==2.8.0 and torchaudio==2.8.0 on Linux/WSL2.
ModelNotKnownError for a _v2 model card
This means the installed omnilingual-asr package does not ship that card. Upgrade or pin the package to a release that includes the documented v2 card: pip install "omnilingual-asr>=0.2.0".
CUDA memory exhausted
Enable one shared model with use_main_model_for_realtime=True, reduce concurrent server sessions, or evaluate a smaller validated model such as omniASR_CTC_300M_v2 before changing the default.
LLM output is empty or poor quality
Set an Omnilingual language code such as eng_Latn. CTC models ignore language, but LLM models require it. If left unset, LLM output quality can be poor or empty.
Audio exceeds the 40-second limit
The tested non-streaming Omnilingual pipeline requires audio shorter than 40 seconds. Keep realtime and final utterances below that limit. The max_audio_seconds option (default 39.9) enforces this guard and raises an error before sending oversized audio to the pipeline.
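The guard amounts to computing the clip duration from the sample count and comparing it with the limit. This is a minimal sketch rather than the adapter's code: check_audio_duration is a hypothetical name, the ValueError is a stand-in for the adapter's actual error, and treating a clip exactly at the limit as allowed is an assumption.

```python
def check_audio_duration(num_samples, sample_rate=16000, max_audio_seconds=39.9):
    """Return the duration in seconds, raising if it exceeds the guard.

    Passing max_audio_seconds=False disables the check, mirroring the
    documented "set to false to disable" behavior.
    """
    duration = num_samples / sample_rate
    if max_audio_seconds is not False and duration > max_audio_seconds:
        raise ValueError(
            f"Audio is {duration:.1f}s long; the limit is {max_audio_seconds}s."
        )
    return duration
```

At the default 16000 Hz sample rate, the 39.9-second limit corresponds to 638,400 samples, so a caller can split or truncate longer buffers before handing them to the recorder.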