Meta Omnilingual ASR Engine for RealtimeSTT

Meta Omnilingual ASR is a multilingual speech recognition engine that uses Meta’s ASRInferencePipeline. It supports a wide range of languages through both CTC and LLM-family model cards. The adapter is lazy-loaded, so normal RealtimeSTT imports and installs do not require the Omnilingual runtime until this engine is selected.

Platform requirements are strict. The Omnilingual engine runs only on Linux or WSL2 with Python 3.11.x. Native Windows cannot run the Omnilingual runtime because fairseq2n has no Windows wheel. Python 3.12.x currently fails dependency resolution because upstream omnilingual-asr==0.2.0 declares Requires-Python: <=3.12,>=3.10. Version specifiers treat 3.12 as 3.12.0, so every patch release from 3.12.1 onward falls outside that range.
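
The requirements above can be checked before installing anything. The following is a minimal sketch, not part of RealtimeSTT: the function name omnilingual_supported is hypothetical, and it only encodes the two conditions stated in this section (Linux or WSL2, Python 3.11.x).

```python
import platform
import sys


def omnilingual_supported(system: str, version: tuple) -> bool:
    """Return True when the documented platform requirements are met.

    `system` is a value like the one platform.system() returns
    ("Linux", "Windows", "Darwin"). WSL2 reports "Linux", so it passes
    the same check as native Linux.
    """
    return system == "Linux" and tuple(version[:2]) == (3, 11)


if __name__ == "__main__":
    ok = omnilingual_supported(platform.system(), sys.version_info)
    print("Omnilingual runtime supported here:", ok)
```

Running the script inside the target environment prints whether that interpreter meets both conditions.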

Install

Use a Linux or WSL2 environment with Python 3.11.x, then run:

pip install "RealtimeSTT[omnilingual]"

The extra names omnilingual-asr and meta-omnilingual-asr are equivalent aliases if you prefer the explicit package-family name. When working from a source checkout:

pip install -e ".[omnilingual]"

The extra constrains torch==2.8.0 and torchaudio==2.8.0 on Linux/WSL2. If you install a CUDA-enabled PyTorch stack separately, keep torch and torchaudio on matching releases. A mismatched pair can pass pip check but fail at import time with a missing libcudart.so shared-library error.
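
Because pip check may not catch a mismatched pair, it can help to compare the installed release numbers directly. This is a hedged sketch using only the standard library; the helper names are hypothetical. It compares release numbers only and ignores local build tags such as +cu128, so it will not detect two packages built against different CUDA versions.

```python
from importlib import metadata


def base_release(version: str) -> str:
    """Strip a local build tag such as '+cu128' from a version string."""
    return version.split("+", 1)[0]


def releases_match(torch_version: str, torchaudio_version: str) -> bool:
    """True when both packages are on the same base release."""
    return base_release(torch_version) == base_release(torchaudio_version)


def installed_pair_matches() -> bool:
    """Compare the installed torch and torchaudio distributions.

    Raises importlib.metadata.PackageNotFoundError if either is missing.
    """
    return releases_match(
        metadata.version("torch"),
        metadata.version("torchaudio"),
    )
```

Call installed_pair_matches() in the environment where RealtimeSTT runs; a False result points at the mismatch described above.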

Engine Names

All of the following names are accepted as the transcription_engine value:

omnilingual_asr
omnilingual
meta_omnilingual_asr
omni_asr

Hyphenated CLI forms such as omnilingual-asr, meta-omnilingual-asr, and omni-asr are also accepted through the generic engine-name normalization.
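
The effect of that normalization can be sketched as follows. The actual rule inside RealtimeSTT is not shown in this document, so the lowercase-and-replace-hyphens logic and both function names are assumptions for illustration.

```python
# Engine names listed in this section.
ACCEPTED_ENGINE_NAMES = {
    "omnilingual_asr",
    "omnilingual",
    "meta_omnilingual_asr",
    "omni_asr",
}


def normalize_engine_name(name: str) -> str:
    """Trim, lowercase, and turn hyphens into underscores (assumed rule)."""
    return name.strip().lower().replace("-", "_")


def is_omnilingual_engine(name: str) -> bool:
    """True when the normalized name is one of the accepted engine names."""
    return normalize_engine_name(name) in ACCEPTED_ENGINE_NAMES
```

Under this sketch, omni-asr and omni_asr both resolve to the same engine.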

Basic Usage

from RealtimeSTT import AudioToTextRecorder

recorder = AudioToTextRecorder(
    transcription_engine="omnilingual_asr",
    model="omniASR_CTC_1B_v2",
    device="cuda",
    compute_type="float16",
)

If model is left at RealtimeSTT’s default Whisper value, the adapter automatically selects omniASR_CTC_1B_v2.

Default Model

The recommended default model is omniASR_CTC_1B_v2. It requires more VRAM and startup time than the 300M card but is the validated default for this integration. The RealtimeSTT extra requires omnilingual-asr>=0.2.0 because omnilingual-asr==0.1.0 only included older non-v2 cards.

Available v2 Model Cards

Model Card	Type
`omniASR_CTC_300M_v2`	CTC
`omniASR_CTC_1B_v2`	CTC (recommended default)
`omniASR_CTC_3B_v2`	CTC
`omniASR_CTC_7B_v2`	CTC
`omniASR_LLM_300M_v2`	LLM
`omniASR_LLM_1B_v2`	LLM
`omniASR_LLM_3B_v2`	LLM
`omniASR_LLM_7B_v2`	LLM
`omniASR_LLM_7B_ZS_v2`	LLM
`omniASR_LLM_Unlimited_300M_v2`	LLM Unlimited
`omniASR_LLM_Unlimited_1B_v2`	LLM Unlimited
`omniASR_LLM_Unlimited_3B_v2`	LLM Unlimited
`omniASR_LLM_Unlimited_7B_v2`	LLM Unlimited

Do not silently fall back to older non-v2 cards when a v2 card is unknown. A ModelNotKnownError for a v2 card means the installed omnilingual-asr package is older than expected. Upgrade to omnilingual-asr>=0.2.0.
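
The Type column in the table follows from the card-name prefix. The sketch below derives the family from a card name and rejects an unknown card instead of falling back. It is an illustration only: card_family and require_known_card are hypothetical helpers, and LookupError stands in for whatever error the real package raises.

```python
def card_family(card: str) -> str:
    """Classify a model card as 'CTC', 'LLM', or 'LLM Unlimited'."""
    if card.startswith("omniASR_CTC_"):
        return "CTC"
    # Check the longer prefix first so Unlimited cards are not reported as LLM.
    if card.startswith("omniASR_LLM_Unlimited_"):
        return "LLM Unlimited"
    if card.startswith("omniASR_LLM_"):
        return "LLM"
    raise ValueError(f"Unrecognized Omnilingual model card: {card!r}")


def require_known_card(card: str, available: set) -> str:
    """Return `card` unchanged, or fail loudly when it is not available.

    No attempt is made to strip the _v2 suffix and retry an older card.
    """
    if card not in available:
        raise LookupError(
            f"{card!r} is not available; upgrade to omnilingual-asr>=0.2.0"
        )
    return card
```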

Language Support

CTC model cards ignore the language parameter entirely: the adapter removes lang before calling the CTC pipeline. LLM model cards accept Omnilingual language IDs such as eng_Latn:

recorder = AudioToTextRecorder(
    transcription_engine="omnilingual_asr",
    model="omniASR_LLM_1B_v2",
    language="eng_Latn",
    device="cuda",
    compute_type="float16",
)

Common ISO 639-1 short codes are automatically mapped to Omnilingual language IDs:

Short code	Omnilingual ID
`ar`	`arb_Arab`
`de`	`deu_Latn`
`en`	`eng_Latn`
`es`	`spa_Latn`
`fr`	`fra_Latn`
`hi`	`hin_Deva`
`it`	`ita_Latn`
`ja`	`jpn_Jpan`
`ko`	`kor_Hang`
`nl`	`nld_Latn`
`pl`	`pol_Latn`
`pt`	`por_Latn`
`ru`	`rus_Cyrl`
`tr`	`tur_Latn`
`uk`	`ukr_Cyrl`
`zh`	`zho_Hans`

Additional aliases can be supplied through transcription_engine_options["language_aliases"].
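
The language handling described in this section can be sketched as a single lookup. This is not the adapter's code: resolve_language is a hypothetical helper, and the sv to swe_Latn mapping in the usage note is an illustrative extra alias, not one of the built-in mappings.

```python
# Built-in short-code mappings from the table above.
DEFAULT_ALIASES = {
    "ar": "arb_Arab", "de": "deu_Latn", "en": "eng_Latn", "es": "spa_Latn",
    "fr": "fra_Latn", "hi": "hin_Deva", "it": "ita_Latn", "ja": "jpn_Jpan",
    "ko": "kor_Hang", "nl": "nld_Latn", "pl": "pol_Latn", "pt": "por_Latn",
    "ru": "rus_Cyrl", "tr": "tur_Latn", "uk": "ukr_Cyrl", "zh": "zho_Hans",
}


def resolve_language(language, card, extra_aliases=None):
    """Return the Omnilingual ID to pass as lang, or None when none applies.

    CTC cards ignore the language, so the result is always None for them.
    Values that are not known short codes are passed through unchanged.
    """
    if card.startswith("omniASR_CTC_") or language is None:
        return None
    aliases = {**DEFAULT_ALIASES, **(extra_aliases or {})}
    return aliases.get(language, language)
```

For example, resolve_language("sv", "omniASR_LLM_1B_v2", {"sv": "swe_Latn"}) returns the alias supplied by the caller, which mirrors what language_aliases is for.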

Realtime Transcription

For realtime previews, use one shared model lane with the validated CTC model:

recorder = AudioToTextRecorder(
    transcription_engine="omnilingual_asr",
    model="omniASR_CTC_1B_v2",
    enable_realtime_transcription=True,
    realtime_transcription_engine="omnilingual_asr",
    realtime_model_type="omniASR_CTC_1B_v2",
    use_main_model_for_realtime=True,
    device="cuda",
    compute_type="float16",
)

use_main_model_for_realtime=True keeps one shared model in memory, which is the safest first test for VRAM. If you use separate final and realtime models, validate VRAM headroom before increasing model size.

Configuration Options

Option	Meaning
`model`	Omnilingual model card.
`transcription_engine_options["model_card"]`	Overrides `model`.
`device`	Passed to `ASRInferencePipeline`; `cuda` becomes `cuda:<gpu_device_index>`.
`compute_type`	Mapped to torch dtype. Defaults to FP16 on CUDA and FP32 on CPU.
`transcription_engine_options["dtype"]` / `["torch_dtype"]`	Explicit torch dtype string: `float16`, `bfloat16`, or `float32`.
`transcription_engine_options["sample_rate"]`	Sample rate for in-memory audio. Defaults to `16000`.
`batch_size`	Used when greater than `0`; otherwise the adapter defaults to batch size `1`.
`transcription_engine_options["batch_size"]`	Overrides the RealtimeSTT `batch_size` value.
`transcription_engine_options["pipeline"]`	Extra keyword arguments for `ASRInferencePipeline`.
`transcription_engine_options["transcribe"]`	Extra keyword arguments for `pipeline.transcribe(...)`.
`transcription_engine_options["max_audio_seconds"]`	In-memory audio duration guard. Defaults to `39.9` seconds. Set to `false` to disable.
`transcription_engine_options["language"]` / `["lang"]`	Language used when no `language` parameter is set.
`transcription_engine_options["language_aliases"]`	Extra short-code to Omnilingual language ID mappings.
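
The defaulting rules in the table can be sketched as small pure functions. These are assumptions for illustration, not the adapter's implementation: the function names are hypothetical, and the precedence of an explicit dtype option over compute_type is a guess that the table does not confirm.

```python
KNOWN_DTYPES = {"float16", "bfloat16", "float32"}


def resolve_device(device: str, gpu_device_index: int = 0) -> str:
    """Expand a bare 'cuda' into 'cuda:<gpu_device_index>'."""
    return f"cuda:{gpu_device_index}" if device == "cuda" else device


def resolve_batch_size(batch_size: int, options: dict) -> int:
    """The engine option wins; otherwise use batch_size when it is > 0, else 1."""
    if "batch_size" in options:
        return int(options["batch_size"])
    return batch_size if batch_size > 0 else 1


def resolve_dtype_name(device: str, compute_type: str, options: dict) -> str:
    """Pick a torch dtype name, falling back to FP16 on CUDA and FP32 on CPU."""
    explicit = options.get("dtype") or options.get("torch_dtype")
    if explicit:
        return explicit  # assumed to take precedence over compute_type
    if compute_type in KNOWN_DTYPES:
        return compute_type
    return "float16" if device.startswith("cuda") else "float32"
```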

File Smoke Test

To test the engine without a source checkout, download the standalone smoke script from the release branch and run it from Linux or WSL2 with Python 3.11.x:

curl -L https://raw.githubusercontent.com/KoljaB/RealtimeSTT/release/v1.0.1/tests/realtimestt_omnilingual_test.py \
  -o realtimestt_omnilingual_test.py
python realtimestt_omnilingual_test.py --file-smoke --device cuda --cache-home .home-omnilingual-smoke

The script downloads a small public-domain LJ Speech audio fixture and expects the recognized text to contain the phrase "in being". From a source checkout you can run the script directly:

python tests/realtimestt_omnilingual_test.py --file-smoke --device cuda

Omnilingual model assets are large. The 1B-family smoke test can download several GiB of model and cache files on first run.

For an interactive microphone check after the file smoke passes:

python realtimestt_omnilingual_test.py --microphone --device cuda

Model Cache Behavior

The Omnilingual package downloads and caches model assets through its underlying fairseq2/Hugging Face tooling in the Linux user’s normal cache locations. RealtimeSTT does not move or delete those files. The adapter passes in-memory audio as a predecoded object to avoid the Omnilingual package treating a raw NumPy float array as encoded audio bytes:

{"waveform": waveform_np, "sample_rate": sample_rate}

FastAPI Recipe

Run the FastAPI server from a source checkout in WSL2/Linux when using Omnilingual:

PYTHONPATH=. python example_fastapi_server/server.py \
  --host 0.0.0.0 \
  --port 8010 \
  --engine omnilingual_asr \
  --model omniASR_CTC_1B_v2 \
  --realtime-engine omnilingual_asr \
  --realtime-model omniASR_CTC_1B_v2 \
  --use-main-model-for-realtime \
  --device cuda \
  --compute-type float16 \
  --realtime-processing-pause 0.05 \
  --engine-options '{"batch_size":1,"sample_rate":16000}'

Open http://localhost:8010/ from a Windows browser. WSL2 forwards localhost for the default setup. For a browser on another device, connect to the Windows host’s LAN address and ensure the firewall and WSL networking allow the selected port.

Troubleshooting

Missing dependency errors on import

Missing dependency errors mean omnilingual_asr, PyTorch, fairseq2, or fairseq2n is not importable in the active Linux/WSL environment. Reinstall with pip install "RealtimeSTT[omnilingual]" and confirm you are in a Python 3.11.x environment.

Native Windows install fails

Native Windows installs intentionally skip the Omnilingual runtime because fairseq2n currently has no Windows wheel. Run the Omnilingual runtime inside WSL2 or on a Linux machine.

Python 3.12 dependency resolution fails

omnilingual-asr==0.2.0 declares Requires-Python: <=3.12,>=3.10, which makes normal Python 3.12 patch releases fail pip’s dependency resolver. Use Python 3.11.x until upstream package metadata changes.
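
To see why only patch releases are rejected, the specifier can be evaluated by hand. The sketch below is a simplified stand-in for pip's version comparison: it handles plain X.Y.Z releases only (no pre-releases or local tags), and both function names are hypothetical.

```python
def release_tuple(version: str, width: int = 3) -> tuple:
    """Turn '3.12' into (3, 12, 0) so short versions are zero-padded."""
    parts = [int(p) for p in version.split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def satisfies_requires_python(version: str) -> bool:
    """Evaluate '>=3.10,<=3.12' for a plain release such as '3.12.1'."""
    v = release_tuple(version)
    return release_tuple("3.10") <= v <= release_tuple("3.12")
```

With this comparison, 3.11.9 and 3.12.0 satisfy the range, while 3.12.1 does not because (3, 12, 1) is greater than (3, 12, 0).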

CUDA shared-library errors (libcudart.so)

A missing libcudart.so error usually means torch and torchaudio were resolved from incompatible builds. Install matching versions in the same environment. The Omnilingual extra constrains torch==2.8.0 and torchaudio==2.8.0.

ModelNotKnownError for a _v2 model card

This means the installed omnilingual-asr package does not ship that card. Upgrade or pin the package to a release that includes the documented v2 card: pip install "omnilingual-asr>=0.2.0".

CUDA memory exhausted

Enable one shared model with use_main_model_for_realtime=True, reduce concurrent server sessions, or evaluate a smaller validated model such as omniASR_CTC_300M_v2 before changing the default.

LLM output is empty or poor quality

Set an Omnilingual language ID such as eng_Latn. CTC models ignore language, but LLM models rely on it: when it is left unset, LLM output can be poor or empty.

Audio exceeds the 40-second limit

The tested non-streaming Omnilingual pipeline requires audio shorter than 40 seconds. Keep realtime and final utterances below that limit. The max_audio_seconds option (default 39.9) enforces this guard and raises an error before sending oversized audio to the pipeline.
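
The guard amounts to a duration check on the sample count. This is a minimal sketch under stated assumptions: check_audio_duration is a hypothetical name, and ValueError stands in for whatever exception the adapter actually raises.

```python
def check_audio_duration(num_samples, sample_rate=16000, max_audio_seconds=39.9):
    """Return the clip duration in seconds, or raise if it exceeds the guard.

    Passing max_audio_seconds=False (or None) disables the check.
    """
    duration = num_samples / sample_rate
    guard_enabled = max_audio_seconds is not False and max_audio_seconds is not None
    if guard_enabled and duration > max_audio_seconds:
        raise ValueError(
            f"Audio is {duration:.1f}s long; limit is {max_audio_seconds}s"
        )
    return duration
```

At the default 16000 Hz sample rate, a 30-second buffer holds 480000 samples and passes, while a 40-second buffer (640000 samples) is rejected.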
