FunASR is Alibaba DAMO’s industrial-grade speech recognition toolkit. It supports 50+ languages, speaker diarization, emotion detection, and streaming inference, and runs at up to 170× real-time speed. It is especially well suited to Chinese speech recognition through models like SenseVoiceSmall and Paraformer-zh, but works for many other languages as well.
Install
FunASR is not bundled with any RealtimeSTT extra. Install it directly with pip; the package name on PyPI is funasr.
Engine Name
Pass "funasr" as the transcription_engine parameter.
Basic Usage
- CUDA
- CPU
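As a minimal sketch, the two variants can differ only in the device argument. It assumes that AudioToTextRecorder accepts transcription_engine, model, and device as keyword arguments, as described on this page; the helper names funasr_kwargs and transcribe_once are hypothetical and not part of RealtimeSTT.

```python
def funasr_kwargs(device="cuda", model="SenseVoiceSmall"):
    """Build recorder keyword arguments for the FunASR engine (hypothetical helper)."""
    return {
        "transcription_engine": "funasr",  # engine name given on this page
        "model": model,                    # fetched through ModelScope on first use
        "device": device,                  # "cuda" for GPU inference, "cpu" otherwise
    }


def transcribe_once(device="cuda"):
    """Record one utterance from the microphone and return its transcription."""
    # Imported lazily so funasr_kwargs() works without RealtimeSTT installed.
    from RealtimeSTT import AudioToTextRecorder

    recorder = AudioToTextRecorder(**funasr_kwargs(device))
    try:
        return recorder.text()  # blocks until an utterance has been transcribed
    finally:
        recorder.shutdown()     # release the audio stream and worker processes
```

Because RealtimeSTT relies on multiprocessing, it is safest to call transcribe_once("cuda") or transcribe_once("cpu") from under an `if __name__ == "__main__":` guard.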
Model Selection
Known model names such as SenseVoiceSmall, Fun-ASR-Nano, and Paraformer-zh are downloaded automatically through ModelScope when first used. Pass either the model name or a full ModelScope repository path as the model parameter.
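For illustration, both forms can go through the same model argument. The short name comes from this page; the repository path iic/SenseVoiceSmall is only an assumed example of a ModelScope path, and funasr_model_kwargs is a hypothetical helper.

```python
def funasr_model_kwargs(model):
    """Return recorder keyword arguments that select a FunASR model.

    `model` may be a known short name or a full ModelScope repository path.
    """
    return {"transcription_engine": "funasr", "model": model}


# Short name listed on this page.
by_name = funasr_model_kwargs("Paraformer-zh")

# Full repository path (assumed example; substitute the path you need).
by_path = funasr_model_kwargs("iic/SenseVoiceSmall")
```

Either dictionary could then be unpacked into the recorder constructor, for example AudioToTextRecorder(**by_name).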
Configuration Options
The following RealtimeSTT parameters map directly to FunASR AutoModel arguments:
| RealtimeSTT parameter | FunASR mapping |
|---|---|
| model | model |
| device | device |
| beam_size | beam_size |
| batch_size | batch_size |
| transcription_engine_options: {"vad_filter": bool, "vad_model": str} | vad_model |
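The mapping in the table can be pictured as a small translation step. The sketch below is an illustration only, not RealtimeSTT's actual code: to_automodel_kwargs is a hypothetical function, and it assumes that vad_model is forwarded only when vad_filter is truthy and that unset parameters are simply omitted.

```python
def to_automodel_kwargs(model, device="cuda", beam_size=None, batch_size=None,
                        transcription_engine_options=None):
    """Translate RealtimeSTT settings into FunASR AutoModel keyword arguments."""
    kwargs = {"model": model, "device": device}

    # Pass-through parameters: same name on both sides of the table.
    if beam_size is not None:
        kwargs["beam_size"] = beam_size
    if batch_size is not None:
        kwargs["batch_size"] = batch_size

    # Assumption: the VAD model is only forwarded when vad_filter is enabled.
    options = transcription_engine_options or {}
    if options.get("vad_filter") and options.get("vad_model"):
        kwargs["vad_model"] = options["vad_model"]

    return kwargs
```

For example, calling it with beam_size=5 and no engine options yields a dictionary containing only model, device, and beam_size.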
VAD Integration
To use FunASR’s built-in VAD model, pass vad_filter and vad_model together via transcription_engine_options.
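A minimal sketch of such a configuration follows. The option keys come from this page; the VAD model name fsmn-vad is an assumption based on FunASR's own model naming, and funasr_vad_kwargs is a hypothetical helper.

```python
def funasr_vad_kwargs(vad_model="fsmn-vad", model="SenseVoiceSmall"):
    """Recorder keyword arguments that enable FunASR's VAD (hypothetical helper)."""
    return {
        "transcription_engine": "funasr",
        "model": model,
        "transcription_engine_options": {
            "vad_filter": True,      # switch FunASR-side VAD on
            "vad_model": vad_model,  # assumed name; use the VAD model you need
        },
    }
```

The result could be unpacked into the recorder constructor, for example AudioToTextRecorder(**funasr_vad_kwargs()).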
Notes and Limitations
The FunASR integration is still under active development. If you encounter an issue, please open a GitHub issue on the RealtimeSTT repository.
