Documentation Index
Fetch the complete documentation index at: https://mintlify.com/xdcobra/react-native-sherpa-onnx/llms.txt
Use this file to discover all available pages before exploring further.
This feature is coming in version 0.7.0 and is not yet available in the current release.
Overview
Voice Activity Detection (VAD) will enable real-time detection of speech vs. silence in audio streams. This is essential for:
- Automatic silence removal in recordings
- Reducing unnecessary processing during silent periods
- Triggering speech recognition only when needed
Planned Features
- Real-time Detection: Detect voice activity as audio streams in
- Low Latency: Minimal processing delay for responsive apps
- Silence Removal: Automatically skip non-speech segments
- Speech Segmentation: Split audio into speech and non-speech regions
Expected API (Preview)
The API is not yet finalized; any interface shown in this preview is subject to change.
Use Cases
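Nothing has been published yet, so the following is a purely hypothetical, self-contained sketch of the general shape such an interface could take. Every type and function name below is an assumption, and the real ONNX model is replaced with a trivial RMS energy mock:

```typescript
// HYPOTHETICAL sketch only: the real react-native-sherpa-onnx VAD API
// is not finalized, and every name here is a placeholder.

/** Result reported for each audio frame fed to the detector. */
interface VadResult {
  isSpeech: boolean;   // whether the frame contains voice activity
  probability: number; // detector confidence in [0, 1]
}

/** Shape a frame-based VAD session might take. */
interface VadSession {
  acceptWaveform(samples: Float32Array): VadResult;
  reset(): void;
}

/** Pure-JS mock backed by an RMS energy threshold, standing in for a model. */
function createMockVad(threshold = 0.02): VadSession {
  return {
    acceptWaveform(samples: Float32Array): VadResult {
      let sumSquares = 0;
      for (let i = 0; i < samples.length; i++) {
        sumSquares += samples[i] * samples[i];
      }
      const rms = Math.sqrt(sumSquares / Math.max(1, samples.length));
      // Map the RMS level to a crude pseudo-probability, clamped to [0, 1].
      const probability = Math.min(1, rms / (threshold * 2));
      return { isSpeech: rms > threshold, probability };
    },
    reset() {
      // The mock keeps no state; a real session would clear buffers here.
    },
  };
}
```

In a real app the frames would come from the microphone stream; the mock exists only to make the expected shape concrete.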
1. Efficient Recording
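A hedged sketch of this idea: keep only the frames a detector flags as speech. The `isSpeech` helper below is a stand-in RMS threshold, not the library's detector, and all names are illustrative.

```typescript
// Illustrative only: drop silent frames so just voiced audio is saved.

type Frame = Float32Array;

/** Stand-in detector: RMS energy above a tuned threshold counts as speech. */
function isSpeech(frame: Frame, threshold = 0.02): boolean {
  let sum = 0;
  for (let i = 0; i < frame.length; i++) sum += frame[i] * frame[i];
  return Math.sqrt(sum / Math.max(1, frame.length)) > threshold;
}

/** Filter a recording down to the frames that contain speech. */
function keepSpeechFrames(frames: Frame[]): Frame[] {
  return frames.filter((frame) => isSpeech(frame));
}
```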
Only save or process audio segments containing speech.
2. Pre-processing for STT
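For pre-processing, per-frame speech decisions can be grouped into contiguous segments, each of which could then be handed to the transcriber on its own. A sketch under that assumption (all names are illustrative):

```typescript
// Illustrative only: turn a boolean speech/non-speech track into
// [start, end] frame segments ready for one-at-a-time transcription.

interface Segment {
  start: number; // index of the first speech frame (inclusive)
  end: number;   // index of the last speech frame (inclusive)
}

function segmentSpeech(flags: boolean[]): Segment[] {
  const segments: Segment[] = [];
  let start = -1; // -1 means no segment is currently open
  for (let i = 0; i < flags.length; i++) {
    if (flags[i] && start < 0) {
      start = i;                            // a segment opens
    } else if (!flags[i] && start >= 0) {
      segments.push({ start, end: i - 1 }); // a segment closes
      start = -1;
    }
  }
  if (start >= 0) segments.push({ start, end: flags.length - 1 });
  return segments;
}
```

A production version would also merge segments separated by very short pauses; that refinement is omitted for brevity.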
Segment continuous audio before transcription.
3. Wake Word Detection
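A sketch of the gating idea: fire callbacks on silence-to-speech transitions so a recognizer runs only while someone is talking. The callbacks are placeholders for real recognizer start/stop calls, not part of any published API.

```typescript
// Illustrative only: invoke start/stop hooks on speech transitions.

class SpeechGate {
  private active = false;

  constructor(
    private onSpeechStart: () => void, // e.g. start a streaming STT session
    private onSpeechEnd: () => void,   // e.g. finalize the transcript
  ) {}

  /** Feed one per-frame speech decision; hooks fire only on transitions. */
  push(isSpeech: boolean): void {
    if (isSpeech && !this.active) {
      this.active = true;
      this.onSpeechStart();
    } else if (!isSpeech && this.active) {
      this.active = false;
      this.onSpeechEnd();
    }
  }
}
```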
Trigger STT only when speech is detected.
Planned Configuration
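No configuration schema has been published for this library; the shape below is a guess modeled on parameters common to Silero-style VAD detectors, and every field name and default value is an assumption:

```typescript
// HYPOTHETICAL configuration shape: nothing here is published API.

interface VadConfig {
  modelPath: string;            // path to the ONNX VAD model file
  sampleRate: number;           // input sample rate in Hz
  threshold: number;            // speech-probability cutoff in [0, 1]
  minSilenceDurationMs: number; // silence required to close a segment
  minSpeechDurationMs: number;  // shortest segment worth reporting
}

const exampleConfig: VadConfig = {
  modelPath: "models/silero_vad.onnx",
  sampleRate: 16000,
  threshold: 0.5,
  minSilenceDurationMs: 500,
  minSpeechDurationMs: 250,
};
```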
Expected Models
Likely model support:
- Silero VAD - Lightweight, efficient, ONNX-based
- WebRTC VAD - Classic algorithm
- Custom models - Via sherpa-onnx framework
Timeline
VAD support is planned for version 0.7.0, which is not yet released.
Stay Updated
To track progress or contribute:
- Watch the GitHub repository
- Check the changelog
- Join discussions in issues or PRs
Current Workarounds
While VAD is not available, you can:
- Use streaming STT with endpoint detection - The streaming STT API already includes basic endpoint detection
- External libraries - Use JavaScript audio analysis libraries
- Manual silence detection - Implement simple amplitude-based detection
Simple Amplitude Detection
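A minimal sketch of that workaround: compute each PCM frame's root-mean-square level and treat frames above a tuned threshold as speech. The threshold and frame size are application-specific assumptions, not library values.

```typescript
// Simple amplitude-based silence detection, usable today without VAD.

/** Root-mean-square level of a PCM frame with samples in [-1, 1]. */
function rmsLevel(frame: Float32Array): number {
  let sum = 0;
  for (let i = 0; i < frame.length; i++) sum += frame[i] * frame[i];
  return Math.sqrt(sum / Math.max(1, frame.length));
}

/** Crude speech/silence decision: true when the frame is above threshold. */
function isLikelySpeech(frame: Float32Array, threshold = 0.01): boolean {
  return rmsLevel(frame) > threshold;
}
```

Unlike a model-based VAD, this misfires on steady background noise, so treat it strictly as a stopgap until real VAD support lands.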
Related Features
- Streaming STT: Real-time transcription with endpoint detection
- Speech Enhancement: Noise reduction (coming in v0.5.0)