Quick Start Guide
Get started with offline speech-to-text and text-to-speech in under 5 minutes.
Prerequisites
Before you begin, make sure you have:
Completed the Installation steps
A model downloaded (see Model Setup or use the quick download below)
An audio file to test (or use the examples below)
Download a Model
For this guide, we’ll use a small Whisper model for English transcription:
Choose a model
Download the Whisper Tiny English model (~40MB, fast, good accuracy):
# Using wget or curl
curl -LO https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-whisper-tiny.en.tar.bz2
tar -xvf sherpa-onnx-whisper-tiny.en.tar.bz2
Or use the Model Download Manager in your app:
import { downloadModel } from 'react-native-sherpa-onnx/download';

await downloadModel({
  url: 'https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-whisper-tiny.en.tar.bz2',
  destinationPath: '/path/to/models',
  onProgress: (progress) => console.log(`${progress}%`),
});
Place the model in your app
For Android, place the model folder in android/app/src/main/assets/models/:
android/app/src/main/assets/models/
└── sherpa-onnx-whisper-tiny.en/
├── tiny.en-encoder.onnx
├── tiny.en-decoder.onnx
└── tiny.en-tokens.txt
For iOS, add the model folder to your Xcode project as a resource.
See Model Setup for detailed instructions on bundling models, using Play Asset Delivery, or loading from the filesystem.
Speech-to-Text (STT)
Transcribe audio files with offline speech recognition.
Import the STT module
import { createSTT } from 'react-native-sherpa-onnx/stt';
import type { SttEngine } from 'react-native-sherpa-onnx/stt';
Initialize the STT engine
Create an STT instance with your model:
const stt: SttEngine = await createSTT({
  modelPath: {
    type: 'asset',
    path: 'models/sherpa-onnx-whisper-tiny.en',
  },
  modelType: 'whisper', // Optional: auto-detected if omitted
  numThreads: 2, // Adjust based on device
});
You can load models from different locations:
// From app assets (bundled with app)
modelPath: { type: 'asset', path: 'models/whisper-tiny' }

// From filesystem
modelPath: { type: 'file', path: '/absolute/path/to/model' }

// Auto-detect (searches assets, then filesystem)
modelPath: { type: 'auto', path: 'models/whisper-tiny' }
Transcribe an audio file
const result = await stt.transcribeFile('/path/to/audio.wav');

console.log('Transcription:', result.text);
// Output: "Hello, how are you today?"

console.log('Tokens:', result.tokens);
// Output: ["Hello", ",", "how", "are", "you", "today", "?"]

console.log('Timestamps:', result.timestamps);
// Output: [0.0, 0.5, 0.6, 1.0, 1.2, 1.5, 2.0]
Clean up
Always destroy the engine when done to free native resources:
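await stt.destroy();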
Transcribe Audio Samples
You can also transcribe raw PCM audio samples:
const samples: number[] = [...]; // Float32 PCM samples, range [-1, 1]
const sampleRate = 16000; // Hz

const result = await stt.transcribeSamples(samples, sampleRate);
console.log(result.text);
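If your audio source delivers 16-bit integer PCM instead, normalize it to the expected [-1, 1] range first. A minimal sketch, assuming a hypothetical pcm16 buffer of raw mono samples at 16 kHz:
// pcm16: a hypothetical Int16Array of raw mono samples (range [-32768, 32767])
const floatSamples = Array.from(pcm16, (s) => s / 32768); // 32768 = 2^15
const result = await stt.transcribeSamples(floatSamples, 16000);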
Complete STT Example
import { useState } from 'react';
import { View, Button, Text } from 'react-native';
import { createSTT } from 'react-native-sherpa-onnx/stt';
import type { SttEngine } from 'react-native-sherpa-onnx/stt';

export default function STTExample() {
  const [transcription, setTranscription] = useState('');
  const [loading, setLoading] = useState(false);

  const transcribeAudio = async () => {
    setLoading(true);
    let stt: SttEngine | null = null;
    try {
      // Initialize STT
      stt = await createSTT({
        modelPath: {
          type: 'asset',
          path: 'models/sherpa-onnx-whisper-tiny.en',
        },
        modelType: 'whisper',
        numThreads: 2,
      });

      // Transcribe audio file
      const result = await stt.transcribeFile('/path/to/audio.wav');
      setTranscription(result.text);
    } catch (error) {
      console.error('Transcription failed:', error);
    } finally {
      // Clean up
      if (stt) await stt.destroy();
      setLoading(false);
    }
  };

  return (
    <View>
      <Button
        title={loading ? 'Transcribing...' : 'Transcribe Audio'}
        onPress={transcribeAudio}
        disabled={loading}
      />
      {transcription ? <Text>Result: {transcription}</Text> : null}
    </View>
  );
}
Text-to-Speech (TTS)
Generate natural speech from text offline.
Download a TTS model
Download a VITS Piper model (~10-50MB depending on voice):
# English (US) female voice
curl -LO https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/vits-piper-en_US-lessac-medium.tar.bz2
tar -xvf vits-piper-en_US-lessac-medium.tar.bz2
Place in android/app/src/main/assets/models/ or add to Xcode resources.
Import and initialize TTS
import { createTTS } from 'react-native-sherpa-onnx/tts';
import type { TtsEngine } from 'react-native-sherpa-onnx/tts';

const tts: TtsEngine = await createTTS({
  modelPath: {
    type: 'asset',
    path: 'models/vits-piper-en_US-lessac-medium',
  },
  modelType: 'vits',
  numThreads: 2,
});
Generate speech
const audio = await tts.generateSpeech('Hello, world!');

console.log('Sample rate:', audio.sampleRate);
// Output: 22050

console.log('Audio samples:', audio.samples.length);
// Output: 44100 (2 seconds of audio)
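A quick sanity check on those numbers: the clip duration is just sample count over sample rate (44100 / 22050 = 2 s):
const durationSec = audio.samples.length / audio.sampleRate;
console.log(`Duration: ${durationSec}s`); // 2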
Save or play the audio
import { saveAudioToFile } from 'react-native-sherpa-onnx/tts';
import Sound from 'react-native-sound';

// Save to file
const filePath = await saveAudioToFile(audio, '/path/to/output.wav');
console.log('Saved to:', filePath);

// Play the audio
const sound = new Sound(filePath, '', (error) => {
  if (error) {
    console.error('Failed to load sound', error);
    return;
  }
  sound.play();
});
TTS with Options
Customize speech generation with options:
const audio = await tts.generateSpeech('Hello, world!', {
  speed: 1.2, // Speak 20% faster
  sid: 0, // Speaker ID (for multi-speaker models)
  silenceScale: 0.5, // Reduce silence duration
});
Complete TTS Example
import { useState } from 'react';
import { View, TextInput, Button, Text } from 'react-native';
import { createTTS, saveAudioToFile } from 'react-native-sherpa-onnx/tts';
import type { TtsEngine } from 'react-native-sherpa-onnx/tts';
import Sound from 'react-native-sound';

export default function TTSExample() {
  const [text, setText] = useState('Hello, world!');
  const [generating, setGenerating] = useState(false);
  const [audioPath, setAudioPath] = useState<string | null>(null);

  const generateSpeech = async () => {
    setGenerating(true);
    let tts: TtsEngine | null = null;
    try {
      // Initialize TTS
      tts = await createTTS({
        modelPath: {
          type: 'asset',
          path: 'models/vits-piper-en_US-lessac-medium',
        },
        modelType: 'vits',
      });

      // Generate speech
      const audio = await tts.generateSpeech(text, { speed: 1.0 });

      // Save to file
      const outputPath = `/tmp/speech_${Date.now()}.wav`;
      await saveAudioToFile(audio, outputPath);
      setAudioPath(outputPath);

      // Play
      const sound = new Sound(outputPath, '', (error) => {
        if (!error) sound.play();
      });
    } catch (error) {
      console.error('TTS failed:', error);
    } finally {
      if (tts) await tts.destroy();
      setGenerating(false);
    }
  };

  return (
    <View>
      <TextInput
        value={text}
        onChangeText={setText}
        placeholder="Enter text to speak"
      />
      <Button
        title={generating ? 'Generating...' : 'Generate Speech'}
        onPress={generateSpeech}
        disabled={generating}
      />
      {audioPath && <Text>Audio saved to: {audioPath}</Text>}
    </View>
  );
}
Real-Time Streaming Recognition
Transcribe live microphone input with partial results.
Import streaming STT
import { createStreamingSTT } from 'react-native-sherpa-onnx/stt';
import type { StreamingSttEngine, SttStream } from 'react-native-sherpa-onnx/stt';
Initialize streaming engine
const streamingStt: StreamingSttEngine = await createStreamingSTT({
  modelPath: {
    type: 'asset',
    path: 'models/sherpa-onnx-streaming-zipformer-en',
  },
  modelType: 'transducer', // Streaming-capable model
  numThreads: 2,
});
Only certain model types support streaming: transducer, paraformer, zipformer2_ctc, nemo_ctc, tone_ctc.
Create a stream and feed audio
const stream: SttStream = await streamingStt.createStream();

// Feed audio samples (Float32, 16kHz recommended)
const samples: number[] = [...];
await stream.acceptWaveform(samples, 16000);

// Get partial result
const partial = await stream.getResult();
console.log('Partial:', partial.text);

// Check if a speech endpoint was detected
const isEndpoint = await stream.isEndpoint();
if (isEndpoint) {
  // Finalize the segment
  const final = await stream.getResult();
  console.log('Final:', final.text);
  await stream.reset();
}
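Putting those calls together, here is a minimal end-to-end sketch that feeds a pre-recorded Float32 sample array through the stream in 100 ms chunks (the chunk size is an assumption; any reasonable size works):
const CHUNK = 1600; // 100 ms at 16 kHz (assumed chunk size)
const segments: string[] = [];

for (let i = 0; i < samples.length; i += CHUNK) {
  await stream.acceptWaveform(samples.slice(i, i + CHUNK), 16000);
  if (await stream.isEndpoint()) {
    // Endpoint reached: capture the segment and start a new one
    const { text } = await stream.getResult();
    if (text) segments.push(text);
    await stream.reset();
  }
}
console.log('Segments:', segments);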
Clean up
await stream.destroy();
await streamingStt.destroy();
Real-Time Microphone Transcription
import { useState, useRef } from 'react';
import { Button, Text, View } from 'react-native';
import { createStreamingSTT } from 'react-native-sherpa-onnx/stt';
import type { StreamingSttEngine, SttStream } from 'react-native-sherpa-onnx/stt';
import { AudioRecorder } from 'react-native-audio-api';

export default function MicrophoneSTT() {
  const [isRecording, setIsRecording] = useState(false);
  const [partialText, setPartialText] = useState('');
  const [finalText, setFinalText] = useState('');
  const engineRef = useRef<StreamingSttEngine | null>(null);
  const streamRef = useRef<SttStream | null>(null);
  const recorderRef = useRef<AudioRecorder | null>(null);

  const startRecording = async () => {
    try {
      // Initialize engine
      engineRef.current = await createStreamingSTT({
        modelPath: { type: 'asset', path: 'models/zipformer-en' },
        modelType: 'transducer',
      });

      // Create stream
      streamRef.current = await engineRef.current.createStream();

      // Start microphone recording
      recorderRef.current = new AudioRecorder({
        sampleRate: 16000,
        channelCount: 1,
      });

      recorderRef.current.onDataAvailable((samples) => {
        if (streamRef.current) {
          streamRef.current.acceptWaveform(samples, 16000);

          // Get partial result
          streamRef.current.getResult().then((result) => {
            setPartialText(result.text);

            // Check for endpoint
            streamRef.current?.isEndpoint().then((isEnd) => {
              if (isEnd) {
                setFinalText((prev) => prev + ' ' + result.text);
                setPartialText('');
                streamRef.current?.reset();
              }
            });
          });
        }
      });

      recorderRef.current.start();
      setIsRecording(true);
    } catch (error) {
      console.error('Failed to start recording:', error);
    }
  };

  const stopRecording = async () => {
    if (recorderRef.current) {
      recorderRef.current.stop();
    }
    // Null the refs so the recorder callback can't touch destroyed handles
    if (streamRef.current) {
      await streamRef.current.destroy();
      streamRef.current = null;
    }
    if (engineRef.current) {
      await engineRef.current.destroy();
      engineRef.current = null;
    }
    setIsRecording(false);
  };

  return (
    <View>
      <Button
        title={isRecording ? 'Stop Recording' : 'Start Recording'}
        onPress={isRecording ? stopRecording : startRecording}
      />
      <Text>Partial: {partialText}</Text>
      <Text>Final: {finalText}</Text>
    </View>
  );
}
Next Steps
Now that you’ve built your first speech app, explore more features:
Model Setup: Learn about model types, quantization, and Play Asset Delivery
STT API Reference: Complete STT API documentation
TTS API Reference: Complete TTS API documentation
Streaming TTS: Low-latency incremental speech generation
Execution Providers: Hardware acceleration with NNAPI, Core ML, QNN
Example App: Browse the full-featured example application