Quick Start Guide
Get started with offline speech-to-text and text-to-speech in under 5 minutes.
Prerequisites
Before you begin, make sure you have:
Completed the Installation steps
A model downloaded (see Model Setup or use the quick download below)
An audio file to test (or use the examples below)
Download a Model
For this guide, we’ll use a small Whisper model for English transcription:
Choose a model
Download the Whisper Tiny English model (~40MB, fast, good accuracy):
# Using wget or curl
curl -LO https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-whisper-tiny.en.tar.bz2
tar -xvf sherpa-onnx-whisper-tiny.en.tar.bz2
Or use the Model Download Manager in your app:
import { downloadModel } from 'react-native-sherpa-onnx/download';

await downloadModel({
  url: 'https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-whisper-tiny.en.tar.bz2',
  destinationPath: '/path/to/models',
  onProgress: (progress) => console.log(`${progress}%`),
});
Place the model in your app
For Android, place the model folder in android/app/src/main/assets/models/:
android/app/src/main/assets/models/
└── sherpa-onnx-whisper-tiny.en/
├── tiny.en-encoder.onnx
├── tiny.en-decoder.onnx
└── tiny.en-tokens.txt
For iOS, add the model folder to your Xcode project as a resource.
See Model Setup for detailed instructions on bundling models, using Play Asset Delivery, or loading from the filesystem.
Speech-to-Text (STT)
Transcribe audio files with offline speech recognition.
Import the STT module
import { createSTT } from 'react-native-sherpa-onnx/stt';
import type { SttEngine } from 'react-native-sherpa-onnx/stt';
Initialize the STT engine
Create an STT instance with your model:
const stt: SttEngine = await createSTT({
  modelPath: {
    type: 'asset',
    path: 'models/sherpa-onnx-whisper-tiny.en',
  },
  modelType: 'whisper', // Optional: auto-detected if omitted
  numThreads: 2, // Adjust based on device
});
You can load models from different locations:
// From app assets (bundled with app)
modelPath: { type: 'asset', path: 'models/whisper-tiny' }

// From filesystem
modelPath: { type: 'file', path: '/absolute/path/to/model' }

// Auto-detect (searches assets, then filesystem)
modelPath: { type: 'auto', path: 'models/whisper-tiny' }
Transcribe an audio file
const result = await stt.transcribeFile('/path/to/audio.wav');

console.log('Transcription:', result.text);
// Output: "Hello, how are you today?"

console.log('Tokens:', result.tokens);
// Output: ["Hello", ",", "how", "are", "you", "today", "?"]

console.log('Timestamps:', result.timestamps);
// Output: [0.0, 0.5, 0.6, 1.0, 1.2, 1.5, 2.0]
Clean up
Always destroy the engine when done to free native resources:
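await stt.destroy();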
Transcribe Audio Samples
You can also transcribe raw PCM audio samples:
const samples: number[] = [...]; // Float32 PCM samples, range [-1, 1]
const sampleRate = 16000; // Hz

const result = await stt.transcribeSamples(samples, sampleRate);
console.log(result.text);
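If your audio source delivers 16-bit integer PCM instead, normalize it to the expected [-1, 1] range first. A minimal sketch, assuming a hypothetical pcm16 buffer of raw mono samples at 16 kHz:
// pcm16: a hypothetical Int16Array of raw mono samples (range [-32768, 32767])
const floatSamples = Array.from(pcm16, (s) => s / 32768); // 32768 = 2^15
const result = await stt.transcribeSamples(floatSamples, 16000);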
Complete STT Example
import { useState } from 'react';
import { View, Button, Text } from 'react-native';
import { createSTT } from 'react-native-sherpa-onnx/stt';
import type { SttEngine } from 'react-native-sherpa-onnx/stt';

export default function STTExample() {
  const [transcription, setTranscription] = useState('');
  const [loading, setLoading] = useState(false);

  const transcribeAudio = async () => {
    setLoading(true);
    let stt: SttEngine | null = null;
    try {
      // Initialize STT
      stt = await createSTT({
        modelPath: {
          type: 'asset',
          path: 'models/sherpa-onnx-whisper-tiny.en',
        },
        modelType: 'whisper',
        numThreads: 2,
      });

      // Transcribe audio file
      const result = await stt.transcribeFile('/path/to/audio.wav');
      setTranscription(result.text);
    } catch (error) {
      console.error('Transcription failed:', error);
    } finally {
      // Clean up
      if (stt) await stt.destroy();
      setLoading(false);
    }
  };

  return (
    <View>
      <Button
        title={loading ? 'Transcribing...' : 'Transcribe Audio'}
        onPress={transcribeAudio}
        disabled={loading}
      />
      {transcription ? <Text>Result: {transcription}</Text> : null}
    </View>
  );
}
Text-to-Speech (TTS)
Generate natural speech from text offline.
Download a TTS model
Download a VITS Piper model (~10-50MB depending on voice):
# English (US) female voice
curl -LO https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/vits-piper-en_US-lessac-medium.tar.bz2
tar -xvf vits-piper-en_US-lessac-medium.tar.bz2
Place in android/app/src/main/assets/models/ or add to Xcode resources.
Import and initialize TTS
import { createTTS } from 'react-native-sherpa-onnx/tts';
import type { TtsEngine } from 'react-native-sherpa-onnx/tts';

const tts: TtsEngine = await createTTS({
  modelPath: {
    type: 'asset',
    path: 'models/vits-piper-en_US-lessac-medium',
  },
  modelType: 'vits',
  numThreads: 2,
});
Generate speech
const audio = await tts.generateSpeech('Hello, world!');

console.log('Sample rate:', audio.sampleRate);
// Output: 22050

console.log('Audio samples:', audio.samples.length);
// Output: 44100 (2 seconds of audio)
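A quick sanity check on those numbers: the clip duration is just sample count over sample rate (44100 / 22050 = 2 s):
const durationSec = audio.samples.length / audio.sampleRate;
console.log(`Duration: ${durationSec}s`); // 2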
Save or play the audio
import { saveAudioToFile } from 'react-native-sherpa-onnx/tts';
import Sound from 'react-native-sound';

// Save to file
const filePath = await saveAudioToFile(audio, '/path/to/output.wav');
console.log('Saved to:', filePath);

// Play the audio
const sound = new Sound(filePath, '', (error) => {
  if (error) {
    console.error('Failed to load sound', error);
    return;
  }
  sound.play();
});
TTS with Options
Customize speech generation with options:
const audio = await tts.generateSpeech('Hello, world!', {
  speed: 1.2, // Speak 20% faster
  sid: 0, // Speaker ID (for multi-speaker models)
  silenceScale: 0.5, // Reduce silence duration
});
Complete TTS Example
import { useState } from 'react';
import { View, TextInput, Button, Text } from 'react-native';
import { createTTS, saveAudioToFile } from 'react-native-sherpa-onnx/tts';
import type { TtsEngine } from 'react-native-sherpa-onnx/tts';
import Sound from 'react-native-sound';

export default function TTSExample() {
  const [text, setText] = useState('Hello, world!');
  const [generating, setGenerating] = useState(false);
  const [audioPath, setAudioPath] = useState<string | null>(null);

  const generateSpeech = async () => {
    setGenerating(true);
    let tts: TtsEngine | null = null;
    try {
      // Initialize TTS
      tts = await createTTS({
        modelPath: {
          type: 'asset',
          path: 'models/vits-piper-en_US-lessac-medium',
        },
        modelType: 'vits',
      });

      // Generate speech
      const audio = await tts.generateSpeech(text, { speed: 1.0 });

      // Save to file
      const outputPath = `/tmp/speech_${Date.now()}.wav`;
      await saveAudioToFile(audio, outputPath);
      setAudioPath(outputPath);

      // Play
      const sound = new Sound(outputPath, '', (error) => {
        if (!error) sound.play();
      });
    } catch (error) {
      console.error('TTS failed:', error);
    } finally {
      if (tts) await tts.destroy();
      setGenerating(false);
    }
  };

  return (
    <View>
      <TextInput
        value={text}
        onChangeText={setText}
        placeholder="Enter text to speak"
      />
      <Button
        title={generating ? 'Generating...' : 'Generate Speech'}
        onPress={generateSpeech}
        disabled={generating}
      />
      {audioPath && <Text>Audio saved to: {audioPath}</Text>}
    </View>
  );
}
Real-Time Streaming Recognition
Transcribe live microphone input with partial results.
Import streaming STT
import { createStreamingSTT } from 'react-native-sherpa-onnx/stt';
import type { StreamingSttEngine, SttStream } from 'react-native-sherpa-onnx/stt';
Initialize streaming engine
const streamingStt: StreamingSttEngine = await createStreamingSTT({
  modelPath: {
    type: 'asset',
    path: 'models/sherpa-onnx-streaming-zipformer-en',
  },
  modelType: 'transducer', // Streaming-capable model
  numThreads: 2,
});
Only certain model types support streaming: transducer, paraformer, zipformer2_ctc, nemo_ctc, tone_ctc.
Create a stream and feed audio
const stream: SttStream = await streamingStt.createStream();

// Feed audio samples (Float32, 16kHz recommended)
const samples: number[] = [...];
await stream.acceptWaveform(samples, 16000);

// Get partial result
const partial = await stream.getResult();
console.log('Partial:', partial.text);

// Check if a speech endpoint was detected
const isEndpoint = await stream.isEndpoint();
if (isEndpoint) {
  // Finalize the segment
  const final = await stream.getResult();
  console.log('Final:', final.text);
  await stream.reset();
}
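Putting those calls together, here is a minimal end-to-end sketch that feeds a pre-recorded Float32 sample array through the stream in 100 ms chunks (the chunk size is an assumption; any reasonable size works):
const CHUNK = 1600; // 100 ms at 16 kHz (assumed chunk size)
const segments: string[] = [];

for (let i = 0; i < samples.length; i += CHUNK) {
  await stream.acceptWaveform(samples.slice(i, i + CHUNK), 16000);
  if (await stream.isEndpoint()) {
    // Endpoint reached: capture the segment and start a new one
    const { text } = await stream.getResult();
    if (text) segments.push(text);
    await stream.reset();
  }
}
console.log('Segments:', segments);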
Clean up
await stream.destroy();
await streamingStt.destroy();
Real-Time Microphone Transcription
import { useState, useRef } from 'react';
import { Button, Text, View } from 'react-native';
import { createStreamingSTT } from 'react-native-sherpa-onnx/stt';
import type { StreamingSttEngine, SttStream } from 'react-native-sherpa-onnx/stt';
import { AudioRecorder } from 'react-native-audio-api';

export default function MicrophoneSTT() {
  const [isRecording, setIsRecording] = useState(false);
  const [partialText, setPartialText] = useState('');
  const [finalText, setFinalText] = useState('');
  const engineRef = useRef<StreamingSttEngine | null>(null);
  const streamRef = useRef<SttStream | null>(null);
  const recorderRef = useRef<AudioRecorder | null>(null);

  const startRecording = async () => {
    try {
      // Initialize engine
      engineRef.current = await createStreamingSTT({
        modelPath: { type: 'asset', path: 'models/zipformer-en' },
        modelType: 'transducer',
      });

      // Create stream
      streamRef.current = await engineRef.current.createStream();

      // Start microphone recording
      recorderRef.current = new AudioRecorder({
        sampleRate: 16000,
        channelCount: 1,
      });

      recorderRef.current.onDataAvailable((samples) => {
        if (streamRef.current) {
          streamRef.current.acceptWaveform(samples, 16000);

          // Get partial result
          streamRef.current.getResult().then((result) => {
            setPartialText(result.text);

            // Check for endpoint
            streamRef.current?.isEndpoint().then((isEnd) => {
              if (isEnd) {
                setFinalText((prev) => prev + ' ' + result.text);
                setPartialText('');
                streamRef.current?.reset();
              }
            });
          });
        }
      });

      recorderRef.current.start();
      setIsRecording(true);
    } catch (error) {
      console.error('Failed to start recording:', error);
    }
  };

  const stopRecording = async () => {
    if (recorderRef.current) {
      recorderRef.current.stop();
    }
    // Null the refs so the recorder callback can't touch destroyed handles
    if (streamRef.current) {
      await streamRef.current.destroy();
      streamRef.current = null;
    }
    if (engineRef.current) {
      await engineRef.current.destroy();
      engineRef.current = null;
    }
    setIsRecording(false);
  };

  return (
    <View>
      <Button
        title={isRecording ? 'Stop Recording' : 'Start Recording'}
        onPress={isRecording ? stopRecording : startRecording}
      />
      <Text>Partial: {partialText}</Text>
      <Text>Final: {finalText}</Text>
    </View>
  );
}
Next Steps
Now that you’ve built your first speech app, explore more features:
Model Setup: Learn about model types, quantization, and Play Asset Delivery
STT API Reference: Complete STT API documentation
TTS API Reference: Complete TTS API documentation
Streaming TTS: Low-latency incremental speech generation
Execution Providers: Hardware acceleration with NNAPI, Core ML, QNN
Example App: Browse the full-featured example application