This guide will walk you through setting up the project and transcribing your first audio file using the ElevenLabs Scribe API.

Prerequisites

Before you begin, make sure you have:
  • Bun installed on your system
  • An ElevenLabs API key (sign up for free at ElevenLabs)
  • An audio or video file to transcribe

1. Install dependencies

Clone the repository and install the required packages using Bun:
git clone <repository-url>
cd <project-directory>
bun install
This will install all dependencies from package.json, including:
  • @elevenlabs/elevenlabs-js - Official ElevenLabs SDK
  • react and react-dom - React framework
  • @radix-ui/* - Accessible UI components
  • tailwindcss - Styling framework

2. Start the development server

Launch the development server with hot module reloading:
bun dev
You should see output similar to:
🚀 Server running at http://localhost:3000
Open your browser and navigate to http://localhost:3000 to see the application.
The dev server includes automatic browser hot reloading and console log echoing, so you can see changes instantly as you modify the code.

3. Get your ElevenLabs API key

To use the Speech-to-Text API, you’ll need an ElevenLabs API key:
  1. Go to ElevenLabs
  2. Sign up or log in to your account
  3. Navigate to your profile settings
  4. Copy your API key
Keep your API key secure and never commit it to version control. The application takes the key through the UI form, so it does not need to be hardcoded in the source.

4. Transcribe your first audio file

Now you’re ready to transcribe audio:
  1. Enter your API key - Paste your ElevenLabs API key into the “ElevenLabs API Key” field
  2. Upload an audio file - Click “Choose File” and select an audio or video file (supports MP3, WAV, M4A, AAC, OGG, WebM, and more)
  3. Configure options (optional) - Customize transcription settings:
    • Model: Choose between Scribe V1 or Scribe V2 (V2 recommended)
    • Diarize: Enable speaker detection for multi-speaker audio
    • Timestamps Granularity: Choose word or character-level timestamps
    • Language Code: Specify a language (e.g., “en”, “es”, “fr”) for better accuracy
  4. Click “Transcribe Audio” - The application will send your file to the ElevenLabs API
import { ElevenLabsClient } from "@elevenlabs/elevenlabs-js";

// The transcription is handled by the ElevenLabsClient from the SDK.
// `apiKey` and `file` are the values entered in the form above.
const browserClient = new ElevenLabsClient({ apiKey });
const transcriptResponse = await browserClient.speechToText.convert({
  file,
  modelId: "scribe_v2",
  timestampsGranularity: "character",
  diarize: false,
});
Transcription time depends on your audio file length and the selected options. Most files process in seconds.
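
The form options above map onto the request in a fairly direct way. The following is a minimal sketch of that mapping, not the project's actual code: the `TranscribeFormValues` shape and the `buildTranscribeOptions` helper are hypothetical names, and the `"scribe_v1"` model ID is assumed by analogy with the `"scribe_v2"` value shown above. An empty language field is simply left out of the request instead of being sent as an empty string.

```typescript
// Hypothetical shape of the form state; field names are assumptions.
interface TranscribeFormValues {
  modelId: "scribe_v1" | "scribe_v2";
  diarize: boolean;
  timestampsGranularity: "word" | "character";
  languageCode: string; // empty string means "not specified"
}

// Options in the shape passed to speechToText.convert (without the file).
interface TranscribeOptions {
  modelId: string;
  diarize: boolean;
  timestampsGranularity: string;
  languageCode?: string;
}

function buildTranscribeOptions(form: TranscribeFormValues): TranscribeOptions {
  const options: TranscribeOptions = {
    modelId: form.modelId,
    diarize: form.diarize,
    timestampsGranularity: form.timestampsGranularity,
  };
  // Normalize the language code and only include it when the user typed one.
  const language = form.languageCode.trim().toLowerCase();
  if (language !== "") {
    options.languageCode = language;
  }
  return options;
}
```

The result can then be spread into the call, for example `convert({ file, ...buildTranscribeOptions(form) })`.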

5. View and interact with results

Once transcription completes, you’ll see:
  • Full transcript text - The complete transcription of your audio
  • Audio player - Play back the original audio file
  • Interactive transcript - Click any word to jump to that timestamp in the audio
  • Speaker labels - If you enabled diarization, you can rename speakers for clarity
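
Renaming a speaker does not have to modify the transcript itself. One way to sketch it is a lookup table from speaker ID to display name, consulted at render time. The names `SpeakerNames`, `renameSpeaker`, and `displaySpeaker` below are hypothetical and are not taken from the project source.

```typescript
// Maps a speaker ID from the transcript to a user-chosen display name.
type SpeakerNames = Record<string, string>;

// Returns a new table with the given speaker renamed (the input is not mutated).
function renameSpeaker(
  names: SpeakerNames,
  speakerId: string,
  newName: string
): SpeakerNames {
  return { ...names, [speakerId]: newName.trim() };
}

// Falls back to the raw speaker ID when no custom name has been set.
function displaySpeaker(speakerId: string, names: SpeakerNames): string {
  const custom = names[speakerId];
  return custom && custom.length > 0 ? custom : speakerId;
}
```

Keeping the original IDs in the transcript means a rename can be undone or changed later without re-running the transcription.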
With character-level timestamps, the transcript viewer can align the text closely with the audio, so you can see exactly which words were spoken at each moment:
// Each word in the transcript includes timing information (in seconds)
const word = {
  text: "Hello",
  start: 0.5,
  end: 0.9,
  speaker: "speaker_01"
};
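
The click-to-jump behaviour and its reverse (highlighting the word currently being played) can both be derived from the `start` and `end` fields. The sketch below is illustrative only: `TranscriptWord`, `seekToWord`, and `findWordIndexAt` are hypothetical names, and it assumes the words are sorted by start time and do not overlap.

```typescript
interface TranscriptWord {
  text: string;
  start: number; // seconds
  end: number;   // seconds
  speaker?: string;
}

// Jumps playback to the start of a word. Any object with a writable
// `currentTime` works here, including an HTMLAudioElement.
function seekToWord(player: { currentTime: number }, word: TranscriptWord): void {
  player.currentTime = word.start;
}

// Binary search for the word being spoken at `time`.
// Returns -1 when `time` falls in a gap or outside the transcript.
function findWordIndexAt(words: TranscriptWord[], time: number): number {
  let lo = 0;
  let hi = words.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const w = words[mid];
    if (time < w.start) {
      hi = mid - 1;
    } else if (time >= w.end) {
      lo = mid + 1;
    } else {
      return mid;
    }
  }
  return -1;
}
```

A viewer could call `findWordIndexAt` from the audio element's `timeupdate` event to decide which word to highlight.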

Next steps

Now that you’ve transcribed your first audio file, you can:

Explore advanced options

Try different models, enable entity detection, add custom keyterms, or adjust the temperature parameter

Read the installation guide

Learn more about setting up the development environment and project structure

Common configuration options

Here are some common transcription configurations:

Interview or podcast transcription

{
  modelId: "scribe_v2",
  diarize: true,              // Detect different speakers
  numSpeakers: 2,             // Specify number of speakers if known
  timestampsGranularity: "word"
}
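
With diarization and word-level timestamps enabled, an interview transcript is usually easier to read when consecutive words from the same speaker are merged into turns. The following is a rough sketch of that post-processing step. It assumes each word has the `text`, `start`, `end`, and `speaker` fields shown earlier in this guide; `SpokenWord`, `SpeakerTurn`, and `groupIntoTurns` are hypothetical names.

```typescript
interface SpokenWord {
  text: string;
  start: number;
  end: number;
  speaker: string;
}

interface SpeakerTurn {
  speaker: string;
  start: number;
  end: number;
  text: string;
}

// Merges consecutive words from the same speaker into a single turn.
function groupIntoTurns(words: SpokenWord[]): SpeakerTurn[] {
  const turns: SpeakerTurn[] = [];
  for (const word of words) {
    const last = turns[turns.length - 1];
    if (last && last.speaker === word.speaker) {
      // Same speaker is still talking: extend the current turn.
      last.text += " " + word.text;
      last.end = word.end;
    } else {
      // Speaker changed (or this is the first word): start a new turn.
      turns.push({
        speaker: word.speaker,
        start: word.start,
        end: word.end,
        text: word.text,
      });
    }
  }
  return turns;
}
```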

Subtitle generation

{
  modelId: "scribe_v2",
  timestampsGranularity: "word",
  languageCode: "en"          // Specify language for accuracy
}
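
Word-level timestamps are what make subtitle export possible. As an illustration, the sketch below turns a list of timed words into SubRip (SRT) text by grouping a fixed number of words per cue. The names `TimedWord`, `formatSrtTime`, and `wordsToSrt` are hypothetical, the input shape follows the word example earlier in this guide, and the default of seven words per cue is an arbitrary choice, not a project setting.

```typescript
interface TimedWord {
  text: string;
  start: number; // seconds
  end: number;   // seconds
}

// Left-pads a number with zeros to the given width.
function pad(n: number, width: number): string {
  let s = String(n);
  while (s.length < width) s = "0" + s;
  return s;
}

// Formats seconds as an SRT timestamp: HH:MM:SS,mmm
function formatSrtTime(seconds: number): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(ms, 3)}`;
}

// Groups words into cues of `wordsPerCue` words and renders SRT text.
function wordsToSrt(words: TimedWord[], wordsPerCue: number = 7): string {
  const cues: string[] = [];
  for (let i = 0; i < words.length; i += wordsPerCue) {
    const chunk = words.slice(i, i + wordsPerCue);
    const first = chunk[0];
    const last = chunk[chunk.length - 1];
    const text = chunk.map((w) => w.text).join(" ");
    cues.push(
      `${cues.length + 1}\n${formatSrtTime(first.start)} --> ${formatSrtTime(last.end)}\n${text}`
    );
  }
  return cues.length > 0 ? cues.join("\n\n") + "\n" : "";
}
```

A production exporter would more likely break cues on pauses or punctuation than on a fixed word count.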

Entity detection and custom keyterms

{
  modelId: "scribe_v2",
  entityDetection: "pii",     // Detect personally identifiable information
  keyterms: ["medical term", "legal term"], // Domain-specific vocabulary
  temperature: 0.0            // Most deterministic output
}
