Quickstart

This guide will walk you through setting up the project and transcribing your first audio file using the ElevenLabs Scribe API.

Prerequisites

Before you begin, make sure you have:

Bun installed on your system
An ElevenLabs API key (sign up for free at ElevenLabs)
An audio or video file to transcribe

Install dependencies

Clone the repository and install the required packages using Bun:

git clone <repository-url>
cd <project-directory>
bun install

This will install all dependencies from package.json, including:

@elevenlabs/elevenlabs-js - Official ElevenLabs SDK
react and react-dom - React framework
@radix-ui/* - Accessible UI components
tailwindcss - Styling framework

Start the development server

Launch the development server with hot module reloading:

bun dev

You should see output similar to:

🚀 Server running at http://localhost:3000

Open your browser and navigate to http://localhost:3000 to see the application.

The dev server includes automatic browser hot reloading and console log echoing, so you can see changes instantly as you modify the code.

Get your ElevenLabs API key

To use the Speech-to-Text API, you’ll need an ElevenLabs API key:

Go to ElevenLabs
Sign up or log in to your account
Navigate to your profile settings
Copy your API key

Keep your API key secure and never commit it to version control. The application takes the key through the UI form, so you do not need to store it in a project file.

Transcribe your first audio file

Now you’re ready to transcribe audio:

Enter your API key - Paste your ElevenLabs API key into the “ElevenLabs API Key” field
Upload an audio file - Click “Choose File” and select an audio or video file (supports MP3, WAV, M4A, AAC, OGG, WebM, and more)
Configure options (optional) - Customize transcription settings:
- Model: Choose between Scribe V1 or Scribe V2 (V2 recommended)
- Diarize: Enable speaker detection for multi-speaker audio
- Timestamps Granularity: Choose word or character-level timestamps
- Language Code: Specify a language (e.g., “en”, “es”, “fr”) for better accuracy
Click “Transcribe Audio” - The application will send your file to the ElevenLabs API

// The transcription is handled by the ElevenLabsClient from the SDK.
// `apiKey` and `file` are the values entered in the UI form.
import { ElevenLabsClient } from "@elevenlabs/elevenlabs-js";

const browserClient = new ElevenLabsClient({ apiKey });
const transcriptResponse = await browserClient.speechToText.convert({
  file,
  modelId: "scribe_v2",
  timestampsGranularity: "character",
  diarize: false,
});

Transcription time depends on your audio file length and the selected options. Most files process in seconds.
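
As a rough illustration of how the form settings listed above could map onto the request, here is a minimal sketch. The `TranscribeFormState` shape, the `buildTranscriptionOptions` helper, and the `"scribe_v1"` identifier are assumptions made for this example, not names taken from the project. The option names (`modelId`, `diarize`, `timestampsGranularity`, `languageCode`) are the ones used elsewhere in this guide.

```typescript
// Hypothetical form state; field names are assumptions for illustration only.
interface TranscribeFormState {
  model: "scribe_v1" | "scribe_v2"; // "scribe_v1" is assumed by analogy with "scribe_v2"
  diarize: boolean;
  timestampsGranularity: "word" | "character";
  languageCode: string; // empty string means "not specified"
}

// Request options, using the parameter names shown in this guide.
interface TranscriptionOptions {
  modelId: string;
  diarize: boolean;
  timestampsGranularity: "word" | "character";
  languageCode?: string;
}

function buildTranscriptionOptions(form: TranscribeFormState): TranscriptionOptions {
  const options: TranscriptionOptions = {
    modelId: form.model,
    diarize: form.diarize,
    timestampsGranularity: form.timestampsGranularity,
  };
  // Only send a language code when the user actually entered one.
  const code = form.languageCode.trim().toLowerCase();
  if (code !== "") {
    options.languageCode = code;
  }
  return options;
}
```

The result could then be spread into the `convert` call next to `file`. Leaving `languageCode` out when the field is empty keeps the optional setting truly optional.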

View and interact with results

Once transcription completes, you’ll see:

Full transcript text - The complete transcription of your audio
Audio player - Play back the original audio file
Interactive transcript - Click any word to jump to that timestamp in the audio
Speaker labels - If you enabled diarization, you can rename speakers for clarity

The transcript viewer aligns the text with the audio (down to character level when that granularity is selected), so you can see exactly what was spoken at each moment:

// Each word in the transcript includes timing information
{
  text: "Hello",
  start: 0.5,
  end: 0.9,
  speaker: "speaker_01"
}
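
To make the timing fields concrete, here is a minimal sketch of how a viewer might find the word being spoken at a given playback time. It assumes the word shape shown above and that words are sorted by `start`. The `TimedWord` type and the `findWordIndexAtTime` helper are hypothetical names for this example, not part of the project or the SDK.

```typescript
// Word shape as shown in this guide; the type name is an assumption.
interface TimedWord {
  text: string;
  start: number; // seconds
  end: number;   // seconds
  speaker?: string;
}

// Binary search for the last word whose start time is <= `time`.
// Returns -1 when `time` falls before the first word (or the list is empty).
function findWordIndexAtTime(words: TimedWord[], time: number): number {
  let lo = 0;
  let hi = words.length - 1;
  let result = -1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (words[mid].start <= time) {
      result = mid; // candidate; keep looking to the right
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  return result;
}
```

In a browser, a helper like this would typically be called from the audio element's `timeupdate` event to highlight the current word. Going the other way (click a word to seek) only requires setting the element's `currentTime` to that word's `start`.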

Next steps

Now that you’ve transcribed your first audio file, you can:

Explore advanced options - Try different models, enable entity detection, add custom keyterms, or adjust the temperature parameter
Read the installation guide - Learn more about setting up the development environment and project structure

Common configuration options

Here are some popular transcription configurations:

Interview or podcast transcription

{
  modelId: "scribe_v2",
  diarize: true,              // Detect different speakers
  numSpeakers: 2,             // Specify number of speakers if known
  timestampsGranularity: "word"
}
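
With diarization enabled, a common follow-up for interviews is to collapse the word list into speaker turns. The sketch below assumes each word has the `text`, `start`, `end`, and `speaker` fields shown earlier in this guide. The `groupIntoTurns` helper and the optional `displayNames` map (for renamed speakers) are hypothetical and only illustrate the idea.

```typescript
interface DiarizedWord {
  text: string;
  start: number;
  end: number;
  speaker: string;
}

interface SpeakerTurn {
  speaker: string;
  start: number;
  end: number;
  text: string;
}

// Merge consecutive words from the same speaker into one turn.
// `displayNames` optionally maps raw labels (e.g. "speaker_01") to friendly names.
function groupIntoTurns(
  words: DiarizedWord[],
  displayNames: Record<string, string> = {},
): SpeakerTurn[] {
  const turns: SpeakerTurn[] = [];
  for (const word of words) {
    const speaker = displayNames[word.speaker] ?? word.speaker;
    const last = turns[turns.length - 1];
    if (last !== undefined && last.speaker === speaker) {
      last.end = word.end;
      last.text += " " + word.text;
    } else {
      turns.push({ speaker, start: word.start, end: word.end, text: word.text });
    }
  }
  return turns;
}
```

Applying the display names before comparing speakers means two raw labels renamed to the same person are merged into a single turn.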

Subtitle generation

{
  modelId: "scribe_v2",
  timestampsGranularity: "word",
  languageCode: "en"          // Specify language for accuracy
}
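
Word-level timestamps are what make subtitle output possible. As a sketch of one approach, the helpers below turn a word list into SRT cues of a fixed number of words each. They assume the word shape shown earlier in this guide and that the list contains only spoken words. `formatSrtTime`, `wordsToSrt`, and the default of 7 words per cue are illustrative choices, not part of the project.

```typescript
interface SubtitleWord {
  text: string;
  start: number; // seconds
  end: number;   // seconds
}

// Format seconds as an SRT timestamp: HH:MM:SS,mmm
function formatSrtTime(seconds: number): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  const pad = (n: number, width: number) => String(n).padStart(width, "0");
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(ms, 3)}`;
}

// Split the words into fixed-size chunks and emit one numbered cue per chunk.
function wordsToSrt(words: SubtitleWord[], maxWordsPerCue = 7): string {
  const cues: string[] = [];
  for (let i = 0; i < words.length; i += maxWordsPerCue) {
    const chunk = words.slice(i, i + maxWordsPerCue);
    const start = formatSrtTime(chunk[0].start);
    const end = formatSrtTime(chunk[chunk.length - 1].end);
    const text = chunk.map((w) => w.text).join(" ");
    cues.push(`${cues.length + 1}\n${start} --> ${end}\n${text}`);
  }
  return cues.length > 0 ? cues.join("\n\n") + "\n" : "";
}
```

Fixed-size chunking is the simplest possible strategy. A real subtitle pipeline would more likely break cues on pauses, punctuation, or a maximum line length.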

Medical or legal transcription

{
  modelId: "scribe_v2",
  entityDetection: "pii",     // Detect personally identifiable information
  keyterms: ["medical term", "legal term"], // Domain-specific vocabulary
  temperature: 0.0            // Most deterministic output
}
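
If you switch between these configurations often, one option is to keep them as named presets and override individual fields per request. The sketch below copies the three configurations from this section. The `ScribeOptions` type, the preset names, and the `optionsFor` helper are hypothetical, and the field types are inferred from the examples above rather than taken from the SDK's type definitions.

```typescript
// Field names follow the examples in this guide; the types are inferred assumptions.
interface ScribeOptions {
  modelId: string;
  diarize?: boolean;
  numSpeakers?: number;
  timestampsGranularity?: "word" | "character";
  languageCode?: string;
  entityDetection?: string;
  keyterms?: string[];
  temperature?: number;
}

type PresetName = "interview" | "subtitles" | "medicalOrLegal";

const presets: Record<PresetName, ScribeOptions> = {
  interview: {
    modelId: "scribe_v2",
    diarize: true,
    numSpeakers: 2,
    timestampsGranularity: "word",
  },
  subtitles: {
    modelId: "scribe_v2",
    timestampsGranularity: "word",
    languageCode: "en",
  },
  medicalOrLegal: {
    modelId: "scribe_v2",
    entityDetection: "pii",
    keyterms: ["medical term", "legal term"],
    temperature: 0.0,
  },
};

// Return a fresh options object: preset values first, then per-request overrides.
function optionsFor(name: PresetName, overrides: Partial<ScribeOptions> = {}): ScribeOptions {
  return { ...presets[name], ...overrides };
}
```

For example, `optionsFor("interview", { numSpeakers: 3 })` keeps diarization on but changes the expected speaker count, without modifying the stored preset.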
