
Overview

WhisperKit includes a local server that implements the OpenAI Audio API, allowing you to use existing OpenAI SDK clients or generate new ones from the OpenAPI specification. The server supports transcription and translation with streamed output.
For full-duplex real-time streaming, see WhisperKit Pro Local Server, which provides live audio streaming.

Building the Server

First, build the CLI with server support:
# Clone the repository if you haven't already
git clone https://github.com/argmaxinc/whisperkit.git
cd whisperkit

# Build with server support
make build-local-server

# Or manually with the build flag
BUILD_ALL=1 swift build --product whisperkit-cli

Starting the Server

Default Configuration

# Start server on default port (50060)
BUILD_ALL=1 swift run whisperkit-cli serve
The server will:
  • Listen on localhost:50060
  • Use the default tiny model
  • Download the model if not already available

Custom Configuration

# Custom host and port
BUILD_ALL=1 swift run whisperkit-cli serve \
    --host 0.0.0.0 \
    --port 8080

# With specific model
BUILD_ALL=1 swift run whisperkit-cli serve \
    --model base \
    --verbose

# See all options
BUILD_ALL=1 swift run whisperkit-cli serve --help

API Endpoints

The server exposes two main endpoints:
  • POST /v1/audio/transcriptions - Transcribe audio to text
  • POST /v1/audio/translations - Translate audio to English

Supported Parameters

Parameter | Description | Default
file | Audio file (wav, mp3, m4a, flac) | Required
model | Model identifier | Server default
language | Source language code (e.g., "en", "es") | Auto-detect
prompt | Text to guide transcription | None
response_format | Output format: json, verbose_json | verbose_json
temperature | Sampling temperature (0.0-1.0) | 0.0
timestamp_granularities[] | Timing detail: word, segment | segment
stream | Enable streaming output | false
include[] | Include additional data: logprobs | None
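To see how these fields combine in a request, here is a small sketch (the helper name is ours, not part of WhisperKit) that assembles the non-file form fields for a POST to /v1/audio/transcriptions; repeated array parameters such as timestamp_granularities[] are passed as lists so that one form part is sent per element:

```python
def build_transcription_fields(model, language=None, prompt=None,
                               response_format="verbose_json",
                               temperature=0.0, stream=False,
                               timestamp_granularities=None):
    """Assemble multipart form fields (the audio file is attached separately)."""
    fields = {
        "model": model,
        "response_format": response_format,
        "temperature": str(temperature),
    }
    if language:
        fields["language"] = language
    if prompt:
        fields["prompt"] = prompt
    if stream:
        fields["stream"] = "true"
    if timestamp_granularities:
        # Repeated form field: requests sends one part per list element
        fields["timestamp_granularities[]"] = list(timestamp_granularities)
    return fields
```

Pass the result as the `data=` argument to `requests.post`, with the audio file in `files=`.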

Python Client

Use the OpenAI Python SDK to connect to the local server:

Installation

cd Examples/ServeCLIClient/Python
uv sync  # or: pip install openai

Quick Example

from openai import OpenAI

# Connect to local server
client = OpenAI(base_url="http://localhost:50060/v1")

# Transcribe audio file
with open("audio.wav", "rb") as audio_file:
    result = client.audio.transcriptions.create(
        file=audio_file,
        model="tiny"  # Model parameter is required
    )

print(result.text)

Transcription with Options

from openai import OpenAI

client = OpenAI(base_url="http://localhost:50060/v1")

with open("audio.wav", "rb") as audio_file:
    result = client.audio.transcriptions.create(
        file=audio_file,
        model="tiny",
        language="en",
        response_format="verbose_json",
        timestamp_granularities=["word", "segment"]
    )

# Access detailed information
print(f"Language: {result.language}")
print(f"Duration: {result.duration}s")
print(f"Text: {result.text}")

# Word-level timestamps
for word in result.words:
    print(f"{word.start:.2f}s - {word.end:.2f}s: {word.word}")

# Segment-level timestamps
for segment in result.segments:
    print(f"[{segment.start:.2f}s]: {segment.text}")

Translation

# Translate audio to English
with open("spanish_audio.wav", "rb") as audio_file:
    result = client.audio.translations.create(
        file=audio_file,
        model="tiny"
    )

print(f"Translation: {result.text}")

Streaming Transcription

import requests
import json

# Use requests library for streaming
url = "http://localhost:50060/v1/audio/transcriptions"

with open("audio.wav", "rb") as audio_file:
    files = {"file": audio_file}
    data = {
        "model": "tiny",
        "stream": "true",
        "response_format": "verbose_json"
    }
    
    response = requests.post(
        url,
        files=files,
        data=data,
        headers={"Accept": "text/event-stream"},
        stream=True
    )
    
    # Process Server-Sent Events
    for line in response.iter_lines():
        if line:
            line_str = line.decode('utf-8')
            if line_str.startswith('data: '):
                data_str = line_str[6:]  # Remove 'data: ' prefix
                try:
                    event = json.loads(data_str)
                    if event.get('type') == 'transcript.text.delta':
                        print(event['delta'], end='', flush=True)
                    elif event.get('type') == 'transcript.text.done':
                        print(f"\nFinal: {event['text']}")
                except json.JSONDecodeError:
                    pass
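If you reuse this pattern elsewhere, the per-line event handling can be pulled out into a small helper that is easy to unit-test; the function name here is illustrative, not part of WhisperKit:

```python
import json

def parse_sse_event(raw_line):
    """Decode one Server-Sent Events line into an event dict.

    Returns None for blank lines, comments, non-data fields,
    and malformed JSON payloads.
    """
    line = raw_line.decode("utf-8") if isinstance(raw_line, bytes) else raw_line
    if not line.startswith("data: "):
        return None
    try:
        return json.loads(line[len("data: "):])
    except json.JSONDecodeError:
        return None
```

With this in place, the loop body reduces to `event = parse_sse_event(line)` followed by the event-type checks.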

Command Line Usage

# Using the provided Python client
cd Examples/ServeCLIClient/Python

# Transcribe
python whisperkit_client.py transcribe \
    --file audio.wav \
    --language en

# Translate
python whisperkit_client.py translate \
    --file audio.wav

# With streaming
python whisperkit_client.py transcribe \
    --file audio.wav \
    --stream

Swift Client

The Swift client is generated from the OpenAPI specification:

Installation

cd Examples/ServeCLIClient/Swift
swift build

Command Line Usage

# Transcribe
swift run whisperkit-client transcribe audio.wav --language en

# Translate
swift run whisperkit-client translate audio.wav

# With word-level timestamps
swift run whisperkit-client transcribe audio.wav \
    --timestamp-granularities word,segment \
    --response-format verbose_json

# Streaming
swift run whisperkit-client transcribe audio.wav --stream

Programmatic Usage

import Foundation
import WhisperKitSwiftClient

// Initialize client
let client = WhisperKitClient(
    serverURL: "http://localhost:50060/v1"
)

// Transcribe audio
try await client.transcribeAudio(
    filePath: "audio.wav",
    language: "en",
    model: "tiny",
    responseFormat: "verbose_json",
    timestampGranularities: "word,segment",
    stream: false
)

// Translate audio
try await client.translateAudio(
    filePath: "audio.wav",
    language: "es",
    model: "tiny",
    responseFormat: "verbose_json"
)

cURL Client

Use the provided shell scripts or raw cURL commands:

Using Shell Scripts

cd Examples/ServeCLIClient/Curl
chmod +x *.sh

# Transcribe
./transcribe.sh audio.wav --language en

# Translate
./translate.sh audio.wav --language es

# With all options
./transcribe.sh audio.wav \
    --model base \
    --language en \
    --timestamp-granularities word,segment \
    --stream true

# Run comprehensive test suite
./test.sh

Raw cURL Commands

Basic Transcription

curl -X POST http://localhost:50060/v1/audio/transcriptions \
  -F file=@audio.wav \
  -F model="tiny" \
  -F response_format="verbose_json"

With Word Timestamps

curl -X POST http://localhost:50060/v1/audio/transcriptions \
  -F file=@audio.wav \
  -F model="tiny" \
  -F response_format="verbose_json" \
  -F "timestamp_granularities[]=word" \
  -F "timestamp_granularities[]=segment"

Streaming Output

curl -N -X POST http://localhost:50060/v1/audio/transcriptions \
  -F file=@audio.wav \
  -F model="tiny" \
  -F stream="true" \
  -H "Accept: text/event-stream"

Translation

curl -X POST http://localhost:50060/v1/audio/translations \
  -F file=@audio.wav \
  -F model="tiny" \
  -F response_format="verbose_json"

With Log Probabilities

curl -X POST http://localhost:50060/v1/audio/transcriptions \
  -F file=@audio.wav \
  -F model="tiny" \
  -F response_format="json" \
  -F "include[]=logprobs"

JavaScript/TypeScript Client

Installation

npm install openai
# or
yarn add openai

Usage

import OpenAI from 'openai';
import fs from 'fs';

const client = new OpenAI({
  baseURL: 'http://localhost:50060/v1',
  apiKey: 'dummy-key'  // Not used by local server
});

// Transcribe
const transcription = await client.audio.transcriptions.create({
  file: fs.createReadStream('audio.wav'),
  model: 'tiny',
  language: 'en',
  response_format: 'verbose_json',
  timestamp_granularities: ['word', 'segment']
});

console.log(transcription.text);

// Access word timestamps
transcription.words?.forEach(word => {
  console.log(`${word.start}s: ${word.word}`);
});

// Translate
const translation = await client.audio.translations.create({
  file: fs.createReadStream('audio.wav'),
  model: 'tiny'
});

console.log(translation.text);

Generating Custom Clients

You can generate clients for any language using the OpenAPI specification:

Get the OpenAPI Spec

# Generate the latest spec
make generate-server

# The spec is located at:
# scripts/specs/localserver_openapi.yaml

Generate Clients

Python Client

openapi-generator-cli generate \
  -i scripts/specs/localserver_openapi.yaml \
  -g python \
  -o python-client

TypeScript Client

npx @openapitools/openapi-generator-cli generate \
  -i scripts/specs/localserver_openapi.yaml \
  -g typescript-fetch \
  -o typescript-client

Go Client

openapi-generator-cli generate \
  -i scripts/specs/localserver_openapi.yaml \
  -g go \
  -o go-client

API Limitations

Compared to the official OpenAI API:
  • Response formats: only json and verbose_json are supported (no plain text, SRT, or VTT)
  • Model selection: the server must be launched with the desired model via the --model flag
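Because SRT and VTT are not emitted by the server, one workaround is to convert the verbose_json segments client-side. A minimal sketch, assuming segments shaped like the verbose_json examples above (with start, end, and text):

```python
def segments_to_srt(segments):
    """Render verbose_json segments as an SRT string."""
    def timestamp(seconds):
        ms = int(round(seconds * 1000))
        hours, ms = divmod(ms, 3_600_000)
        minutes, ms = divmod(ms, 60_000)
        secs, ms = divmod(ms, 1_000)
        return f"{hours:02}:{minutes:02}:{secs:02},{ms:03}"

    blocks = []
    for index, segment in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{timestamp(segment['start'])} --> {timestamp(segment['end'])}\n"
            f"{segment['text'].strip()}\n"
        )
    return "\n".join(blocks)
```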

Fully Supported Features

The local server fully supports:
  • Log probabilities: include[]=logprobs parameter for token-level confidence
  • Streaming responses: Server-Sent Events (SSE) for real-time transcription
  • Timestamp granularities: Both word and segment level timing
  • Language detection: Automatic language detection or manual specification
  • Temperature control: Sampling temperature for transcription randomness
  • Prompt text: Text guidance for transcription style and context
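As one example of putting logprobs to use, per-token log probabilities can be turned into an average confidence score. The entry shape below ({"token": ..., "logprob": ...}) is an assumption for illustration; inspect the actual include[]=logprobs response from your server before relying on it:

```python
import math

def average_confidence(logprob_entries):
    """Mean per-token probability, derived from log probabilities.

    Entry shape is assumed, not taken from the WhisperKit spec.
    """
    if not logprob_entries:
        return 0.0
    probs = [math.exp(entry["logprob"]) for entry in logprob_entries]
    return sum(probs) / len(probs)
```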

Server Configuration

Environment Variables

# Set custom model cache directory
export WHISPERKIT_CACHE_DIR="/path/to/models"

# Enable debug logging
BUILD_ALL=1 swift run whisperkit-cli serve --verbose

Model Management

# Download a model before starting server
make download-model MODEL=base

# Start server with downloaded model
BUILD_ALL=1 swift run whisperkit-cli serve \
    --model-path "Models/whisperkit-coreml/openai_whisper-base"

Docker Deployment

Create a Dockerfile:
FROM swift:5.9

# Install dependencies
RUN apt-get update && apt-get install -y \
    git \
    git-lfs

# Clone and build WhisperKit
WORKDIR /app
RUN git clone https://github.com/argmaxinc/whisperkit.git
WORKDIR /app/whisperkit

RUN make setup
RUN make download-model MODEL=tiny
RUN BUILD_ALL=1 swift build --product whisperkit-cli -c release

EXPOSE 50060

# Shell form so the BUILD_ALL environment variable is applied
# (exec-form CMD cannot set environment variables)
CMD BUILD_ALL=1 swift run -c release whisperkit-cli serve --host 0.0.0.0

Build and run:
# Build image
docker build -t whisperkit-server .

# Run container
docker run -p 50060:50060 whisperkit-server

Next Steps

Basic Transcription

Learn the basics of file transcription

Real-Time Streaming

Transcribe audio in real-time
