Local Server & Clients

Overview

WhisperKit includes a local server that implements the OpenAI Audio API, so you can point existing OpenAI SDK clients at it or generate new clients from its OpenAPI specification. The server supports transcription and translation, with optional streaming of the output.

For full-duplex real-time streaming, see WhisperKit Pro Local Server, which provides live audio streaming.

Building the Server

First, build the CLI with server support:

# Clone the repository if you haven't already
git clone https://github.com/argmaxinc/whisperkit.git
cd whisperkit

# Build with server support
make build-local-server

# Or manually with the build flag
BUILD_ALL=1 swift build --product whisperkit-cli

Starting the Server

Default Configuration

# Start server on default port (50060)
BUILD_ALL=1 swift run whisperkit-cli serve

The server will:

Listen on localhost:50060
Use the default tiny model
Download the model if not already available
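
Because the first start may spend time downloading the model, a client script may want to wait until the port accepts connections before sending audio. The following is a minimal sketch, not part of WhisperKit: `wait_for_server` is a hypothetical helper, and it assumes only that the server listens on the host and port shown above. A successful connection shows that the port is open; whether that happens before or after the model finishes loading is not stated here.

```python
import socket
import time


def wait_for_server(host="localhost", port=50060, timeout=60.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper: it only checks that the port is open, which does not
    by itself prove that the model has finished downloading or loading.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # create_connection raises OSError while nothing is listening yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False


if __name__ == "__main__":
    if wait_for_server():
        print("Server is accepting connections")
    else:
        print("Server did not come up in time")
```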

Custom Configuration

# Custom host and port
BUILD_ALL=1 swift run whisperkit-cli serve \
    --host 0.0.0.0 \
    --port 8080

# With specific model
BUILD_ALL=1 swift run whisperkit-cli serve \
    --model base \
    --verbose

# See all options
BUILD_ALL=1 swift run whisperkit-cli serve --help

API Endpoints

The server exposes two main endpoints:

POST /v1/audio/transcriptions - Transcribe audio to text
POST /v1/audio/translations - Translate audio to English

Supported Parameters

Parameter	Description	Default
`file`	Audio file (wav, mp3, m4a, flac)	Required
`model`	Model identifier	Server default
`language`	Source language code (e.g., “en”, “es”)	Auto-detect
`prompt`	Text to guide transcription	None
`response_format`	Output format: `json`, `verbose_json`	`verbose_json`
`temperature`	Sampling temperature (0.0-1.0)	0.0
`timestamp_granularities[]`	Timing detail: `word`, `segment`	`segment`
`stream`	Enable streaming output	`false`
`include[]`	Include additional data: `logprobs`	None
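
The table above can be turned into a small client-side check that rejects unsupported values before a request is sent. This is a sketch under stated assumptions: `build_form_fields` is a hypothetical helper, the allowed values and defaults are copied from the table, and the granularities are joined with a comma because that is how the cURL examples on this page pass them.

```python
ALLOWED_RESPONSE_FORMATS = {"json", "verbose_json"}
ALLOWED_GRANULARITIES = {"word", "segment"}


def build_form_fields(model=None, language=None, prompt=None,
                      response_format="verbose_json", temperature=0.0,
                      timestamp_granularities=("segment",), stream=False,
                      include_logprobs=False):
    """Validate options against the parameter table and return multipart form fields.

    Hypothetical helper; the `file` part is supplied separately by the caller.
    """
    if response_format not in ALLOWED_RESPONSE_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0")
    unknown = set(timestamp_granularities) - ALLOWED_GRANULARITIES
    if unknown:
        raise ValueError(f"unsupported granularities: {sorted(unknown)}")

    fields = {
        "response_format": response_format,
        "temperature": str(temperature),
        # Comma-joined, mirroring the cURL examples in this document (assumption).
        "timestamp_granularities[]": ",".join(timestamp_granularities),
        "stream": "true" if stream else "false",
    }
    # Optional parameters are omitted so the server falls back to its defaults.
    for name, value in (("model", model), ("language", language), ("prompt", prompt)):
        if value is not None:
            fields[name] = value
    if include_logprobs:
        fields["include[]"] = "logprobs"
    return fields
```

The returned dictionary can be passed as `data=` to `requests.post`, with the audio file passed as `files={"file": audio_file}`.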

Python Client

Use the OpenAI Python SDK to connect to the local server:

Installation

cd Examples/ServeCLIClient/Python
uv sync  # or: pip install openai

Quick Example

from openai import OpenAI

# Connect to local server (the SDK requires an API key; the local server does not use it)
client = OpenAI(base_url="http://localhost:50060/v1", api_key="dummy-key")

# Transcribe audio file
with open("audio.wav", "rb") as audio_file:
    result = client.audio.transcriptions.create(
        file=audio_file,
        model="tiny"  # The SDK requires a model argument
    )

print(result.text)

Transcription with Options

from openai import OpenAI

# The SDK requires an API key; the local server does not use it
client = OpenAI(base_url="http://localhost:50060/v1", api_key="dummy-key")

with open("audio.wav", "rb") as audio_file:
    result = client.audio.transcriptions.create(
        file=audio_file,
        model="tiny",
        language="en",
        response_format="verbose_json",
        timestamp_granularities=["word", "segment"]
    )

# Access detailed information
print(f"Language: {result.language}")
print(f"Duration: {result.duration}s")
print(f"Text: {result.text}")

# Word-level timestamps
for word in result.words:
    print(f"{word.start:.2f}s - {word.end:.2f}s: {word.word}")

# Segment-level timestamps
for segment in result.segments:
    print(f"[{segment.start:.2f}s]: {segment.text}")

Translation

# Translate audio to English
with open("spanish_audio.wav", "rb") as audio_file:
    result = client.audio.translations.create(
        file=audio_file,
        model="tiny"
    )

print(f"Translation: {result.text}")

Streaming Transcription

import requests
import json

# Use requests library for streaming
url = "http://localhost:50060/v1/audio/transcriptions"

with open("audio.wav", "rb") as audio_file:
    files = {"file": audio_file}
    data = {
        "model": "tiny",
        "stream": "true",
        "response_format": "verbose_json"
    }
    
    response = requests.post(
        url,
        files=files,
        data=data,
        headers={"Accept": "text/event-stream"},
        stream=True
    )
    
    # Process Server-Sent Events
    for line in response.iter_lines():
        if line:
            line_str = line.decode('utf-8')
            if line_str.startswith('data: '):
                data_str = line_str[6:]  # Remove 'data: ' prefix
                try:
                    event = json.loads(data_str)
                    if event.get('type') == 'transcript.text.delta':
                        print(event['delta'], end='', flush=True)
                    elif event.get('type') == 'transcript.text.done':
                        print(f"\nFinal: {event['text']}")
                except json.JSONDecodeError:
                    pass
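
The event handling in the loop above can also be factored into pure functions, which makes it testable without a running server. This is a sketch, not the client shipped in the repository: `iter_sse_events` and `collect_transcript` are hypothetical names, and the only event types assumed are the two used above (`transcript.text.delta` and `transcript.text.done`).

```python
import json


def iter_sse_events(lines):
    """Yield JSON objects from an iterable of Server-Sent Events lines (bytes or str)."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        try:
            event = json.loads(payload)
        except json.JSONDecodeError:
            continue  # ignore payloads that are not JSON
        if isinstance(event, dict):
            yield event


def collect_transcript(events):
    """Return the final text if a done event arrives, else the joined deltas."""
    deltas = []
    for event in events:
        kind = event.get("type")
        if kind == "transcript.text.delta":
            deltas.append(event["delta"])
        elif kind == "transcript.text.done":
            return event["text"]
    return "".join(deltas)
```

With the `requests` response from the example above, the call would be `collect_transcript(iter_sse_events(response.iter_lines()))`.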

Command Line Usage

# Using the provided Python client
cd Examples/ServeCLIClient/Python

# Transcribe
python whisperkit_client.py transcribe \
    --file audio.wav \
    --language en

# Translate
python whisperkit_client.py translate \
    --file audio.wav

# With streaming
python whisperkit_client.py transcribe \
    --file audio.wav \
    --stream

Swift Client

The Swift client is generated from the OpenAPI specification:

Installation

cd Examples/ServeCLIClient/Swift
swift build

Command Line Usage

# Transcribe
swift run whisperkit-client transcribe audio.wav --language en

# Translate
swift run whisperkit-client translate audio.wav

# With word-level timestamps
swift run whisperkit-client transcribe audio.wav \
    --timestamp-granularities word,segment \
    --response-format verbose_json

# Streaming
swift run whisperkit-client transcribe audio.wav --stream

Programmatic Usage

import Foundation
import WhisperKitSwiftClient

// Initialize client
let client = WhisperKitClient(
    serverURL: "http://localhost:50060/v1"
)

// Transcribe audio
try await client.transcribeAudio(
    filePath: "audio.wav",
    language: "en",
    model: "tiny",
    responseFormat: "verbose_json",
    timestampGranularities: "word,segment",
    stream: false
)

// Translate audio
try await client.translateAudio(
    filePath: "audio.wav",
    language: "es",
    model: "tiny",
    responseFormat: "verbose_json"
)

cURL Client

Use the provided shell scripts or raw cURL commands:

Using Shell Scripts

cd Examples/ServeCLIClient/Curl
chmod +x *.sh

# Transcribe
./transcribe.sh audio.wav --language en

# Translate
./translate.sh audio.wav --language es

# With all options
./transcribe.sh audio.wav \
    --model base \
    --language en \
    --timestamp-granularities word,segment \
    --stream true

# Run comprehensive test suite
./test.sh

Raw cURL Commands

Basic Transcription

curl -X POST http://localhost:50060/v1/audio/transcriptions \
  -F file=@audio.wav \
  -F model="tiny" \
  -F response_format="verbose_json"

With Word Timestamps

curl -X POST http://localhost:50060/v1/audio/transcriptions \
  -F file=@audio.wav \
  -F model="tiny" \
  -F response_format="verbose_json" \
  -F "timestamp_granularities[]=word,segment"

Streaming Output

curl -N -X POST http://localhost:50060/v1/audio/transcriptions \
  -F file=@audio.wav \
  -F model="tiny" \
  -F stream="true" \
  -H "Accept: text/event-stream"

Translation

curl -X POST http://localhost:50060/v1/audio/translations \
  -F file=@audio.wav \
  -F model="tiny" \
  -F response_format="verbose_json"

With Log Probabilities

curl -X POST http://localhost:50060/v1/audio/transcriptions \
  -F file=@audio.wav \
  -F model="tiny" \
  -F response_format="json" \
  -F "include[]=logprobs"

JavaScript/TypeScript Client

Installation

npm install openai
# or
yarn add openai

Usage

import OpenAI from 'openai';
import fs from 'fs';

const client = new OpenAI({
  baseURL: 'http://localhost:50060/v1',
  apiKey: 'dummy-key'  // Not used by local server
});

// Transcribe
const transcription = await client.audio.transcriptions.create({
  file: fs.createReadStream('audio.wav'),
  model: 'tiny',
  language: 'en',
  response_format: 'verbose_json',
  timestamp_granularities: ['word', 'segment']
});

console.log(transcription.text);

// Access word timestamps
transcription.words?.forEach(word => {
  console.log(`${word.start}s: ${word.word}`);
});

// Translate
const translation = await client.audio.translations.create({
  file: fs.createReadStream('audio.wav'),
  model: 'tiny'
});

console.log(translation.text);

Generating Custom Clients

You can generate clients for any language using the OpenAPI specification:

Get the OpenAPI Spec

# Generate the latest spec
make generate-server

# The spec is located at:
# scripts/specs/localserver_openapi.yaml

Generate Clients

Python Client

# Uses OpenAPI Generator; swift-openapi-generator emits Swift code only
npx @openapitools/openapi-generator-cli generate \
  -i scripts/specs/localserver_openapi.yaml \
  -g python \
  -o python-client

TypeScript Client

npx @openapitools/openapi-generator-cli generate \
  -i scripts/specs/localserver_openapi.yaml \
  -g typescript-fetch \
  -o typescript-client

Go Client

openapi-generator-cli generate \
  -i scripts/specs/localserver_openapi.yaml \
  -g go \
  -o go-client

API Limitations

Compared to the official OpenAI API:

Response formats: Only json and verbose_json supported (no plain text, SRT, VTT)
Model selection: Server must be launched with desired model via --model flag
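
Since SRT output is not available from the server, subtitles can be produced on the client from a `verbose_json` response. The sketch below is one way to do that and makes assumptions: `segments_to_srt` is a hypothetical helper, and each segment is taken to be a dictionary with `start`, `end` (seconds), and `text` keys, as in the raw JSON of an OpenAI-style `verbose_json` response.

```python
def format_srt_time(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Convert segment dicts (start, end, text) into SRT subtitle text.

    Hypothetical helper: the segment keys are an assumption about the
    verbose_json payload, not something this page specifies.
    """
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_srt_time(seg["start"])
        end = format_srt_time(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    # A blank line separates consecutive subtitle entries.
    return "\n".join(blocks)
```

When using the OpenAI Python SDK, the segment objects expose the same fields as attributes, so they would need to be converted to dictionaries (or the helper adapted) first.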

Fully Supported Features

The local server fully supports:

Log probabilities: include[]=logprobs parameter for token-level confidence
Streaming responses: Server-Sent Events (SSE) for real-time transcription
Timestamp granularities: Both word and segment level timing
Language detection: Automatic language detection or manual specification
Temperature control: Sampling temperature for transcription randomness
Prompt text: Text guidance for transcription style and context
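
As an illustration of using log probabilities for token-level confidence, the sketch below summarizes a list of token log probabilities into a single score and flags weak tokens. It assumes the response carries a `logprobs` list of objects with `token` and `logprob` keys, as in the OpenAI API; this page does not show the exact shape, so treat the field names and the helper names as assumptions.

```python
import math


def geometric_mean_probability(logprobs):
    """Return exp(mean log probability), or None for an empty list.

    Assumes each item looks like {"token": "...", "logprob": -0.12}.
    """
    values = [item["logprob"] for item in logprobs]
    if not values:
        return None
    return math.exp(sum(values) / len(values))


def low_confidence_tokens(logprobs, threshold=0.5):
    """Return tokens whose individual probability is below the threshold."""
    return [item["token"] for item in logprobs
            if math.exp(item["logprob"]) < threshold]
```

The threshold of 0.5 is arbitrary and only serves the example; a suitable value depends on the model and the audio.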

Server Configuration

Environment Variables

# Set custom model cache directory
export WHISPERKIT_CACHE_DIR="/path/to/models"

# Enable debug logging (a command-line flag rather than an environment variable)
BUILD_ALL=1 swift run whisperkit-cli serve --verbose

Model Management

# Download a model before starting server
make download-model MODEL=base

# Start server with downloaded model
BUILD_ALL=1 swift run whisperkit-cli serve \
    --model-path "Models/whisperkit-coreml/openai_whisper-base"

Docker Deployment

Create a Dockerfile:

FROM swift:5.9

# Install dependencies
RUN apt-get update && apt-get install -y \
    git \
    git-lfs

# Clone and build WhisperKit
WORKDIR /app
RUN git clone https://github.com/argmaxinc/whisperkit.git
WORKDIR /app/whisperkit

RUN make setup
RUN make download-model MODEL=tiny
RUN BUILD_ALL=1 swift build --product whisperkit-cli -c release

EXPOSE 50060

# Exec-form CMD cannot take a VAR=value prefix, so set the variable with ENV
ENV BUILD_ALL=1
CMD ["swift", "run", "-c", "release", "whisperkit-cli", "serve", "--host", "0.0.0.0"]

Build and run:

# Build image
docker build -t whisperkit-server .

# Run container
docker run -p 50060:50060 whisperkit-server
