WhisperKit includes a local server that implements the OpenAI Audio API, allowing you to use existing OpenAI SDK clients or generate new ones. The server supports transcription and translation, with streaming output for real-time transcription results.
For a real-time transcription server with full-duplex streaming, see WhisperKit Pro Local Server, which provides live audio streaming and real-time transcription for applications that require continuous audio processing.

Building the Server

The local server requires a special build flag to include server dependencies:
make build-local-server

Starting the Server

Basic Usage

Start the server with default settings (localhost:50060):
BUILD_ALL=1 swift run whisperkit-cli serve

Configuration Options

--host (string, default: localhost)
  Host address to bind the server to. Use 0.0.0.0 to accept connections from any network interface.
--port (int, default: 50060)
  Port number for the server to listen on.
--model (string)
  Specific model to use (e.g., tiny, base, small, medium, large-v3).
--model-path (string)
  Path to local model files if you don't want to download them.
--verbose (boolean)
  Enable verbose logging for debugging.

Examples

# Start with default tiny model on localhost:50060
BUILD_ALL=1 swift run whisperkit-cli serve

# Start the base model on port 8080 with verbose logging
BUILD_ALL=1 swift run whisperkit-cli serve --model base --port 8080 --verbose

API Endpoints

The server implements the OpenAI Audio API specification:

POST /v1/audio/transcriptions

Transcribe audio to text in the original language. Request:
file (file, required)
  Audio file to transcribe (wav, mp3, m4a, flac)
model (string, required)
  Model identifier (required by the API spec; the server's loaded model is used)
language (string)
  Source language code (e.g., en, es, ja). Auto-detected if not specified.
prompt (string)
  Text to guide transcription style and context
response_format (string, default: verbose_json)
  Output format: json or verbose_json
temperature (float, default: 0.0)
  Sampling temperature (0.0-1.0)
timestamp_granularities[] (array, default: [segment])
  Timing detail: word, segment, or both
stream (boolean, default: false)
  Enable Server-Sent Events (SSE) streaming
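When stream=true is set, results arrive as Server-Sent Events, one data: line per event. Below is a minimal sketch of consuming such a stream in Python; the parse_sse helper and the sample payloads are illustrative assumptions, not the server's exact event format.

```python
import json

def parse_sse(lines):
    """Collect JSON payloads from Server-Sent Event lines ("data: {...}").

    parse_sse is a hypothetical helper for illustration; the exact event
    payload shape depends on the server's streaming format.
    """
    events = []
    for line in lines:
        line = line.strip()
        if line.startswith("data:"):
            payload = line[len("data:"):].strip()
            if payload and payload != "[DONE]":
                events.append(json.loads(payload))
    return events

# Two streamed chunks followed by a terminator, as they might arrive
# over the wire from a streaming transcription request.
sample = [
    'data: {"text": "Hello"}',
    "",
    'data: {"text": " world"}',
    "",
    "data: [DONE]",
]
print("".join(event["text"] for event in parse_sse(sample)))  # -> Hello world
```

In a real client, the lines would come from iterating over the HTTP response body of a request made with stream=true.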

POST /v1/audio/translations

Translate audio to English text. Accepts the same parameters as /v1/audio/transcriptions.

GET /health

Health check endpoint that returns server status.
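The endpoint can be polled from code as well as curl. A minimal sketch using Python's standard library; check_health is an illustrative helper, not part of WhisperKit:

```python
import urllib.request

def check_health(base_url="http://localhost:50060"):
    """Return True if the server answers HTTP 200 on /health."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=2) as resp:
            return resp.status == 200
    except OSError:
        return False

if __name__ == "__main__":
    print("server up" if check_health() else "server down")
```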

Client Examples

Python Client

Using the OpenAI Python SDK:
from openai import OpenAI

# The local server does not validate API keys, but the SDK requires one to be set.
client = OpenAI(base_url="http://localhost:50060/v1", api_key="not-needed")

with open("audio.wav", "rb") as audio_file:
    result = client.audio.transcriptions.create(
        file=audio_file,
        model="tiny",  # Required parameter
        language="en"
    )
    print(result.text)

Command Line with curl

curl -X POST http://localhost:50060/v1/audio/transcriptions \
  -F "[email protected]" \
  -F "model=tiny" \
  -F "language=en"

Swift Client

Run the Swift client generated from the OpenAPI specification:
cd Examples/ServeCLIClient/Swift
swift run whisperkit-client transcribe audio.wav --language en
swift run whisperkit-client translate audio.wav
See the Swift client README for more details.

Client Generation

You can generate clients for any language using the OpenAPI specification:
swift run swift-openapi-generator generate scripts/specs/localserver_openapi.yaml \
  --output-directory python-client \
  --mode client \
  --mode types
To regenerate the OpenAPI specification from the latest OpenAI API:
make generate-server

Supported Features

Streaming: Server-Sent Events (SSE) for real-time transcription results
Timestamps: Word-level and segment-level timing information
Log Probabilities: Token-level confidence scores via the logprobs parameter
Language Detection: Automatic language detection or manual specification
Temperature Control: Sampling temperature for transcription randomness
Prompt Text: Text guidance for transcription style and context
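With the default verbose_json response format, segment timing comes back alongside the transcribed text. A sketch of reading it from a decoded response; the sample payload follows the OpenAI verbose_json shape with made-up values:

```python
# Sample decoded verbose_json response (illustrative values).
response = {
    "text": "Hello world.",
    "language": "en",
    "segments": [
        {"id": 0, "start": 0.0, "end": 1.2, "text": "Hello"},
        {"id": 1, "start": 1.2, "end": 2.0, "text": " world."},
    ],
}

def segment_timings(resp):
    """Return (start, end, text) for each transcribed segment."""
    return [(s["start"], s["end"], s["text"]) for s in resp.get("segments", [])]

for start, end, text in segment_timings(response):
    print(f"[{start:5.2f} - {end:5.2f}]{text}")
```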

API Limitations

Compared to the official OpenAI API:
  • Response formats: Only json and verbose_json supported (no plain text, SRT, VTT formats)
  • Model selection: The server must be launched with the desired model via the --model flag. The model parameter in API requests is required by the spec, but the server's loaded model is always used.

Example Projects

Explore complete example implementations:

Python Client: OpenAI SDK-based Python client
Swift Client: Generated from the OpenAPI spec
Curl Scripts: Lightweight shell script examples

Troubleshooting

The server requires special build flags. Always use:
BUILD_ALL=1 swift run whisperkit-cli serve
Or build once with make build-local-server then run normally.
  • Check the server is running: curl http://localhost:50060/health
  • Verify the port isn’t in use: lsof -i :50060
  • Try binding to all interfaces: --host 0.0.0.0
  • Ensure model files are downloaded: make download-model MODEL=tiny
  • Check model path is correct: --model-path Models/whisperkit-coreml/openai_whisper-tiny
  • Try verbose mode: --verbose
  • Use smaller models for faster inference (tiny, base, small)
  • Check compute units configuration (see Performance Optimization)
  • Ensure audio encoder uses Neural Engine on macOS 14+
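The connection checks above can also be scripted. A small sketch that tests whether anything is listening on the server's port; port_open is an illustrative helper, assuming the default port:

```python
import socket

def port_open(host="localhost", port=50060, timeout=1.0):
    """Return True if a TCP listener accepts connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    print("listening" if port_open() else "nothing on port 50060")
```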

Next Steps

CLI Usage: Learn about command-line transcription
Performance Optimization: Optimize transcription speed and quality
