WhisperKit includes a local server that implements the OpenAI Audio API, allowing you to use existing OpenAI SDK clients or generate new ones. The server supports transcription and translation, with streaming output for real-time transcription results.
For a real-time transcription server with full-duplex streaming, see WhisperKit Pro Local Server, which provides live audio streaming and real-time transcription for applications that require continuous audio processing.

Building the Server

The local server requires a special build flag to include server dependencies:
make build-local-server

Starting the Server

Basic Usage

Start the server with default settings (localhost:50060):
BUILD_ALL=1 swift run whisperkit-cli serve

Configuration Options

--host (string, default: localhost)
  Host address to bind the server to. Use 0.0.0.0 to accept connections from any network interface.
--port (int, default: 50060)
  Port number for the server to listen on.
--model (string)
  Specific model to use (e.g., tiny, base, small, medium, large-v3).
--model-path (string)
  Path to local model files if you don't want to download them.
--verbose (boolean)
  Enable verbose logging for debugging.

Examples

# Start with default tiny model on localhost:50060
BUILD_ALL=1 swift run whisperkit-cli serve

# Start the base model on port 8080 with verbose logging
BUILD_ALL=1 swift run whisperkit-cli serve --model base --port 8080 --verbose

API Endpoints

The server implements the OpenAI Audio API specification:

POST /v1/audio/transcriptions

Transcribe audio to text in the original language. Request:
file (file, required)
  Audio file to transcribe (wav, mp3, m4a, flac)
model (string, required)
  Model identifier (required by the API spec; the server's loaded model is used)
language (string)
  Source language code (e.g., en, es, ja). Auto-detected if not specified.
prompt (string)
  Text to guide transcription style and context
response_format (string, default: verbose_json)
  Output format: json or verbose_json
temperature (float, default: 0.0)
  Sampling temperature (0.0-1.0)
timestamp_granularities[] (array, default: [segment])
  Timing detail: word, segment, or both
stream (boolean, default: false)
  Enable Server-Sent Events (SSE) streaming
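When stream=true is set, results arrive as Server-Sent Events, one data: line per event. Below is a minimal sketch of consuming such a stream in Python; the parse_sse helper and the sample payloads are illustrative assumptions, not the server's exact event format.

```python
import json

def parse_sse(lines):
    """Collect JSON payloads from Server-Sent Event lines ("data: {...}").

    parse_sse is a hypothetical helper for illustration; the exact event
    payload shape depends on the server's streaming format.
    """
    events = []
    for line in lines:
        line = line.strip()
        if line.startswith("data:"):
            payload = line[len("data:"):].strip()
            if payload and payload != "[DONE]":
                events.append(json.loads(payload))
    return events

# Two streamed chunks followed by a terminator, as they might arrive
# over the wire from a streaming transcription request.
sample = [
    'data: {"text": "Hello"}',
    "",
    'data: {"text": " world"}',
    "",
    "data: [DONE]",
]
print("".join(event["text"] for event in parse_sse(sample)))  # -> Hello world
```

In a real client, the lines would come from iterating over the HTTP response body of a request made with stream=true.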

POST /v1/audio/translations

Translate audio to English text. Accepts the same parameters as /v1/audio/transcriptions.

GET /health

Health check endpoint that returns server status.
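The endpoint can be polled from code as well as curl. A minimal sketch using Python's standard library; check_health is an illustrative helper, not part of WhisperKit:

```python
import urllib.request

def check_health(base_url="http://localhost:50060"):
    """Return True if the server answers HTTP 200 on /health."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=2) as resp:
            return resp.status == 200
    except OSError:
        return False

if __name__ == "__main__":
    print("server up" if check_health() else "server down")
```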

Client Examples

Python Client

Using the OpenAI Python SDK:
from openai import OpenAI

# The local server does not validate API keys, but the SDK requires one to be set.
client = OpenAI(base_url="http://localhost:50060/v1", api_key="not-needed")

with open("audio.wav", "rb") as audio_file:
    result = client.audio.transcriptions.create(
        file=audio_file,
        model="tiny",  # Required parameter
        language="en"
    )
    print(result.text)

Command Line with curl

curl -X POST http://localhost:50060/v1/audio/transcriptions \
  -F "[email protected]" \
  -F "model=tiny" \
  -F "language=en"

Swift Client

Run the Swift client generated from the OpenAPI specification:
cd Examples/ServeCLIClient/Swift
swift run whisperkit-client transcribe audio.wav --language en
swift run whisperkit-client translate audio.wav
See the Swift client README for more details.

Client Generation

You can generate clients for any language using the OpenAPI specification:
swift run swift-openapi-generator generate scripts/specs/localserver_openapi.yaml \
  --output-directory python-client \
  --mode client \
  --mode types
To regenerate the OpenAPI specification from the latest OpenAI API:
make generate-server

Supported Features

Streaming: Server-Sent Events (SSE) for real-time transcription results
Timestamps: Word-level and segment-level timing information
Log Probabilities: Token-level confidence scores via the logprobs parameter
Language Detection: Automatic language detection or manual specification
Temperature Control: Sampling temperature for transcription randomness
Prompt Text: Text guidance for transcription style and context
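With the default verbose_json response format, segment timing comes back alongside the transcribed text. A sketch of reading it from a decoded response; the sample payload follows the OpenAI verbose_json shape with made-up values:

```python
# Sample decoded verbose_json response (illustrative values).
response = {
    "text": "Hello world.",
    "language": "en",
    "segments": [
        {"id": 0, "start": 0.0, "end": 1.2, "text": "Hello"},
        {"id": 1, "start": 1.2, "end": 2.0, "text": " world."},
    ],
}

def segment_timings(resp):
    """Return (start, end, text) for each transcribed segment."""
    return [(s["start"], s["end"], s["text"]) for s in resp.get("segments", [])]

for start, end, text in segment_timings(response):
    print(f"[{start:5.2f} - {end:5.2f}]{text}")
```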

API Limitations

Compared to the official OpenAI API:
  • Response formats: Only json and verbose_json supported (no plain text, SRT, VTT formats)
  • Model selection: The server must be launched with the desired model via the --model flag. The model parameter in API requests is required by the spec, but the server's loaded model is always used.

Example Projects

Explore complete example implementations:

Python Client: OpenAI SDK-based Python client
Swift Client: Generated from the OpenAPI spec
Curl Scripts: Lightweight shell script examples

Troubleshooting

The server requires special build flags. Always use:
BUILD_ALL=1 swift run whisperkit-cli serve
Or build once with make build-local-server then run normally.
  • Check the server is running: curl http://localhost:50060/health
  • Verify the port isn’t in use: lsof -i :50060
  • Try binding to all interfaces: --host 0.0.0.0
  • Ensure model files are downloaded: make download-model MODEL=tiny
  • Check model path is correct: --model-path Models/whisperkit-coreml/openai_whisper-tiny
  • Try verbose mode: --verbose
  • Use smaller models for faster inference (tiny, base, small)
  • Check compute units configuration (see Performance Optimization)
  • Ensure audio encoder uses Neural Engine on macOS 14+
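The connection checks above can also be scripted. A small sketch that tests whether anything is listening on the server's port; port_open is an illustrative helper, assuming the default port:

```python
import socket

def port_open(host="localhost", port=50060, timeout=1.0):
    """Return True if a TCP listener accepts connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    print("listening" if port_open() else "nothing on port 50060")
```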

Next Steps

CLI Usage: Learn about command-line transcription
Performance Optimization: Optimize transcription speed and quality
