For a real-time transcription server with full-duplex streaming capabilities, check out WhisperKit Pro Local Server, which provides live audio streaming and real-time transcription for applications that require continuous audio processing.
Building the Server
The local server requires a special build flag to include server dependencies.

Starting the Server
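Assuming the Makefile targets referenced in Troubleshooting below, a typical build-and-start sequence might look like this (the binary name and serve subcommand are assumptions, not confirmed by this page):

```shell
# Build once with server dependencies included
make build-local-server

# Start the server on the default localhost:50060
# (subcommand name assumed; check your CLI's help output)
whisperkit-cli serve
```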
Basic Usage
Start the server with default settings (localhost:50060):

Configuration Options
- Host address to bind the server to. Use 0.0.0.0 to accept connections from any network interface.
- Port number for the server to listen on.
- Specific model to use (e.g., tiny, base, small, medium, large-v3).
- Path to local model files if you don’t want to download them.
- Enable verbose logging for debugging.
Examples
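Hedged sketches combining the options above (only --host, --model-path, and --verbose are confirmed elsewhere on this page; the binary name and serve subcommand are assumptions):

```shell
# Accept connections from any network interface
whisperkit-cli serve --host 0.0.0.0

# Serve a locally downloaded model with verbose logging
whisperkit-cli serve \
  --model-path Models/whisperkit-coreml/openai_whisper-tiny \
  --verbose
```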
API Endpoints
The server implements the OpenAI Audio API specification:

POST /v1/audio/transcriptions
Transcribe audio to text in the original language.

Request:

- file: Audio file to transcribe (wav, mp3, m4a, flac)
- model: Model identifier (required by API spec, uses server’s loaded model)
- language: Source language code (e.g., en, es, ja). Auto-detects if not specified.
- prompt: Text to guide transcription style and context
- response_format: Output format: json or verbose_json
- temperature: Sampling temperature (0.0-1.0)
- timestamp_granularities: Timing detail: word, segment, or both
- stream: Enable Server-Sent Events (SSE) streaming
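As a sketch, a request asking for verbose JSON with word-level timing might look like this (parameter names follow the OpenAI Audio API; audio.wav is a placeholder, and the model value is arbitrary because the server uses its loaded model):

```shell
curl http://localhost:50060/v1/audio/transcriptions \
  -F file=@audio.wav \
  -F model=whisper-1 \
  -F response_format=verbose_json \
  -F "timestamp_granularities[]=word"
```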
POST /v1/audio/translations
Translate audio to English text. Accepts the same parameters as /v1/audio/transcriptions.
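A minimal translation request sketch (audio.wav is a placeholder; the model value is arbitrary because the server uses its loaded model):

```shell
curl http://localhost:50060/v1/audio/translations \
  -F file=@audio.wav \
  -F model=whisper-1
```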
GET /health
Health check endpoint that returns server status.

Client Examples
Python Client
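A minimal sketch assuming the OpenAI Python SDK is installed and the server is running on the default localhost:50060 (the API key value is a placeholder, on the assumption that the local server does not check it):

```python
from openai import OpenAI

# Point the SDK at the local server instead of api.openai.com
client = OpenAI(
    base_url="http://localhost:50060/v1",
    api_key="not-needed",  # placeholder; assumed unused by the local server
)

# audio.wav is a placeholder file name
with open("audio.wav", "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        model="whisper-1",  # server uses its loaded model
        file=audio_file,
    )

print(transcription.text)
```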
Using the OpenAI Python SDK:

Command Line with curl
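A minimal sketch (audio.wav is a placeholder; the model value is arbitrary because the server uses its loaded model):

```shell
curl http://localhost:50060/v1/audio/transcriptions \
  -F file=@audio.wav \
  -F model=whisper-1
```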
Swift Client
Generate a Swift client from the OpenAPI specification:

Client Generation
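One option is the third-party openapi-generator CLI; a sketch assuming the server's spec has been saved as openapi.yaml (the file name and output directory are placeholders):

```shell
# Generate a Swift client; swap the -g value for other languages
openapi-generator generate \
  -i openapi.yaml \
  -g swift5 \
  -o ./GeneratedClient
```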
You can generate clients for any language using the OpenAPI specification.

Supported Features
Streaming
Server-Sent Events (SSE) for real-time transcription results
Timestamps
Word-level and segment-level timing information
Log Probabilities
Token-level confidence scores via the logprobs parameter
Language Detection
Automatic language detection or manual specification
Temperature Control
Sampling temperature for transcription randomness
Prompt Text
Text guidance for transcription style and context
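Several of these features map directly to request parameters; for example, a streaming request sketch (the stream parameter name follows the OpenAI-style API, audio.wav is a placeholder, and curl's -N flag disables buffering so SSE events print as they arrive):

```shell
curl -N http://localhost:50060/v1/audio/transcriptions \
  -F file=@audio.wav \
  -F model=whisper-1 \
  -F stream=true
```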
API Limitations
The local server has some limitations compared to the official OpenAI API.

Example Projects
Explore complete example implementations:

Python Client
OpenAI SDK-based Python client
Swift Client
Generated from OpenAPI spec
Curl Scripts
Lightweight shell script examples
Troubleshooting
Server won't start - BUILD_ALL=1 required
The server requires special build flags. Always build with BUILD_ALL=1 set, or build once with make build-local-server and then run normally.

Connection refused errors
- Check the server is running: curl http://localhost:50060/health
- Verify the port isn’t in use: lsof -i :50060
- Try binding to all interfaces: --host 0.0.0.0
Model not loading
- Ensure model files are downloaded: make download-model MODEL=tiny
- Check the model path is correct: --model-path Models/whisperkit-coreml/openai_whisper-tiny
- Try verbose mode: --verbose
Slow transcription performance
- Use smaller models for faster inference (tiny, base, small)
- Check compute units configuration (see Performance Optimization)
- Ensure audio encoder uses Neural Engine on macOS 14+
Next Steps
CLI Usage
Learn about command-line transcription
Performance Optimization
Optimize transcription speed and quality