Prerequisites
- An OpenAI account
- Access to the Realtime API (gpt-4o-realtime models)
- Payment method configured for API usage
Getting Your API Key
Sign in to OpenAI
Go to platform.openai.com and sign in to your account.
Create a new key
Click Create new secret key, give it a descriptive name (e.g., “Highway Production”), and copy the key. Store it somewhere secure: OpenAI shows the full key only once.
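Once you have a key, keep it out of source control. The sketch below is minimal and rests on one assumption: the backend reads the key from an OPENAI_API_KEY environment variable, which is the name OpenAI’s SDKs use by default. loadApiKey is a hypothetical helper, not part of Highway.

```javascript
// Sketch: read the OpenAI API key from the environment and fail fast if it is missing.
// Assumption: the key is stored in OPENAI_API_KEY; loadApiKey is a hypothetical helper.
function loadApiKey(env = process.env) {
  const key = (env.OPENAI_API_KEY || "").trim();
  if (key === "") {
    throw new Error("OPENAI_API_KEY is not set; create a key at platform.openai.com");
  }
  return key;
}
```

Call it once at startup, for example const apiKey = loadApiKey();. A missing key then shows up immediately, not as a 401 on the first call.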
Realtime API Access
The Realtime API provides low-latency, multimodal conversational experiences.
Checking Access
Verify model access
Check your account’s model access in the OpenAI dashboard under Settings > Limits.
The Realtime API is currently in beta. Pricing and availability may change as it moves to general availability.
Model Configuration
Highway uses the gpt-4o-realtime-preview-2024-10-01 model for real-time voice interactions.
WebSocket Connection
The backend connects to OpenAI’s Realtime API over a WebSocket, implemented in websocket.js.
Key Parameters
- Endpoint: wss://api.openai.com/v1/realtime
- Model: gpt-4o-realtime-preview-2024-10-01 (passed as the model query parameter)
- Header: OpenAI-Beta: realtime=v1 (required for beta access)
- Header: Authorization: Bearer <your API key>
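The parameters above combine into one connection URL and one set of headers. The sketch below shows that combination. buildRealtimeConnection is a hypothetical helper, not Highway’s actual websocket.js. It only builds the values and does not open a socket.

```javascript
// Sketch: build the URL and headers for the Realtime API WebSocket.
// Assumption: the model is sent as the "model" query parameter; this helper is hypothetical.
const REALTIME_ENDPOINT = "wss://api.openai.com/v1/realtime";
const MODEL = "gpt-4o-realtime-preview-2024-10-01";

function buildRealtimeConnection(apiKey, model = MODEL) {
  if (!apiKey) {
    throw new Error("Missing OpenAI API key");
  }
  const url = new URL(REALTIME_ENDPOINT);
  url.searchParams.set("model", model);
  return {
    url: url.toString(),
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "OpenAI-Beta": "realtime=v1", // required while the API is in beta
    },
  };
}
```

With the ws npm package, you could pass the result as new WebSocket(url, { headers }). Because the helper is a pure function, the header format is easy to inspect when you are debugging a 401.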
Voice Selection
Highway uses the shimmer voice for a natural, friendly tone. The voice is set in config.js.
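Here is one possible shape for the voice setting in config.js. The VOICE constant name comes from this page. KNOWN_VOICES and resolveVoice are hypothetical additions that catch typos early. The list simply mirrors the voices documented on this page, and OpenAI may add or retire voices.

```javascript
// config.js (sketch): voice used for all spoken responses.
// Assumption: KNOWN_VOICES mirrors the voices listed on this page; check OpenAI's docs for the current set.
const KNOWN_VOICES = ["alloy", "echo", "shimmer", "nova"];
const VOICE = "shimmer";

// Reject misspelled voice names before they reach the session configuration.
function resolveVoice(voice = VOICE) {
  if (!KNOWN_VOICES.includes(voice)) {
    throw new Error(`Unknown voice "${voice}"; expected one of ${KNOWN_VOICES.join(", ")}`);
  }
  return voice;
}
```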
Available Voices
OpenAI’s Realtime API supports multiple voice options:
- alloy: Neutral and balanced
- echo: Warm and upbeat
- shimmer: Gentle and friendly (Highway default)
- nova: Energetic and engaging
You can change the voice by updating the VOICE constant in config.js. Test different voices to find the best fit for your use case.
Audio Format Configuration
Highway uses the g711_ulaw audio format for compatibility with Twilio’s phone network. The format is set in conversationConfig.js.
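The audio format is set separately for input and output, and the two should match. The sketch below shows the two session fields, plus a raw-bandwidth calculation per format. It assumes G.711 runs at 8 kHz with 8 bits per sample, and that the Realtime API’s pcm16 is 24 kHz, 16-bit mono. The helper names are hypothetical.

```javascript
// Sketch: audio format fields for the session, plus raw bandwidth per format.
// Assumption: G.711 is 8 kHz / 8-bit; the Realtime API's pcm16 is 24 kHz / 16-bit mono.
const AUDIO_FORMATS = {
  g711_ulaw: { sampleRateHz: 8000, bytesPerSample: 1 },
  g711_alaw: { sampleRateHz: 8000, bytesPerSample: 1 },
  pcm16: { sampleRateHz: 24000, bytesPerSample: 2 },
};

function requireFormat(format) {
  if (!Object.prototype.hasOwnProperty.call(AUDIO_FORMATS, format)) {
    throw new Error(`Unsupported audio format: ${format}`);
  }
  return AUDIO_FORMATS[format];
}

// Input and output use the same format so no transcoding is needed.
function audioFormatFields(format = "g711_ulaw") {
  requireFormat(format);
  return { input_audio_format: format, output_audio_format: format };
}

// Raw audio bytes per second, before base64 encoding.
function bytesPerSecond(format) {
  const { sampleRateHz, bytesPerSample } = requireFormat(format);
  return sampleRateHz * bytesPerSample;
}
```

Under these assumptions, g711_ulaw carries 8,000 bytes of audio per second (64 kbit/s). That is one sixth of the 48,000 bytes per second that pcm16 needs.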
Why g711_ulaw?
- Telephony standard: Widely used in phone systems
- Low bandwidth: Efficient for real-time streaming
- Twilio compatibility: Native support for Twilio Media Streams
- Low latency: Minimal processing overhead
If you’re not using Twilio, you can choose another format, such as pcm16 or g711_alaw, depending on your use case.
Session Configuration
The complete session configuration for OpenAI is defined in conversationConfig.js.
Configuration Parameters
| Parameter | Value | Description |
|---|---|---|
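| turn_detection.type | server_vad | Server-side voice activity detection (VAD) decides when the caller has finished speaking |
| turn_detection.threshold | 0.95 | VAD activation threshold (0.0 to 1.0); a high value requires louder audio to count as speech, which helps ignore background and line noise |
| temperature | 0.6 | Moderate randomness for natural responses |
| modalities | ["text", "audio"] | Enable both text and audio processing |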
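The parameters in the table can be assembled into a single session.update event. buildSessionUpdate is hypothetical, not Highway’s conversationConfig.js. It assumes that instructions and tools are supplied by the caller, as described under System Instructions and Function Tools. The other values mirror the settings documented on this page.

```javascript
// Sketch of a session.update event built from the documented settings.
// Assumption: instructions and tools come from the caller; this helper is hypothetical.
function buildSessionUpdate({ instructions = "", voice = "shimmer", tools = [] } = {}) {
  return {
    type: "session.update",
    session: {
      turn_detection: { type: "server_vad", threshold: 0.95 },
      input_audio_format: "g711_ulaw",
      output_audio_format: "g711_ulaw",
      voice,
      instructions,
      modalities: ["text", "audio"],
      temperature: 0.6,
      tools,
    },
  };
}
```

Once the socket is open, the backend would send this event as JSON, for example ws.send(JSON.stringify(buildSessionUpdate({ instructions }))).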
Rate Limits and Pricing
Rate Limits
The Realtime API has different rate limits than standard API endpoints:
- Requests per minute: Varies by tier
- Tokens per minute: Varies by tier
- Concurrent connections: Limited based on account
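Concurrent connections are capped per account, so a backend that handles many calls at once may want to turn away new calls before it reaches the cap. The sketch below is a minimal in-process counter. SessionLimiter is hypothetical, and the cap is a value you would choose from your own account’s limits.

```javascript
// Sketch: cap the number of simultaneous Realtime sessions this process opens.
// maxSessions is a value you choose from your account's limits; the class is hypothetical.
class SessionLimiter {
  constructor(maxSessions) {
    this.maxSessions = maxSessions;
    this.active = 0;
  }

  // Reserve a slot and return true if one is free; otherwise return false.
  tryAcquire() {
    if (this.active >= this.maxSessions) return false;
    this.active += 1;
    return true;
  }

  // Call when a session's WebSocket closes, whatever the reason.
  release() {
    if (this.active > 0) this.active -= 1;
  }
}
```

When a call arrives, check limiter.tryAcquire() before opening the OpenAI socket, and call limiter.release() in the socket’s close handler. This counter only covers a single process. A deployment with several servers would need a shared counter.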
Pricing Considerations
As of the current beta:
- Input audio: Charged per token
- Output audio: Charged per token
- Function calls: Included in token count
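Because every category is billed per token, you can estimate the cost of a call from its token counts. In the sketch below, the rates are placeholders supplied by the caller in USD per million tokens. Take the real values from OpenAI’s pricing page, since beta pricing may change. The function name and the usage object’s field names are hypothetical. Function-call traffic counts toward the text token fields.

```javascript
// Sketch: estimate one call's cost from token counts.
// Rates are caller-supplied placeholders (USD per 1M tokens); field names are hypothetical.
function estimateCallCostUsd(usage, ratesPerMillion) {
  const tokens = {
    audioIn: usage.audioInputTokens || 0,
    audioOut: usage.audioOutputTokens || 0,
    textIn: usage.textInputTokens || 0,
    textOut: usage.textOutputTokens || 0,
  };
  let total = 0;
  for (const [kind, count] of Object.entries(tokens)) {
    total += (count / 1_000_000) * (ratesPerMillion[kind] || 0);
  }
  return total;
}
```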
Cost Optimization Tips
- Implement timeouts: Limit call duration to prevent runaway costs
- Use turn detection: Efficient VAD reduces unnecessary processing
- Monitor logs: Track token usage per call
- Set up alerts: Configure billing alerts in the OpenAI dashboard
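Two of these tips, call timeouts and per-call token tracking, fit in one small object. The sketch below assumes that token counts are read from the usage object attached to the Realtime API’s response.done server event. CallUsageTracker and the maximum duration are hypothetical.

```javascript
// Sketch: per-call token tracking plus a hard duration limit.
// Assumption: usage arrives on response.done events; maxDurationMs is a value you choose.
class CallUsageTracker {
  constructor(maxDurationMs, startedAtMs = Date.now()) {
    this.maxDurationMs = maxDurationMs;
    this.startedAtMs = startedAtMs;
    this.inputTokens = 0;
    this.outputTokens = 0;
  }

  // Pass every parsed server event through here; only response.done carries usage.
  handleEvent(event) {
    if (event.type !== "response.done") return;
    const usage = (event.response && event.response.usage) || {};
    this.inputTokens += usage.input_tokens || 0;
    this.outputTokens += usage.output_tokens || 0;
  }

  // True once the call has reached its allowed duration.
  isOverTime(nowMs = Date.now()) {
    return nowMs - this.startedAtMs >= this.maxDurationMs;
  }

  summary() {
    return {
      inputTokens: this.inputTokens,
      outputTokens: this.outputTokens,
      totalTokens: this.inputTokens + this.outputTokens,
    };
  }
}
```

To use it, check isOverTime() on a timer and hang up when it returns true. Log summary() when the call ends so that per-call usage appears in your logs.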
Testing the Connection
System Instructions
Customize the AI’s behavior by modifying the system message in config.js.
Tailor the system instructions to match your brand voice and specific verification requirements. Clear instructions lead to more consistent AI behavior.
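One way to keep the system message clear and consistent is to assemble it from named parts. In the sketch below, the brand name, tone and verification steps are hypothetical placeholders, not Highway’s actual prompt. The resulting string would be sent as the session’s instructions.

```javascript
// config.js (sketch): build the system message from named parts.
// The brand, tone and steps below are hypothetical placeholders.
function buildInstructions({ brandName, tone, verificationSteps }) {
  const steps = verificationSteps.map((step, i) => `${i + 1}. ${step}`).join("\n");
  return [
    `You are a phone assistant for ${brandName}. Speak in a ${tone} tone and keep replies short.`,
    "Complete these verification steps in order and do not skip any:",
    steps,
  ].join("\n");
}

const SYSTEM_MESSAGE = buildInstructions({
  brandName: "Acme Logistics",
  tone: "friendly, professional",
  verificationSteps: ["Confirm the caller's full name.", "Confirm the reference number."],
});
```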
Function Tools
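In the Realtime API, function tools are declared in the session’s tools array. When the model calls one, the call arrives as a function_call output item. The backend answers with a function_call_output item carrying the same call_id. The sketch below uses a hypothetical end_call tool and a hypothetical handlers map, since Highway’s actual tool names are not shown on this page.

```javascript
// Hypothetical tool definition in the Realtime API's flat { type, name, description, parameters } shape.
const END_CALL_TOOL = {
  type: "function",
  name: "end_call",
  description: "End the phone call once verification is complete or the caller asks to hang up.",
  parameters: {
    type: "object",
    properties: { reason: { type: "string", description: "Short reason for ending the call." } },
    required: ["reason"],
  },
};

// Turn a finished function_call item into the client events that return its result.
// `handlers` maps tool names to functions; it is a convention of this sketch only.
function handleFunctionCall(event, handlers) {
  const item = event.item;
  if (event.type !== "response.output_item.done" || !item || item.type !== "function_call") {
    return [];
  }
  let result;
  try {
    const args = JSON.parse(item.arguments || "{}");
    result = Object.prototype.hasOwnProperty.call(handlers, item.name)
      ? handlers[item.name](args)
      : { error: `Unknown tool: ${item.name}` };
  } catch (err) {
    // Malformed arguments or a throwing handler are reported back to the model.
    result = { error: String(err && err.message ? err.message : err) };
  }
  return [
    {
      type: "conversation.item.create",
      item: { type: "function_call_output", call_id: item.call_id, output: JSON.stringify(result) },
    },
    { type: "response.create" }, // ask the model to continue, e.g. to speak a confirmation
  ];
}
```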
Highway configures custom function tools for call control.
Troubleshooting
Common Issues
401 Unauthorized
- Verify your API key is correct
- Check that the key hasn’t been revoked
- Ensure the Authorization header is properly formatted (Bearer followed by your API key)
Connection failures
- Confirm you have Realtime API access
- Check that the OpenAI-Beta header is included
- Verify that your network allows WebSocket connections
High latency
- Use server-side VAD for faster turn detection
- Optimize the audio format (g711_ulaw is efficient)
- Check network connection quality
Audio issues
- Verify that the audio format matches on both input and output
- Check that base64 encoding is correct
- Monitor for dropped WebSocket messages
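Many audio problems come from the relay between Twilio and OpenAI. Both sides carry g711_ulaw audio as base64 strings, so the payload can pass through unchanged. The sketch below assumes Twilio Media Streams "media" messages and the beta Realtime event names input_audio_buffer.append and response.audio.delta. The function names are hypothetical. The base64 round-trip check flags payloads that were truncated or re-encoded along the way.

```javascript
// Sketch: relay audio between Twilio Media Streams and the Realtime API (beta event names).
// A canonical base64 string survives a decode/encode round trip unchanged.
function isCanonicalBase64(s) {
  return typeof s === "string" && s.length > 0 && Buffer.from(s, "base64").toString("base64") === s;
}

// Twilio "media" message -> Realtime API input_audio_buffer.append event.
function twilioToOpenAI(message) {
  if (message.event !== "media") return null;
  const payload = message.media.payload;
  if (!isCanonicalBase64(payload)) throw new Error("Twilio payload is not valid base64");
  return { type: "input_audio_buffer.append", audio: payload };
}

// Realtime API response.audio.delta event -> Twilio "media" message for the given stream.
function openAIToTwilio(event, streamSid) {
  if (event.type !== "response.audio.delta") return null;
  if (!isCanonicalBase64(event.delta)) throw new Error("OpenAI audio delta is not valid base64");
  return { event: "media", streamSid, media: { payload: event.delta } };
}
```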
Next Steps
- Set up Supabase for storing call data
- Configure Twilio for phone integration
- Learn about Conversation Configuration