
Overview

Prism provides a unified API for audio processing with two main capabilities:
  • Text-to-Speech (TTS): Convert text into spoken audio
  • Speech-to-Text (STT): Transcribe audio files into text
Use Prism::audio() to access both features with a simple, fluent interface.
Check the provider documentation to see which providers support audio processing and what models are available.

Text-to-Speech

Convert text into natural-sounding speech audio.

Basic Usage

use Prism\Prism\Facades\Prism;

$response = Prism::audio()
    ->using('openai', 'tts-1')
    ->withInput('Hello, welcome to Prism!')
    ->withVoice('alloy')
    ->asAudio();

$audio = $response->audio;

Available Voices

Different providers offer different voice options:
// OpenAI TTS voices: alloy, echo, fable, onyx, nova, shimmer
$response = Prism::audio()
    ->using('openai', 'tts-1-hd')
    ->withInput('This is a test of different voices')
    ->withVoice('nova')
    ->asAudio();

Configuring TTS Options

Use provider-specific options for advanced control:
$response = Prism::audio()
    ->using('openai', 'tts-1')
    ->withInput('Welcome to our application!')
    ->withVoice('alloy')
    ->withProviderOptions([
        'speed' => 1.0,        // Speech speed (0.25 to 4.0)
        'response_format' => 'mp3'  // Audio format
    ])
    ->asAudio();
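
Because provider options are passed through as a plain array, an out-of-range value may only surface as an API error. The sketch below validates the speed range quoted in the comment above before the request is built. `buildTtsOptions` is a hypothetical helper, not part of Prism.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of Prism): builds the provider options
 * array for a TTS request and rejects out-of-range speeds early.
 */
function buildTtsOptions(float $speed = 1.0, string $format = 'mp3'): array
{
    // Range taken from the comment in the example above (0.25 to 4.0).
    if ($speed < 0.25 || $speed > 4.0) {
        throw new InvalidArgumentException(
            "Speed must be between 0.25 and 4.0, got {$speed}."
        );
    }

    return [
        'speed' => $speed,
        'response_format' => $format,
    ];
}

// The resulting array can be passed to ->withProviderOptions(...).
$options = buildTtsOptions(1.25, 'opus');
```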

Saving Audio Output

Save generated audio to a file:
$response = Prism::audio()
    ->using('openai', 'tts-1')
    ->withInput('This will be saved as audio')
    ->withVoice('echo')
    ->asAudio();

$audio = $response->audio;

if ($audio->base64) {
    $audioContent = base64_decode($audio->base64);
    file_put_contents('/path/to/output.mp3', $audioContent);
}
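
The snippet above ignores decoding and write failures. A slightly more defensive variant might look like the following sketch, which uses only PHP built-ins. `saveBase64Audio` is a hypothetical helper name, not part of Prism.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of Prism): decodes base64 audio and writes
 * it to disk, throwing instead of silently writing a corrupt or empty file.
 *
 * @return int Number of bytes written.
 */
function saveBase64Audio(string $base64, string $path): int
{
    // Strict mode makes base64_decode() return false on invalid characters.
    $bytes = base64_decode($base64, true);
    if ($bytes === false || $bytes === '') {
        throw new InvalidArgumentException('Audio payload is not valid base64.');
    }

    $written = file_put_contents($path, $bytes);
    if ($written === false) {
        throw new RuntimeException("Could not write audio to {$path}.");
    }

    return $written;
}
```

With a TTS response in hand, the call would be `saveBase64Audio($response->audio->base64, '/path/to/output.mp3');`.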

Speech-to-Text

Transcribe audio files into text.

Basic Usage

use Prism\Prism\Facades\Prism;
use Prism\Prism\ValueObjects\Media\Audio;

$response = Prism::audio()
    ->using('openai', 'whisper-1')
    ->withInput(Audio::fromLocalPath('/path/to/audio.mp3'))
    ->asText();

echo $response->text;

Loading Audio Files

Prism supports multiple ways to load audio files. This example loads one from a local path:
use Prism\Prism\ValueObjects\Media\Audio;

$response = Prism::audio()
    ->using('openai', 'whisper-1')
    ->withInput(Audio::fromLocalPath('/path/to/recording.mp3'))
    ->asText();

Transcription Options

Configure transcription behavior with provider options:
use Prism\Prism\ValueObjects\Media\Audio;

$response = Prism::audio()
    ->using('openai', 'whisper-1')
    ->withInput(Audio::fromLocalPath('/path/to/audio.mp3'))
    ->withProviderOptions([
        'language' => 'en',           // Language code (optional)
        'prompt' => 'Custom context', // Context for better accuracy
        'temperature' => 0.0          // Sampling temperature
    ])
    ->asText();

echo $response->text;
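
When these options come from user input, it can help to validate them and drop unset values before calling withProviderOptions(). The following is a minimal sketch: `buildTranscriptionOptions` is a hypothetical helper, and the two-letter language format and the 0.0 to 1.0 temperature range are assumptions based on OpenAI's transcription API, so check your provider's documentation.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of Prism): assembles transcription options
 * and omits any that were not supplied.
 */
function buildTranscriptionOptions(
    ?string $language = null,
    ?string $prompt = null,
    float $temperature = 0.0
): array {
    // Assumption: the provider expects a two-letter ISO 639-1 code such as "en".
    if ($language !== null && preg_match('/^[a-z]{2}$/', $language) !== 1) {
        throw new InvalidArgumentException("Unexpected language code: {$language}");
    }

    // Assumption: temperature is limited to 0.0-1.0, as in OpenAI's API.
    if ($temperature < 0.0 || $temperature > 1.0) {
        throw new InvalidArgumentException('Temperature must be between 0.0 and 1.0.');
    }

    $options = ['temperature' => $temperature];
    if ($language !== null) {
        $options['language'] = $language;
    }
    if ($prompt !== null) {
        $options['prompt'] = $prompt;
    }

    return $options;
}
```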

Supported Audio Formats

Common audio formats supported include:
  • MP3 (audio/mpeg)
  • MP4 (audio/mp4)
  • WAV (audio/wav, audio/x-wav)
  • WebM (audio/webm)
  • FLAC (audio/flac)
  • AAC (audio/aac)
  • OGG (audio/ogg)
Format support varies by provider. Check your provider’s documentation for the complete list of supported formats.
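
If you need to supply a MIME type yourself, for example when the audio arrives as base64 or raw bytes, a lookup based on the list above is one option. This is a sketch only: `audioMimeTypeFor` is a hypothetical helper, and it trusts the file extension without inspecting the file contents.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of Prism): maps a file extension to one of
 * the MIME types listed above. Returns null for unknown extensions.
 */
function audioMimeTypeFor(string $path): ?string
{
    $map = [
        'mp3' => 'audio/mpeg',
        'mp4' => 'audio/mp4',
        'wav' => 'audio/wav',
        'webm' => 'audio/webm',
        'flac' => 'audio/flac',
        'aac' => 'audio/aac',
        'ogg' => 'audio/ogg',
    ];

    // Compare case-insensitively so "Recording.MP3" is recognized.
    $extension = strtolower(pathinfo($path, PATHINFO_EXTENSION));

    return $map[$extension] ?? null;
}
```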

Response Objects

Audio Response (TTS)

The AudioResponse object contains the generated audio:
$response = Prism::audio()
    ->using('openai', 'tts-1')
    ->withInput('Hello world')
    ->withVoice('alloy')
    ->asAudio();

// Access the audio object
$audio = $response->audio;

// Base64-encoded audio data
$base64Audio = $audio->base64;

// Audio type/format
$audioType = $audio->type;

// Convert to array
$array = $response->toArray();

Text Response (STT)

The TextResponse object contains the transcribed text:
use Prism\Prism\ValueObjects\Media\Audio;

$response = Prism::audio()
    ->using('openai', 'whisper-1')
    ->withInput(Audio::fromLocalPath('/path/to/audio.mp3'))
    ->asText();

// Transcribed text
$transcription = $response->text;

// Usage information (if provided by the provider)
if ($response->usage) {
    echo "Duration: {$response->usage->audioDuration} seconds";
}

// Additional provider data
$additionalData = $response->additionalContent;

// Convert to array
$array = $response->toArray();

Provider-Specific Options

Different providers offer unique features and options:
use Prism\Prism\Facades\Prism;
use Prism\Prism\ValueObjects\Media\Audio;

// Text-to-Speech with OpenAI
$response = Prism::audio()
    ->using('openai', 'tts-1-hd')
    ->withInput('High quality audio')
    ->withVoice('nova')
    ->withProviderOptions([
        'speed' => 1.25,
        'response_format' => 'opus'
    ])
    ->asAudio();

// Speech-to-Text with OpenAI Whisper
$response = Prism::audio()
    ->using('openai', 'whisper-1')
    ->withInput(Audio::fromLocalPath('/path/to/audio.mp3'))
    ->withProviderOptions([
        'language' => 'en',
        'prompt' => 'Technical discussion about AI',
        'temperature' => 0.2
    ])
    ->asText();

Client Configuration

HTTP Options

Configure HTTP client behavior for audio requests:
use Prism\Prism\Facades\Prism;
use Prism\Prism\ValueObjects\Media\Audio;

$response = Prism::audio()
    ->using('openai', 'whisper-1')
    ->withInput(Audio::fromLocalPath('/path/to/large-file.mp3'))
    ->withClientOptions([
        'timeout' => 300,          // 5 minutes for large files
        'connect_timeout' => 30
    ])
    ->asText();

Retry Configuration

Configure automatic retries for transient failures:
$response = Prism::audio()
    ->using('openai', 'tts-1')
    ->withInput('Retry if needed')
    ->withVoice('alloy')
    ->withClientRetry(
        times: 3,
        sleepMilliseconds: 1000
    )
    ->asAudio();
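
To make the two parameters concrete, here is a standalone sketch of fixed-delay retry behavior. It is an illustration, not Prism's implementation: `retryWithDelay` is a hypothetical function, and it assumes `times` counts total attempts (as in Laravel's HTTP client) with a pause of `sleepMilliseconds` between failed attempts.

```php
<?php

declare(strict_types=1);

/**
 * Illustrative stand-in, not Prism's implementation: calls $operation up to
 * $times times, sleeping $sleepMilliseconds after each failed attempt.
 */
function retryWithDelay(callable $operation, int $times, int $sleepMilliseconds): mixed
{
    $attempt = 0;

    while (true) {
        $attempt++;

        try {
            return $operation($attempt);
        } catch (RuntimeException $e) {
            if ($attempt >= $times) {
                throw $e; // Out of attempts: surface the last failure.
            }

            usleep($sleepMilliseconds * 1000);
        }
    }
}

// Simulated request that fails twice and succeeds on the third attempt.
$result = retryWithDelay(
    function (int $attempt): string {
        if ($attempt < 3) {
            throw new RuntimeException('Transient failure');
        }

        return "ok after {$attempt} attempts";
    },
    times: 3,
    sleepMilliseconds: 0
);
```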

Error Handling

Handle errors gracefully when processing audio:
use Illuminate\Http\Client\RequestException;
use InvalidArgumentException;
use Prism\Prism\Facades\Prism;
use Prism\Prism\ValueObjects\Media\Audio;

try {
    $response = Prism::audio()
        ->using('openai', 'whisper-1')
        ->withInput(Audio::fromLocalPath('/path/to/audio.mp3'))
        ->asText();
    
    echo "Transcription: {$response->text}";
} catch (InvalidArgumentException $e) {
    // Handle invalid input (e.g., wrong media type)
    echo "Invalid input: {$e->getMessage()}";
} catch (RequestException $e) {
    // Handle API errors
    echo "API error: {$e->getMessage()}";
}

Common Use Cases

Voice assistant: transcribe a user's voice input, process it, and generate an audio response:
use Prism\Prism\Facades\Prism;
use Prism\Prism\ValueObjects\Media\Audio;

// 1. Transcribe user's voice command
$transcription = Prism::audio()
    ->using('openai', 'whisper-1')
    ->withInput(Audio::fromLocalPath('/tmp/user-voice.mp3'))
    ->asText();

// 2. Process the command (processCommand() is a placeholder for your business logic)
$responseText = processCommand($transcription->text);

// 3. Generate audio response
$audioResponse = Prism::audio()
    ->using('openai', 'tts-1')
    ->withInput($responseText)
    ->withVoice('nova')
    ->asAudio();
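
The example calls processCommand(), which is left to the application. For experimenting with the flow end to end, a placeholder along these lines would do. It is purely hypothetical; the keywords and replies are invented for illustration.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical placeholder for the business logic in step 2: maps a
 * transcribed command to the text that should be spoken back.
 */
function processCommand(string $command): string
{
    // Lowercase and strip surrounding whitespace and punctuation before matching.
    $normalized = strtolower(trim($command, " \t\n\r.?!"));

    return match (true) {
        str_contains($normalized, 'hello') => 'Hello! How can I help you?',
        str_contains($normalized, 'help') => 'You can ask me to greet you.',
        default => "Sorry, I did not understand: {$command}",
    };
}
```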

Best Practices

  1. Choose the right model: With OpenAI, use tts-1 for lower latency or tts-1-hd for higher audio quality
  2. Validate input types: Pass a string to withInput() for TTS and an Audio object for STT
  3. Handle large files: Increase client timeouts when processing long audio files
  4. Specify MIME types: Always provide a MIME type when loading audio from base64 or raw content
  5. Optimize costs: Cost generally scales with input length (OpenAI, for example, bills TTS by character count and Whisper by audio duration), so avoid sending more text or audio than you need
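
For the large-file point, a pre-flight size check can reject oversized uploads before any request is made. The sketch below is an assumption-laden example: `assertAudioFileWithinLimit` is a hypothetical helper, and the 25 MiB default reflects OpenAI's documented upload limit for transcription, so substitute your own provider's limit.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical pre-flight check (not part of Prism): rejects files that
 * exceed a provider's upload limit before any request is sent.
 *
 * @return int File size in bytes.
 */
function assertAudioFileWithinLimit(string $path, int $maxBytes = 25 * 1024 * 1024): int
{
    if (!is_file($path) || !is_readable($path)) {
        throw new InvalidArgumentException("Audio file not readable: {$path}");
    }

    $size = filesize($path);
    if ($size === false || $size > $maxBytes) {
        throw new LengthException("Audio file exceeds {$maxBytes} bytes: {$path}");
    }

    return $size;
}
```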

Next Steps

  • Image Generation: Generate images from text descriptions
  • Embeddings: Generate vector embeddings for text and images
  • Input Modalities: Learn about using audio in text generation
  • Providers: Explore provider-specific audio features
