Audio Processing

Overview

Prism provides a unified API for audio processing with two main capabilities:

Text-to-Speech (TTS): Convert text into spoken audio
Speech-to-Text (STT): Transcribe audio files into text

Use Prism::audio() to access both features with a simple, fluent interface.

Check the provider documentation to see which providers support audio processing and what models are available.

Text-to-Speech

Convert text into natural-sounding speech audio.

Basic Usage

use Prism\Prism\Facades\Prism;

$response = Prism::audio()
    ->using('openai', 'tts-1')
    ->withInput('Hello, welcome to Prism!')
    ->withVoice('alloy')
    ->asAudio();

$audio = $response->audio;

Available Voices

Different providers offer different voice options:

OpenAI

// OpenAI TTS voices: alloy, echo, fable, onyx, nova, shimmer
$response = Prism::audio()
    ->using('openai', 'tts-1-hd')
    ->withInput('This is a test of different voices')
    ->withVoice('nova')
    ->asAudio();

Other Providers

// Check your provider's documentation for available voices
$response = Prism::audio()
    ->using('your-provider', 'model-name')
    ->withInput('Text to convert')
    ->withVoice('voice-id')
    ->asAudio();

Configuring TTS Options

Use provider-specific options for advanced control:

$response = Prism::audio()
    ->using('openai', 'tts-1')
    ->withInput('Welcome to our application!')
    ->withVoice('alloy')
    ->withProviderOptions([
        'speed' => 1.0,        // Speech speed (0.25 to 4.0)
        'response_format' => 'mp3'  // Audio format
    ])
    ->asAudio();
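Since the speed option only accepts values in the range noted in the comment above, it can help to validate it before the request is sent. The following is a minimal sketch: ttsOptions() is a hypothetical helper, not part of Prism, and it only checks the 0.25 to 4.0 range stated above.

```php
/**
 * Hypothetical helper (not part of Prism): build TTS provider options,
 * rejecting speeds outside the 0.25 to 4.0 range.
 */
function ttsOptions(float $speed = 1.0, string $format = 'mp3'): array
{
    if ($speed < 0.25 || $speed > 4.0) {
        throw new InvalidArgumentException(
            "Speed must be between 0.25 and 4.0, got {$speed}."
        );
    }

    return [
        'speed' => $speed,
        'response_format' => $format,
    ];
}
```

The returned array can be passed straight to withProviderOptions(), for example ->withProviderOptions(ttsOptions(1.25, 'opus')).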

Saving Audio Output

Save generated audio to a file:

$response = Prism::audio()
    ->using('openai', 'tts-1')
    ->withInput('This will be saved as audio')
    ->withVoice('echo')
    ->asAudio();

$audio = $response->audio;

if ($audio->base64) {
    $audioContent = base64_decode($audio->base64);
    file_put_contents('/path/to/output.mp3', $audioContent);
}
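The snippet above ignores two failure modes: a payload that is not valid base64, and a write that fails. A slightly more defensive variant might look like the sketch below. saveBase64Audio() is a hypothetical helper, not part of Prism, and it assumes the payload uses the standard base64 alphabet.

```php
/**
 * Hypothetical helper (not part of Prism): decode base64 audio and write it to disk.
 * Returns the number of bytes written.
 */
function saveBase64Audio(string $base64, string $path): int
{
    // Strict mode makes base64_decode() return false on characters outside the base64 alphabet
    $bytes = base64_decode($base64, true);
    if ($bytes === false || $bytes === '') {
        throw new InvalidArgumentException('Audio payload is empty or not valid base64.');
    }

    // file_put_contents() returns false on failure, so check the result explicitly
    $written = file_put_contents($path, $bytes);
    if ($written === false) {
        throw new RuntimeException("Could not write audio to {$path}.");
    }

    return $written;
}
```

With this in place, the save step becomes saveBase64Audio($response->audio->base64, '/path/to/output.mp3').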

Speech-to-Text

Transcribe audio files into text.

Basic Usage

use Prism\Prism\Facades\Prism;
use Prism\Prism\ValueObjects\Media\Audio;

$response = Prism::audio()
    ->using('openai', 'whisper-1')
    ->withInput(Audio::fromLocalPath('/path/to/audio.mp3'))
    ->asText();

echo $response->text;

Loading Audio Files

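Prism supports multiple ways to load audio files. The example below reads from a local path; the podcast example under Common Use Cases reads from a storage disk with Audio::fromStoragePath():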

use Prism\Prism\ValueObjects\Media\Audio;

$response = Prism::audio()
    ->using('openai', 'whisper-1')
    ->withInput(Audio::fromLocalPath('/path/to/recording.mp3'))
    ->asText();

Transcription Options

Configure transcription behavior with provider options:

use Prism\Prism\ValueObjects\Media\Audio;

$response = Prism::audio()
    ->using('openai', 'whisper-1')
    ->withInput(Audio::fromLocalPath('/path/to/audio.mp3'))
    ->withProviderOptions([
        'language' => 'en',           // Language code (optional)
        'prompt' => 'Custom context', // Context for better accuracy
        'temperature' => 0.0          // Sampling temperature
    ])
    ->asText();

echo $response->text;

Supported Audio Formats

Commonly supported audio formats include:

MP3 (audio/mpeg)
MP4 (audio/mp4)
WAV (audio/wav, audio/x-wav)
WebM (audio/webm)
FLAC (audio/flac)
AAC (audio/aac)
OGG (audio/ogg)

Format support varies by provider. Check your provider’s documentation for the complete list of supported formats.
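When you need a MIME type for a file and only have its name, the list above can be turned into a simple lookup. This is a sketch only: audioMimeType() is a hypothetical helper, not part of Prism. It guesses from the file extension without inspecting the file contents, and your provider may accept fewer formats than the table contains.

```php
/**
 * Hypothetical helper (not part of Prism): guess an audio MIME type from a
 * file extension, using the formats listed above. Returns null otherwise.
 */
function audioMimeType(string $path): ?string
{
    $types = [
        'mp3'  => 'audio/mpeg',
        'mp4'  => 'audio/mp4',
        'wav'  => 'audio/wav',
        'webm' => 'audio/webm',
        'flac' => 'audio/flac',
        'aac'  => 'audio/aac',
        'ogg'  => 'audio/ogg',
    ];

    // pathinfo() returns an empty string when the path has no extension
    $extension = strtolower(pathinfo($path, PATHINFO_EXTENSION));

    return $types[$extension] ?? null;
}
```

A null result is a reasonable point to reject the upload before any provider request is made.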

Response Objects

Audio Response (TTS)

The AudioResponse object contains the generated audio:

$response = Prism::audio()
    ->using('openai', 'tts-1')
    ->withInput('Hello world')
    ->withVoice('alloy')
    ->asAudio();

// Access the audio object
$audio = $response->audio;

// Base64-encoded audio data
$base64Audio = $audio->base64;

// Audio type/format
$audioType = $audio->type;

// Convert to array
$array = $response->toArray();

Text Response (STT)

The TextResponse object contains the transcribed text:

use Prism\Prism\ValueObjects\Media\Audio;

$response = Prism::audio()
    ->using('openai', 'whisper-1')
    ->withInput(Audio::fromLocalPath('/path/to/audio.mp3'))
    ->asText();

// Transcribed text
$transcription = $response->text;

// Usage information (if provided by the provider)
if ($response->usage) {
    echo "Duration: {$response->usage->audioDuration} seconds";
}

// Additional provider data
$additionalData = $response->additionalContent;

// Convert to array
$array = $response->toArray();

Provider-Specific Options

Different providers offer unique features and options:

OpenAI

use Prism\Prism\ValueObjects\Media\Audio;

// Text-to-Speech with OpenAI
$response = Prism::audio()
    ->using('openai', 'tts-1-hd')
    ->withInput('High quality audio')
    ->withVoice('nova')
    ->withProviderOptions([
        'speed' => 1.25,
        'response_format' => 'opus'
    ])
    ->asAudio();

// Speech-to-Text with OpenAI Whisper
$response = Prism::audio()
    ->using('openai', 'whisper-1')
    ->withInput(Audio::fromLocalPath('/path/to/audio.mp3'))
    ->withProviderOptions([
        'language' => 'en',
        'prompt' => 'Technical discussion about AI',
        'temperature' => 0.2
    ])
    ->asText();

Custom Providers

// Check your provider's documentation for available options
$response = Prism::audio()
    ->using('your-provider', 'model-name')
    ->withInput('Text or Audio')
    ->withProviderOptions([
        // Provider-specific options here
    ])
    ->asAudio(); // or ->asText()

Client Configuration

HTTP Options

Configure HTTP client behavior for audio requests:

use Prism\Prism\ValueObjects\Media\Audio;

$response = Prism::audio()
    ->using('openai', 'whisper-1')
    ->withInput(Audio::fromLocalPath('/path/to/large-file.mp3'))
    ->withClientOptions([
        'timeout' => 300,          // 5 minutes for large files
        'connect_timeout' => 30
    ])
    ->asText();

Retry Configuration

Configure automatic retries for transient failures:

$response = Prism::audio()
    ->using('openai', 'tts-1')
    ->withInput('Retry if needed')
    ->withVoice('alloy')
    ->withClientRetry(
        times: 3,
        sleepMilliseconds: 1000
    )
    ->asAudio();
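To make the two retry parameters concrete, here is a plain-PHP sketch of a fixed-delay retry loop. It is illustrative only and is not Prism's implementation. It assumes that times counts total attempts and that sleepMilliseconds is a constant pause between them, which is worth confirming against the HTTP client documentation. retryWithDelay() is a hypothetical name.

```php
/**
 * Illustrative only: a fixed-delay retry loop (not Prism's implementation).
 *
 * @param callable $operation Called with the 1-based attempt number
 */
function retryWithDelay(callable $operation, int $times, int $sleepMilliseconds): mixed
{
    if ($times < 1) {
        throw new InvalidArgumentException('times must be at least 1.');
    }

    for ($attempt = 1; ; $attempt++) {
        try {
            return $operation($attempt);
        } catch (Throwable $e) {
            if ($attempt >= $times) {
                throw $e; // Out of attempts: surface the last error
            }
            usleep($sleepMilliseconds * 1000); // usleep() takes microseconds
        }
    }
}
```

With times: 3, an operation that fails twice and then succeeds returns normally; one that fails three times rethrows the third error.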

Error Handling

Handle errors gracefully when processing audio:

use Illuminate\Http\Client\RequestException;
use InvalidArgumentException;
use Prism\Prism\Facades\Prism;
use Prism\Prism\ValueObjects\Media\Audio;

try {
    $response = Prism::audio()
        ->using('openai', 'whisper-1')
        ->withInput(Audio::fromLocalPath('/path/to/audio.mp3'))
        ->asText();
    
    echo "Transcription: {$response->text}";
} catch (InvalidArgumentException $e) {
    // Handle invalid input (e.g., wrong media type)
    echo "Invalid input: {$e->getMessage()}";
} catch (RequestException $e) {
    // Handle API errors
    echo "API error: {$e->getMessage()}";
}

Common Use Cases

Voice Assistants

// Process user voice input and generate audio response
use Prism\Prism\ValueObjects\Media\Audio;

// 1. Transcribe user's voice command
$transcription = Prism::audio()
    ->using('openai', 'whisper-1')
    ->withInput(Audio::fromLocalPath('/tmp/user-voice.mp3'))
    ->asText();

// 2. Process the command (processCommand() is a placeholder for your business logic)
$responseText = processCommand($transcription->text);

// 3. Generate audio response
$audioResponse = Prism::audio()
    ->using('openai', 'tts-1')
    ->withInput($responseText)
    ->withVoice('nova')
    ->asAudio();
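The processCommand() call in step 2 is left undefined. To try the flow end to end, a stand-in could be as simple as the keyword matcher below. This is a hypothetical placeholder, not part of Prism, and a real assistant would replace it with its own logic.

```php
/**
 * Hypothetical stand-in for processCommand(): map a transcribed
 * command to the text that should be spoken back to the user.
 */
function processCommand(string $transcript): string
{
    $command = strtolower(trim($transcript));

    if (str_contains($command, 'time')) {
        return 'The time is ' . date('H:i') . '.';
    }

    if (str_contains($command, 'hello')) {
        return 'Hello! How can I help?';
    }

    return "Sorry, I didn't understand that.";
}
```

Because it returns a plain string, the result can be passed directly to withInput() in step 3.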

Podcast Transcription

// Transcribe podcast episodes for searchability
use Prism\Prism\ValueObjects\Media\Audio;

$episode = Audio::fromStoragePath(
    path: 'podcasts/episode-123.mp3',
    diskName: 's3'
);

$response = Prism::audio()
    ->using('openai', 'whisper-1')
    ->withInput($episode)
    ->withProviderOptions([
        'language' => 'en',
        'temperature' => 0.0  // More deterministic
    ])
    ->asText();

// Save transcription to database (Podcast is assumed to be a model in your application)
Podcast::find(123)->update([
    'transcript' => $response->text
]);

Accessibility

// Convert written content to audio for accessibility
use Illuminate\Support\Facades\Storage;

$article = Article::find($id);

$response = Prism::audio()
    ->using('openai', 'tts-1-hd')
    ->withInput($article->content)
    ->withVoice('echo')
    ->withProviderOptions([
        'speed' => 0.9,  // Slightly slower for clarity
        'response_format' => 'mp3'
    ])
    ->asAudio();

// Save audio version
$audioContent = base64_decode($response->audio->base64);
Storage::disk('public')->put(
    "articles/{$id}/audio.mp3",
    $audioContent
);

Best Practices

Choose the right model: With OpenAI, use tts-1 for speed or tts-1-hd for quality
Validate input types: Ensure you use string input for TTS and Audio objects for STT
Handle large files: Increase timeouts for processing long audio files
Specify MIME types: Always provide MIME types when using base64 or raw content
Optimize costs: Be aware that audio processing can be expensive for long files

Next Steps

Image Generation: Generate images from text descriptions
Embeddings: Generate vector embeddings for text and images
Input Modalities: Learn about using audio in text generation
Providers: Explore provider-specific audio features
