Method Signature

client.audio.transcriptions.create(
    file: FileTypes,
    model: str,
    language: Optional[str] = None,
    prompt: Optional[str] = None,
    response_format: Optional[str] = None,
    temperature: Optional[float] = None
) -> TranscriptionCreateResponse

Parameters

file
FileTypes
required
The audio file to transcribe. Supported formats include:
  • mp3
  • mp4
  • mpeg
  • mpga
  • m4a
  • wav
  • webm
Maximum file size is 25 MB.
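The format and size constraints above can be checked client-side before uploading, avoiding a round trip on files the API would reject. A minimal sketch (the extension set and 25 MB limit come from this reference; the helper name is illustrative, not part of the SDK):

```python
import os

# Supported formats and size limit as documented above.
SUPPORTED_EXTENSIONS = {".mp3", ".mp4", ".mpeg", ".mpga", ".m4a", ".wav", ".webm"}
MAX_BYTES = 25 * 1024 * 1024  # 25 MB

def validate_audio_file(path: str) -> None:
    """Raise ValueError if the file would be rejected for transcription."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported audio format: {ext or '(no extension)'}")
    if os.path.getsize(path) > MAX_BYTES:
        raise ValueError("File exceeds the 25 MB upload limit")
```

Calling this before `client.audio.transcriptions.create` turns a server-side rejection into an immediate local error.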
model
str
required
Model ID to use for transcription (e.g., "openai/whisper-1").
language
str
The language of the input audio in ISO-639-1 format (e.g., "en", "es", "fr"). Supplying the input language improves accuracy and latency.
prompt
str
An optional text to guide the model’s style or continue a previous audio segment. The prompt should match the audio language.
response_format
str
The format of the transcript output. Supported formats:
  • json (default) - Simple JSON with text
  • text - Plain text
  • srt - SubRip subtitle format
  • verbose_json - JSON with detailed metadata
  • vtt - Web Video Text Tracks format
temperature
float
The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.

Response

The response varies based on the response_format parameter:

JSON Format (default)

text
str
The transcribed text.
logprobs
List[LogprobItem]
The log probabilities of the tokens in the transcription. Returned only by the gpt-4o-transcribe and gpt-4o-mini-transcribe models, and only when logprobs is added to the include array.
usage
Usage
Token usage statistics for the request.

Verbose JSON Format

text
str
The transcribed text.
language
str
The language of the input audio.
duration
float
The duration of the input audio, in seconds.
words
List[Word]
Extracted words and their corresponding timestamps.
segments
List[Segment]
Segments of the transcribed text and their corresponding details.
usage
Usage
Usage statistics for models billed by audio input duration.
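When you need custom subtitle post-processing (merging short cues, trimming silence), the segment timestamps in a verbose_json response can be converted to SubRip cues by hand instead of requesting response_format="srt". A minimal sketch, assuming each Segment item exposes start, end, and text attributes (attribute names are an assumption; this reference does not enumerate the Segment fields):

```python
def srt_timestamp(seconds: float) -> str:
    """Format a duration in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments) -> str:
    """Build SubRip text from objects with .start, .end, and .text
    (assumed shape of the Segment items in a verbose_json response)."""
    cues = []
    for i, seg in enumerate(segments, start=1):
        cues.append(
            f"{i}\n{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.text.strip()}\n"
        )
    return "\n".join(cues)
```

Pass `transcription.segments` from a verbose_json response to `segments_to_srt` to produce the cue text.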

Examples

from dedalus_labs import DedalusLabs

client = DedalusLabs()

# Basic transcription
with open("audio.mp3", "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        file=audio_file,
        model="openai/whisper-1"
    )

print(transcription.text)

# Transcription with language hint for better accuracy
with open("spanish_audio.wav", "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        file=audio_file,
        model="openai/whisper-1",
        language="es"
    )

print(transcription.text)

# Get detailed transcription with timestamps
with open("interview.m4a", "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        file=audio_file,
        model="openai/whisper-1",
        response_format="verbose_json"
    )

print(f"Language: {transcription.language}")
print(f"Duration: {transcription.duration}s")

for word in transcription.words:
    print(f"{word.start:.2f}s - {word.end:.2f}s: {word.word}")
# Generate SRT subtitle file
with open("video_audio.mp3", "rb") as audio_file:
    srt_output = client.audio.transcriptions.create(
        file=audio_file,
        model="openai/whisper-1",
        response_format="srt"
    )

with open("subtitles.srt", "w") as f:
    f.write(srt_output.text)

# Use prompt to guide transcription style
with open("meeting.wav", "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        file=audio_file,
        model="openai/whisper-1",
        prompt="This is a business meeting discussing Q4 results and strategy.",
        temperature=0.2
    )

print(transcription.text)