Method Signature

client.audio.transcriptions.create(
    file: FileTypes,
    model: str,
    language: Optional[str] = None,
    prompt: Optional[str] = None,
    response_format: Optional[str] = None,
    temperature: Optional[float] = None
) -> TranscriptionCreateResponse

Parameters

file
FileTypes
required
The audio file to transcribe. Supported formats include:
  • mp3
  • mp4
  • mpeg
  • mpga
  • m4a
  • wav
  • webm
Maximum file size is 25 MB.
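The format and size constraints above can be checked client-side before uploading, avoiding a round trip on files the API would reject. A minimal sketch (the extension set and 25 MB limit come from this reference; the helper name is illustrative, not part of the SDK):

```python
import os

# Supported formats and size limit as documented above.
SUPPORTED_EXTENSIONS = {".mp3", ".mp4", ".mpeg", ".mpga", ".m4a", ".wav", ".webm"}
MAX_BYTES = 25 * 1024 * 1024  # 25 MB

def validate_audio_file(path: str) -> None:
    """Raise ValueError if the file would be rejected for transcription."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported audio format: {ext or '(no extension)'}")
    if os.path.getsize(path) > MAX_BYTES:
        raise ValueError("File exceeds the 25 MB upload limit")
```

Calling this before `client.audio.transcriptions.create` turns a server-side rejection into an immediate local error.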
model
str
required
Model ID to use for transcription (e.g., "openai/whisper-1").
language
str
The language of the input audio in ISO-639-1 format (e.g., "en", "es", "fr"). Supplying the input language improves accuracy and latency.
prompt
str
An optional text to guide the model’s style or continue a previous audio segment. The prompt should match the audio language.
response_format
str
The format of the transcript output. Supported formats:
  • json (default) - Simple JSON with text
  • text - Plain text
  • srt - SubRip subtitle format
  • verbose_json - JSON with detailed metadata
  • vtt - Web Video Text Tracks format
temperature
float
The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.

Response

The response varies based on the response_format parameter:

JSON Format (default)

text
str
The transcribed text.
logprobs
List[LogprobItem]
The log probabilities of the tokens in the transcription. Returned only by the gpt-4o-transcribe and gpt-4o-mini-transcribe models, and only when logprobs is added to the include array.
usage
Usage
Token usage statistics for the request.

Verbose JSON Format

text
str
The transcribed text.
language
str
The language of the input audio.
duration
float
The duration of the input audio, in seconds.
words
List[Word]
Extracted words and their corresponding timestamps.
segments
List[Segment]
Segments of the transcribed text and their corresponding details.
usage
Usage
Usage statistics for models billed by audio input duration.
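When you need custom subtitle post-processing (merging short cues, trimming silence), the segment timestamps in a verbose_json response can be converted to SubRip cues by hand instead of requesting response_format="srt". A minimal sketch, assuming each Segment item exposes start, end, and text attributes (attribute names are an assumption; this reference does not enumerate the Segment fields):

```python
def srt_timestamp(seconds: float) -> str:
    """Format a duration in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments) -> str:
    """Build SubRip text from objects with .start, .end, and .text
    (assumed shape of the Segment items in a verbose_json response)."""
    cues = []
    for i, seg in enumerate(segments, start=1):
        cues.append(
            f"{i}\n{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.text.strip()}\n"
        )
    return "\n".join(cues)
```

Pass `transcription.segments` from a verbose_json response to `segments_to_srt` to produce the cue text.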

Examples

from dedalus_labs import DedalusLabs

client = DedalusLabs()

# Basic transcription
with open("audio.mp3", "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        file=audio_file,
        model="openai/whisper-1"
    )

print(transcription.text)

# Transcription with language hint for better accuracy
with open("spanish_audio.wav", "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        file=audio_file,
        model="openai/whisper-1",
        language="es"
    )

print(transcription.text)

# Get detailed transcription with timestamps
with open("interview.m4a", "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        file=audio_file,
        model="openai/whisper-1",
        response_format="verbose_json"
    )

print(f"Language: {transcription.language}")
print(f"Duration: {transcription.duration}s")

for word in transcription.words:
    print(f"{word.start:.2f}s - {word.end:.2f}s: {word.word}")
# Generate SRT subtitle file
with open("video_audio.mp3", "rb") as audio_file:
    srt_output = client.audio.transcriptions.create(
        file=audio_file,
        model="openai/whisper-1",
        response_format="srt"
    )

with open("subtitles.srt", "w") as f:
    f.write(srt_output.text)

# Use prompt to guide transcription style
with open("meeting.wav", "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        file=audio_file,
        model="openai/whisper-1",
        prompt="This is a business meeting discussing Q4 results and strategy.",
        temperature=0.2
    )

print(transcription.text)