Create chat completion
Model identifier. Accepts model ID strings, lists for routing, or DedalusModel objects with per-model settings.
Conversation history (OpenAI: `messages`, Google: `contents`, Responses: `input`). Each message should have:
- `role`: `system` | `user` | `assistant` | `tool` | `function` | `developer`
- `content`: string or array of content parts
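For example, a minimal conversation history expressed as a plain list of message dicts (the assistant reply shown is hypothetical):

```python
# A minimal conversation history. The role/content shapes follow the
# parameter description above; the assistant turn is a made-up reply.
messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "Paris."},
    # content may also be an array of content parts:
    {"role": "user", "content": [{"type": "text", "text": "And of Spain?"}]},
]
```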
Configuration parameters
Sampling temperature (0-2 for most providers). Higher values make output more random.
Maximum tokens in completion.
Maximum tokens in completion (newer parameter name).
Nucleus sampling threshold. An alternative to sampling with temperature.
Top-k sampling parameter. Limits the number of highest probability tokens to consider.
How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices.
Random seed for deterministic output.
Enable streaming responses. Set to `true` to receive incremental chunks via Server-Sent Events.
Options for the streaming response. Only set this when you set `stream: true`.
System and instructions
System instruction/prompt. Defines the behavior and personality of the assistant.
Toggles between reasoning mode and no system prompt. When set to `reasoning`, the system prompt for reasoning models will be used.
- `"reasoning"`: Use reasoning mode
Response formatting
An object specifying the format that the model must output.
- `{ "type": "json_schema", "json_schema": {...} }`: enables Structured Outputs, which ensures the model will match your supplied JSON schema.
- `{ "type": "json_object" }`: enables the older JSON mode.
- `{ "type": "text" }`: plain text (default).
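As an illustration, a `response_format` payload requesting Structured Outputs; the `weather_report` schema is a hypothetical example:

```python
# Structured Outputs sketch: the model's output must match this JSON schema.
# The schema name and fields are invented for illustration.
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "weather_report",
        "schema": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "temp_c": {"type": "number"},
            },
            "required": ["city", "temp_c"],
        },
    },
}
```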
Tools and function calling
Available tools/functions for the model. Each tool should have:
- `type`: `function`
- `function`: Function definition with name, description, and parameters
Controls which (if any) tool is called by the model.
- `"none"`: Model will not call any tool.
- `"auto"`: Model can pick between generating a message or calling tools (default if tools are present).
- `"required"`: Model must call one or more tools.
- `{"type": "function", "function": {"name": "my_function"}}`: Forces the model to call that specific tool.
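A sketch of a single function tool and a `tool_choice` that forces the model to call it; the `get_weather` function is hypothetical:

```python
# One function tool, following the shape described above.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical function
            "description": "Get current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]
# Force the model to call that specific tool.
tool_choice = {"type": "function", "function": {"name": "get_weather"}}
```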
Whether to enable parallel tool calls. Allows the model to call multiple tools simultaneously.
Execute tools server-side. If false, returns raw tool calls for manual handling.
Tool calling configuration (Google-specific).
Deprecated in favor of `tools`. A list of functions the model may generate JSON inputs for.
Deprecated in favor of `tool_choice`. Controls which (if any) function is called by the model.
MCP servers
MCP server identifiers. Accepts marketplace slugs, URLs, or MCPServerSpec objects. MCP tools are executed server-side and billed separately.
Credentials for MCP server authentication. Each credential is matched to servers by connection name.
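For example, an `mcp_servers` list mixing the accepted forms described above; both the slug and the URL here are hypothetical:

```python
# mcp_servers accepts marketplace slugs or URLs (MCPServerSpec objects are
# also accepted per the description above). These entries are invented.
mcp_servers = [
    "example-org/web-search",          # marketplace slug (hypothetical)
    "https://mcp.example.com/server",  # direct URL (hypothetical)
]
```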
Advanced parameters
Sequences that stop generation. Up to 4 sequences where the API will stop generating further tokens.
Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model’s likelihood to talk about new topics.
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim.
Modify the likelihood of specified tokens appearing in the completion. Accepts a JSON object that maps tokens (specified by their token ID in the tokenizer) to an associated bias value from -100 to 100.
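A small sketch of a `logit_bias` map; the token IDs are hypothetical, and values near the extremes effectively ban or strongly favor a token:

```python
# Maps tokenizer token IDs (string keys, as in JSON) to bias values
# in [-100, 100]. Token IDs 1234 and 5678 are invented for illustration.
logit_bias = {
    "1234": -100,  # effectively ban this token
    "5678": 5,     # mildly favor this token
}
```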
Whether to return log probabilities of the output tokens or not. If `true`, returns the log probabilities of each output token returned in the `content` of `message`.
An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. `logprobs` must be set to `true` if this parameter is used.
Note: this field is being replaced by `safety_identifier` and `prompt_cache_key`. A stable identifier for your end-users. Used to boost cache hit rates and to help detect and prevent abuse.
A stable identifier used to help detect users of your application that may be violating usage policies. We recommend hashing their username or email address.
Audio and modalities
Output types that you would like the model to generate. Most models are capable of generating text, which is the default: `["text"]`. The `gpt-4o-audio-preview` model can also generate audio. To request both: `["text", "audio"]`.
Parameters for audio output. Required when audio output is requested with `modalities: ["audio"]`. Fields:
- `voice` (required): Voice ID or custom voice
- `format` (required): `wav` | `aac` | `mp3` | `flac` | `opus` | `pcm16`
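For instance, requesting both text and audio output; the voice ID is an example and may differ by model:

```python
# Request text plus audio output. Format values come from the field list
# above; treat the voice ID as an assumption.
modalities = ["text", "audio"]
audio = {"voice": "alloy", "format": "wav"}
```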
Caching and optimization
Used to cache responses for similar requests to optimize your cache hit rates. Replaces the `user` field.
The retention policy for the prompt cache. Set to `24h` to enable extended prompt caching, which keeps cached prefixes active for longer.
Optional. The name of the cached content to use as context to serve the prediction. Format: `cachedContents/{cachedContent}` (Google-specific).
Reasoning and thinking
Constrains effort on reasoning for reasoning models. Currently supported values are `none`, `minimal`, `low`, `medium`, `high`, and `xhigh`.
- `gpt-5.1` defaults to `none`
- Models before `gpt-5.1` default to `medium`
- `gpt-5-pro` defaults to `high`
Extended thinking configuration (Anthropic-specific).
Static predicted output content, such as the content of a text file that is being regenerated. Fields:
- `type` (required): `content`
- `content` (required): string or array of text content parts
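A minimal `prediction` payload following the field list above; the file content is a hypothetical example of text being regenerated:

```python
# Predicted-output sketch: pass the content you expect the model to
# largely reproduce. The snippet itself is invented for illustration.
prediction = {
    "type": "content",
    "content": "def add(a, b):\n    return a + b\n",
}
```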
Safety and content filtering
Whether to inject a safety prompt before all conversations.
Safety/content filtering settings (Google-specific).
Content filtering and safety policy configuration.
Model routing and handoffs
Agent attributes. Values in [0.0, 1.0]. Used for model routing decisions.
Model attributes for routing. Maps model IDs to attribute dictionaries with values in [0.0, 1.0].
Configuration for multi-model handoffs.
Handoff control. `None` or omitted: auto-detect. `true`: structured handoff (SDK). `false`: drop-in (LLM re-run for mixed turns).
Stable session ID for resuming a previous handoff. Returned by the server on handoff; echo it on the next request to resume.
Tier 2 stateless resumption. Deferred tool specs from a previous handoff response, sent back verbatim so the server can resume without Redis.
Maximum conversation turns.
Service configuration
Service tier for request processing. Options: “auto” | “default” | “flex” | “scale” | “priority”.
The inference speed mode for this request. `"fast"` enables high output-tokens-per-second inference.
- `"standard"`: Default speed
- `"fast"`: High-speed inference
Specifies the geographic region for inference processing. If not specified, the workspace's `default_inference_geo` is used.
If set to `true`, the request returns a `request_id`. You can then get the deferred response via `GET /v1/chat/deferred-completion/{request_id}`.
Metadata and tracking
Set of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format. Keys are strings with a maximum length of 64 characters. Values are strings with a maximum length of 512 characters.
Whether or not to store the output of this chat completion request for use in model distillation or evals products.
Provider-specific parameters
Generation parameters wrapper (Google-specific).
Output configuration (provider-specific).
Sets the parameters to be used for search data. If not set, no data will be acquired by the model.
This tool searches the web for relevant results to use in a response.
Constrains the verbosity of the model's response. Currently supported values are `low`, `medium`, and `high`.
Request options
Send extra headers with the request.
Add additional query parameters to the request.
Add additional JSON properties to the request body.
Override the client-level default timeout for this request, in seconds.
Specify a custom idempotency key for this request.
Response
Returns a `ChatCompletion` object when `stream=False` (default), or a `Stream[ChatCompletionChunk]` when `stream=True`.
ChatCompletion fields
A unique identifier for the chat completion.
The object type, which is always `chat.completion`.
The Unix timestamp (in seconds) of when the chat completion was created.
The model used for the chat completion.
A list of chat completion choices. Can be more than one if `n` is greater than 1. Each choice contains:
- `index`: The index of this choice
- `message`: The chat completion message with `role` and `content`
- `finish_reason`: `stop` | `length` | `tool_calls` | `content_filter` | null
- `logprobs`: Log probability information (if requested)
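Reading the first choice out of a non-streaming response body might look like this; the response values are invented for illustration:

```python
# A hypothetical non-streaming response body, trimmed to the choice fields
# described above.
completion = {
    "id": "chatcmpl-123",
    "object": "chat.completion",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Paris."},
            "finish_reason": "stop",
            "logprobs": None,
        }
    ],
}
reply = completion["choices"][0]["message"]["content"]
finish = completion["choices"][0]["finish_reason"]
```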
Usage statistics for the completion request. Fields:
- `prompt_tokens`: Number of tokens in the prompt
- `completion_tokens`: Number of tokens in the completion
- `total_tokens`: Total tokens used
- `completion_tokens_details`: Breakdown of completion tokens
- `prompt_tokens_details`: Breakdown of prompt tokens
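The token counts are related by a simple sum, which is handy as a sanity check when aggregating costs (counts below are invented):

```python
# total_tokens is the sum of prompt and completion tokens.
usage = {"prompt_tokens": 12, "completion_tokens": 5, "total_tokens": 17}
assert usage["total_tokens"] == usage["prompt_tokens"] + usage["completion_tokens"]
```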
This fingerprint represents the backend configuration that the model runs with. Can be used in conjunction with the `seed` request parameter to understand when backend changes have been made that might impact determinism.
The processing type used for serving the request.
Dedalus-specific response fields
List of tool names that were executed server-side (e.g., MCP tools). Only present when tools were executed on the server.
Detailed results of MCP tool executions including inputs, outputs, and timing. Provides full visibility into server-side tool execution for debugging and audit purposes.
MCP server failures keyed by server name. Each error contains:
- `message`: Human-readable error message
- `code`: Machine-readable error code
- `recommendation`: Suggested action for the user
Client tools to execute, with dependency ordering. Each pending tool contains:
- `id`: Unique identifier for this tool call
- `name`: Name of the tool to execute
- `arguments`: Input arguments for the tool call
- `dependencies`: IDs of other pending calls that must complete first
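A client can honor the dependency ordering with a topological sort before executing anything. A minimal sketch, assuming the field names above; the tool names and IDs are hypothetical:

```python
# Order pending client tools so each runs only after its dependencies.
from graphlib import TopologicalSorter

pending_tools = [
    {"id": "c", "name": "summarize", "arguments": {}, "dependencies": ["a", "b"]},
    {"id": "a", "name": "fetch", "arguments": {"url": "https://example.com"}, "dependencies": []},
    {"id": "b", "name": "parse", "arguments": {}, "dependencies": ["a"]},
]

# Build id -> dependency-set graph and compute an execution order.
graph = {t["id"]: set(t["dependencies"]) for t in pending_tools}
order = list(TopologicalSorter(graph).static_order())
```

Each ID then appears in `order` only after all of its dependencies, so the client can execute the calls sequentially (or batch independent ones).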
Completed server tool outputs keyed by call ID.
Stable session ID for cross-turn handoff state. Echo this on the next request to resume server-side execution.
Server tools blocked on client results.
Number of internal LLM calls made during this request. SDKs can sum this across their outer loop to track total LLM calls.