
Create chat completion

model
string | list | DedalusModel
required
Model identifier. Accepts model ID strings, lists for routing, or DedalusModel objects with per-model settings.
messages
array
Conversation history (OpenAI: messages, Google: contents, Responses: input). Each message should have:
  • role: "system" | "user" | "assistant" | "tool" | "function" | "developer"
  • content: string | array of content parts

Configuration parameters

temperature
float
Sampling temperature (0-2 for most providers). Higher values make output more random.
max_tokens
int
Maximum tokens in completion.
max_completion_tokens
int
Maximum tokens in completion (newer parameter name).
top_p
float
Nucleus sampling threshold. An alternative to sampling with temperature.
top_k
int
Top-k sampling parameter. Limits the number of highest probability tokens to consider.
n
int
How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices.
seed
int
Random seed for deterministic output.
stream
boolean
Enable streaming response. Set to true to receive incremental chunks via Server-Sent Events.
stream_options
object
Options for streaming response. Only set this when you set stream: true.
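A streaming call might look like the sketch below. It assumes a configured Dedalus client (as in the Examples section) and OpenAI-style delta chunks; the model name is a placeholder.

```python
def print_stream(client) -> None:
    """Stream a completion and print content deltas as they arrive."""
    stream = client.chat.completions.create(
        model="openai/gpt-5-nano",
        messages=[{"role": "user", "content": "Count to five."}],
        stream=True,
        stream_options={"include_usage": True},  # final chunk carries usage stats
    )
    for chunk in stream:
        # Each chunk carries an incremental delta; content may be None.
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)
```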

System and instructions

system_instruction
string | object
System instruction/prompt. Defines the behavior and personality of the assistant.
prompt_mode
string
Toggles between reasoning mode and no system prompt. When set to "reasoning", the system prompt for reasoning models is used.
  • "reasoning": Use reasoning mode

Response formatting

response_format
object
An object specifying the format that the model must output.
  • { "type": "json_schema", "json_schema": {...} } enables Structured Outputs which ensures the model will match your supplied JSON schema.
  • { "type": "json_object" } enables the older JSON mode.
  • { "type": "text" } for plain text (default).

Tools and function calling

tools
array
Available tools/functions for the model. Each tool should have:
  • type: "function"
  • function: Function definition with name, description, and parameters
tool_choice
string | object
Controls which (if any) tool is called by the model.
  • "none": Model will not call any tool
  • "auto": Model can pick between generating a message or calling tools (default if tools are present)
  • "required": Model must call one or more tools
  • {"type": "function", "function": {"name": "my_function"}}: Forces the model to call that specific tool
parallel_tool_calls
boolean
Whether to enable parallel tool calls. Allows the model to call multiple tools simultaneously.
automatic_tool_execution
boolean
Execute tools server-side. If false, returns raw tool calls for manual handling.
tool_config
object
Tool calling configuration (Google-specific).
functions
array
Deprecated in favor of tools. A list of functions the model may generate JSON inputs for.
function_call
string
Deprecated in favor of tool_choice. Controls which (if any) function is called by the model.

MCP servers

mcp_servers
array | string
MCP server identifiers. Accepts marketplace slugs, URLs, or MCPServerSpec objects. MCP tools are executed server-side and billed separately.
credentials
object
Credentials for MCP server authentication. Each credential is matched to servers by connection name.
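A request with an MCP server attached might look like this sketch. The marketplace slug is hypothetical, and a configured Dedalus client is assumed; MCP tools run server-side, so no local tool loop is needed.

```python
def search_with_mcp(client):
    """Completion with a marketplace MCP server attached (slug is hypothetical)."""
    return client.chat.completions.create(
        model="openai/gpt-5",
        messages=[{"role": "user", "content": "Find recent papers on MCP."}],
        mcp_servers=["example-org/web-search"],  # hypothetical marketplace slug
    )
```

Server-side executions appear in the response's tools_executed and mcp_tool_results fields.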

Advanced parameters

stop
string | array
Sequences that stop generation. Up to 4 sequences where the API will stop generating further tokens.
presence_penalty
float
Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model’s likelihood to talk about new topics.
frequency_penalty
float
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim.
logit_bias
object
Modify the likelihood of specified tokens appearing in the completion. Accepts a JSON object that maps tokens (specified by their token ID in the tokenizer) to an associated bias value from -100 to 100.
logprobs
boolean
Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned in the content of message.
top_logprobs
int
An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to true if this parameter is used.
user
string
Note: This field is being replaced by safety_identifier and prompt_cache_key. A stable identifier for your end-users. Used to boost cache hit rates and to help detect and prevent abuse.
safety_identifier
string
A stable identifier used to help detect users of your application that may be violating usage policies. We recommend hashing their username or email address.

Audio and modalities

modalities
array
Output types that you would like the model to generate. Most models are capable of generating text, which is the default: ["text"]. The gpt-4o-audio-preview model can also generate audio. To request both: ["text", "audio"].
audio
object
Parameters for audio output. Required when audio output is requested with modalities: ["audio"]. Fields:
  • voice (required): Voice ID or custom voice
  • format (required): "wav" | "aac" | "mp3" | "flac" | "opus" | "pcm16"

Caching and optimization

prompt_cache_key
string
Used to cache responses for similar requests to optimize your cache hit rates. Replaces the user field.
prompt_cache_retention
string
The retention policy for the prompt cache. Set to 24h to enable extended prompt caching, which keeps cached prefixes active for longer.
cached_content
string
Optional. The name of the cached content to use as context to serve the prediction. Format: cachedContents/{cachedContent} (Google-specific).

Reasoning and thinking

reasoning_effort
string
Constrains effort on reasoning for reasoning models. Currently supported values are none, minimal, low, medium, high, and xhigh.
  • gpt-5.1 defaults to none
  • Models before gpt-5.1 default to medium
  • gpt-5-pro defaults to high
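Overriding the default effort is a one-parameter change, as in this sketch (assumes a configured client; the model name is a placeholder):

```python
def reasoned_completion(client):
    """Request a completion with a higher reasoning budget (sketch)."""
    return client.chat.completions.create(
        model="openai/gpt-5.1",
        messages=[{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
        reasoning_effort="high",  # overrides gpt-5.1's default of "none"
    )
```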
thinking
object
Extended thinking configuration (Anthropic-specific).
prediction
object
Static predicted output content, such as the content of a text file that is being regenerated. Fields:
  • type (required): "content"
  • content (required): string or array of text content parts

Safety and content filtering

safe_prompt
boolean
Whether to inject a safety prompt before all conversations.
safety_settings
array
Safety/content filtering settings (Google-specific).
guardrails
array
Content filtering and safety policy configuration.

Model routing and handoffs

agent_attributes
object
Agent attributes. Values in [0.0, 1.0]. Used for model routing decisions.
model_attributes
object
Model attributes for routing. Maps model IDs to attribute dictionaries with values in [0.0, 1.0].
handoff_config
object
Configuration for multi-model handoffs.
handoff_mode
boolean
Handoff control.
  • true: Structured handoff (SDK-driven)
  • false: Drop-in mode (LLM re-run for mixed turns)
  • None or omitted: Auto-detect
correlation_id
string
Stable session ID for resuming a previous handoff. Returned by the server on handoff; echo it on the next request to resume.
deferred_calls
array
Tier 2 stateless resumption. Deferred tool specs from a previous handoff response, sent back verbatim so the server can resume without Redis.
max_turns
int
Maximum conversation turns.
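Putting the routing parameters together, a multi-model request might be sketched as follows. The candidate model IDs and attribute names are illustrative assumptions, not a fixed vocabulary:

```python
def routed_completion(client):
    """Let the router pick between candidate models using attribute hints."""
    return client.chat.completions.create(
        # A list of models enables routing between them.
        model=["openai/gpt-5-nano", "anthropic/claude-sonnet-4-5"],
        messages=[{"role": "user", "content": "Summarize this contract clause."}],
        # Hypothetical attribute names; values must lie in [0.0, 1.0].
        agent_attributes={"accuracy": 0.9, "speed": 0.3},
    )
```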

Service configuration

service_tier
string
Service tier for request processing. Options: "auto" | "default" | "flex" | "scale" | "priority".
speed
string
The inference speed mode for this request. "fast" enables high output-tokens-per-second inference.
  • "standard": Default speed
  • "fast": High-speed inference
inference_geo
string
Specifies the geographic region for inference processing. If not specified, the workspace’s default_inference_geo is used.
deferred
boolean
If set to true, the request returns a request_id. You can then get the deferred response by GET /v1/chat/deferred-completion/{request_id}.
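A deferred round trip could be sketched like this (assumes a configured client; the request_id attribute name follows the description above, and retrieval is shown only as the documented endpoint path):

```python
def submit_deferred(client) -> str:
    """Submit a request for deferred processing and return its request_id."""
    submitted = client.chat.completions.create(
        model="openai/gpt-5",
        messages=[{"role": "user", "content": "Write a long research report."}],
        deferred=True,
    )
    request_id = submitted.request_id
    # Retrieve the result later via:
    #   GET /v1/chat/deferred-completion/{request_id}
    return request_id
```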

Metadata and tracking

metadata
object
Set of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format. Keys are strings with a maximum length of 64 characters. Values are strings with a maximum length of 512 characters.
store
boolean
Whether or not to store the output of this chat completion request for use in model distillation or evals products.

Provider-specific parameters

generation_config
object
Generation parameters wrapper (Google-specific).
output_config
object
Output configuration (provider-specific).
search_parameters
object
Parameters for search-augmented generation (provider-specific). If not set, the model acquires no search data.
web_search_options
object
Options for web search. When set, the model can search the web for relevant results to use in its response.
verbosity
string
Constrains the verbosity of the model’s response. Currently supported values are low, medium, and high.

Request options

extra_headers
object
Send extra headers with the request.
extra_query
object
Add additional query parameters to the request.
extra_body
object
Add additional JSON properties to the request body.
timeout
float
Override the client-level default timeout for this request, in seconds.
idempotency_key
string
Specify a custom idempotency key for this request.

Response

Returns a ChatCompletion object when stream=False (default), or a Stream[ChatCompletionChunk] when stream=True.

ChatCompletion fields

id
string
required
A unique identifier for the chat completion.
object
string
required
The object type, which is always chat.completion.
created
int
required
The Unix timestamp (in seconds) of when the chat completion was created.
model
string
required
The model used for the chat completion.
choices
array
required
A list of chat completion choices. Can be more than one if n is greater than 1. Each choice contains:
  • index: The index of this choice
  • message: The chat completion message with role and content
  • finish_reason: "stop" | "length" | "tool_calls" | "content_filter" | null
  • logprobs: Log probability information (if requested)
usage
object
Usage statistics for the completion request. Fields:
  • prompt_tokens: Number of tokens in the prompt
  • completion_tokens: Number of tokens in the completion
  • total_tokens: Total tokens used
  • completion_tokens_details: Breakdown of completion tokens
  • prompt_tokens_details: Breakdown of prompt tokens
system_fingerprint
string
This fingerprint represents the backend configuration that the model runs with. Can be used in conjunction with the seed request parameter to understand when backend changes have been made that might impact determinism.
service_tier
string
The processing type used for serving the request.

Dedalus-specific response fields

tools_executed
array
List of tool names that were executed server-side (e.g., MCP tools). Only present when tools were executed on the server.
mcp_tool_results
array
Detailed results of MCP tool executions including inputs, outputs, and timing. Provides full visibility into server-side tool execution for debugging and audit purposes.
mcp_server_errors
object
MCP server failures keyed by server name. Each error contains:
  • message: Human-readable error message
  • code: Machine-readable error code
  • recommendation: Suggested action for the user
pending_tools
array
Client tools to execute, with dependency ordering. Each pending tool contains:
  • id: Unique identifier for this tool call
  • name: Name of the tool to execute
  • arguments: Input arguments for the tool call
  • dependencies: IDs of other pending calls that must complete first
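The dependency ordering can be honored client-side with a simple topological pass. The sketch below assumes pending tools shaped like the fields above and a run_tool(name, arguments) callback supplied by your application:

```python
def execute_pending(pending_tools, run_tool):
    """Run pending tool calls in dependency order; returns {id: result}."""
    results = {}
    remaining = list(pending_tools)
    while remaining:
        progressed = False
        for call in list(remaining):
            # A call is ready once every dependency has a recorded result.
            if all(dep in results for dep in call.get("dependencies", [])):
                results[call["id"]] = run_tool(call["name"], call["arguments"])
                remaining.remove(call)
                progressed = True
        if not progressed:
            raise ValueError("cycle or unmet dependency in pending_tools")
    return results
```

The resulting {id: result} map can be sent back alongside the correlation_id to resume server-side execution.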
server_results
object
Completed server tool outputs keyed by call ID.
correlation_id
string
Stable session ID for cross-turn handoff state. Echo this on the next request to resume server-side execution.
deferred
array
Server tools blocked on client results.
turns_consumed
int
Number of internal LLM calls made during this request. SDKs can sum this across their outer loop to track total LLM calls.

Examples

Basic completion

from dedalus_labs import Dedalus

client = Dedalus()

completion = client.chat.completions.create(
    model="openai/gpt-5-nano",
    messages=[
        {
            "role": "system",
            "content": "You are Stephen Dedalus. Respond in morose Joycean malaise."
        },
        {
            "role": "user",
            "content": "Hello, how are you today?"
        }
    ]
)

print(completion.choices[0].message.content)

With temperature and max tokens

completion = client.chat.completions.create(
    model="openai/gpt-5-nano",
    messages=[
        {"role": "user", "content": "Write a haiku about AI"}
    ],
    temperature=0.7,
    max_tokens=100
)

With function calling

completion = client.chat.completions.create(
    model="openai/gpt-5",
    messages=[
        {"role": "user", "content": "What's the weather in Boston?"}
    ],
    tools=[
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather in a location",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The city and state, e.g. San Francisco, CA"
                        }
                    },
                    "required": ["location"]
                }
            }
        }
    ],
    tool_choice="auto"
)

Async usage

import asyncio

from dedalus_labs import AsyncDedalus

client = AsyncDedalus()

async def main() -> None:
    completion = await client.chat.completions.create(
        model="openai/gpt-5-nano",
        messages=[
            {"role": "user", "content": "Hello!"}
        ]
    )
    print(completion.choices[0].message.content)

asyncio.run(main())
