The core agent loop — send instructions and context, receive a response or tool call, execute the tool, return the result, repeat — is the same regardless of which provider you use. Provider differences are mostly in message shape, state handling, hosted tools, streaming events, and reasoning item formats. Designing the loop against a provider-neutral abstraction means you can switch or mix providers without rewriting your harness.
instructions + context + tool schemas -> model output -> final response or tool call -> application executes tool -> application returns tool result -> repeat
The harness should implement this loop once against an internal adapter interface, and each provider gets a thin adapter that translates to and from its wire format. Internal event types (ToolCall, ToolResult, FinalAnswer) remain stable even when provider APIs change.
Adapter responsibilities:
- normalize input messages/items
- normalize tool schemas
- normalize model output into ToolCall or FinalAnswer events
- normalize tool results back to provider format
- handle streaming event conversion
- handle provider-specific state chaining
- capture token/cost/latency metadata
Use the Responses-style API for new OpenAI-native agent work when available. It provides typed output items, hosted tools, remote MCP support, stateful chaining options, and richer agent-like primitives.
Use previous_response_id for stateful chaining. The provider stores prior output items, so you do not need to resend the full message history on each turn. Use the harness for private and business tools, permission checks, durable state, and audit logs even when hosted tools are available.
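A minimal sketch of stateful chaining, assuming the OpenAI Python SDK's `client.responses.create` call and its `function_call` / `function_call_output` item types. `run_chained_turns`, `execute_tool`, and `max_turns` are hypothetical harness names, and the client is passed in so that any object with the same shape can be substituted.

```python
import json
from typing import Any, Callable


def run_chained_turns(
    client: Any,
    model: str,
    task: str,
    tools: list,
    execute_tool: Callable[[str, dict], str],
    max_turns: int = 8,
) -> str:
    """Chain turns with previous_response_id instead of resending history."""
    next_input: Any = task
    previous_id = None
    for _ in range(max_turns):
        kwargs = {"model": model, "input": next_input, "tools": tools}
        if previous_id is not None:
            kwargs["previous_response_id"] = previous_id
        response = client.responses.create(**kwargs)
        previous_id = response.id

        calls = [item for item in response.output if item.type == "function_call"]
        if not calls:
            return response.output_text  # no tool calls: treat as the final answer

        # Only the new tool outputs are sent; the provider holds the prior items.
        next_input = [
            {
                "type": "function_call_output",
                "call_id": call.call_id,
                "output": execute_tool(call.name, json.loads(call.arguments)),
            }
            for call in calls
        ]
    raise RuntimeError("turn budget exhausted before a final answer")
```

The loop still runs tool execution locally, so permission checks and audit logging can sit inside `execute_tool` even though the provider stores the conversation items.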
Use Chat Completions when you need compatibility with OpenAI-compatible providers or when your harness already owns message history manually.
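    def run_chat_loop(client, model, instructions, task, visible_tools, execute_tool):
        """Run the tool loop; the harness owns the full message history."""
        messages = [
            {"role": "system", "content": instructions},
            {"role": "user", "content": task},
        ]
        while True:
            response = client.chat.completions.create(
                model=model,
                messages=messages,
                tools=visible_tools,
            )
            msg = response.choices[0].message
            messages.append(msg)  # keep the assistant turn, including any tool calls
            if not msg.tool_calls:
                return msg.content  # no tool calls: this is the final answer
            for call in msg.tool_calls:
                # call.function.arguments is a JSON string; execute_tool is
                # expected to parse and validate it and to return a string.
                result = execute_tool(call.function.name, call.function.arguments)
                messages.append({
                    "role": "tool",
                    "tool_call_id": call.id,  # must match the originating call
                    "content": result,
                })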
In this pattern, the harness owns conversation state, message trimming, compaction, prior tool results, tool-call ID matching, approval pauses, retries, and finalization.
OpenAI prompt caching is automatic on supported requests. Cache hit data is reported in usage.prompt_tokens_details.cached_tokens.
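To make the cache field concrete, here is a small sketch that computes a hit ratio from a usage payload. It assumes the usage object has already been converted to a plain dict; `cache_hit_ratio` and the sample numbers are illustrative, not taken from a real response.

```python
def cache_hit_ratio(usage: dict) -> float:
    """Fraction of prompt tokens served from cache, given a usage dict."""
    prompt_tokens = usage.get("prompt_tokens", 0)
    details = usage.get("prompt_tokens_details") or {}
    cached = details.get("cached_tokens", 0)
    return cached / prompt_tokens if prompt_tokens else 0.0


# Hypothetical usage payload, shaped like the Chat Completions usage block.
example_usage = {
    "prompt_tokens": 2048,
    "completion_tokens": 120,
    "prompt_tokens_details": {"cached_tokens": 1536},
}
print(f"cache hit ratio: {cache_hit_ratio(example_usage):.2f}")
```

Logging this ratio per request makes it easy to notice when a change to the prompt prefix has stopped cache hits.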
Anthropic uses structured tool_use content blocks in assistant messages and tool_result blocks in subsequent user messages.
Provider-neutral shape:
- request: messages + tools
- response: assistant content with tool_use blocks
- application: validate and execute tool_use blocks
- next request: user message containing tool_result content blocks
- repeat until final answer
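The steps above can be sketched as a function that turns the tool_use blocks of one assistant message into the next user message. It works on plain dicts shaped like Anthropic content blocks; `build_tool_result_message` and `execute_tool` are hypothetical names, and argument validation is assumed to happen inside `execute_tool`.

```python
from typing import Callable, Optional


def build_tool_result_message(
    assistant_content: list,
    execute_tool: Callable[[str, dict], str],
) -> Optional[dict]:
    """Build the user message that answers every tool_use block.

    Returns None when there are no tool_use blocks, meaning the answer is final.
    """
    results = []
    for block in assistant_content:
        if block.get("type") != "tool_use":
            continue
        try:
            output = execute_tool(block["name"], block["input"])
            results.append(
                {"type": "tool_result", "tool_use_id": block["id"], "content": output}
            )
        except Exception as exc:
            # Report the failure to the model instead of crashing the loop.
            results.append(
                {
                    "type": "tool_result",
                    "tool_use_id": block["id"],
                    "content": str(exc),
                    "is_error": True,
                }
            )
    return {"role": "user", "content": results} if results else None
```

Each result carries the `id` of the tool_use block it answers, which plays the same role as `tool_call_id` in Chat Completions.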
Prompt caching uses explicit cache_control markers on content blocks or automatic caching depending on the API path. Place the cache marker on the last stable block so that the cached prefix ends before any volatile content:
    # Example: mark stable system content for caching
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": stable_instructions,
                "cache_control": {"type": "ephemeral"},
            },
            {
                "type": "text",
                "text": current_task,  # volatile: no cache_control
            },
        ],
    }
Monitor cache_creation_input_tokens and cache_read_input_tokens in the response usage fields. Keep the same harness rules regardless of provider: validate arguments locally, check permissions, return structured results, preserve budgets, and trace every step.
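A short sketch of how those two fields might be folded into one summary. It assumes the usage block is available as a plain dict and that `input_tokens` counts only tokens that were neither read from nor written to the cache, so the three fields are summed for the total; `summarize_anthropic_usage` and its output keys are hypothetical.

```python
def summarize_anthropic_usage(usage: dict) -> dict:
    """Summarize cache behaviour from an Anthropic-style usage dict."""
    cache_write = usage.get("cache_creation_input_tokens") or 0
    cache_read = usage.get("cache_read_input_tokens") or 0
    uncached = usage.get("input_tokens") or 0
    # Assumption: the three input fields are disjoint, so the total is their sum.
    total_input = uncached + cache_write + cache_read
    return {
        "total_input_tokens": total_input,
        "cache_write_tokens": cache_write,
        "cache_read_tokens": cache_read,
        "cache_read_ratio": cache_read / total_input if total_input else 0.0,
        "output_tokens": usage.get("output_tokens") or 0,
    }
```

A run of requests with high `cache_write_tokens` and no `cache_read_tokens` usually means something in the supposedly stable prefix is changing between calls.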
OpenAI-compatible APIs accept requests in the Chat Completions message format. Target them by setting base_url on the OpenAI client:
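A minimal sketch, assuming the official `openai` Python package, whose `OpenAI` constructor accepts `base_url` and `api_key`. The `PROVIDERS` registry, the local URL, and the environment variable names are placeholders rather than real endpoints.

```python
import os

# Hypothetical endpoint registry; URLs and env var names are placeholders.
PROVIDERS = {
    "openai": {"base_url": None, "api_key_env": "OPENAI_API_KEY"},
    "local": {"base_url": "http://localhost:8000/v1", "api_key_env": "LOCAL_API_KEY"},
}


def client_kwargs(provider: str) -> dict:
    """Build constructor arguments for the OpenAI client for one provider."""
    cfg = PROVIDERS[provider]
    kwargs = {"api_key": os.environ.get(cfg["api_key_env"], "not-set")}
    if cfg["base_url"]:
        kwargs["base_url"] = cfg["base_url"]  # omit to use the SDK default
    return kwargs


def make_client(provider: str):
    """Create a client; the SDK is imported lazily so config stays testable."""
    from openai import OpenAI

    return OpenAI(**client_kwargs(provider))
```

Keeping the endpoint choice in configuration means the loop code does not change when the target moves between providers, although feature support behind a compatible endpoint still has to be checked per provider.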
Write a thin adapter for each provider so the harness loop does not change when switching providers. The adapter translates between the provider’s wire format and your internal event types:
    from __future__ import annotations  # type hints may reference types defined elsewhere


    class ProviderAdapter:
        def build_request(self, context: AgentContext) -> dict:
            """Translate AgentContext to provider request shape."""
            ...

        def parse_response(self, raw: dict) -> list[AgentEvent]:
            """Normalize provider response into ToolCall or FinalAnswer events."""
            ...

        def build_tool_result(self, call_id: str, result: str) -> dict:
            """Translate a tool result into the provider's expected format."""
            ...

        def extract_usage(self, raw: dict) -> UsageMetrics:
            """Capture token counts, cache fields, and latency from response metadata."""
            ...
Internal event types should be stable:
    from dataclasses import dataclass


    @dataclass
    class ToolCall:
        call_id: str
        name: str
        arguments: dict


    @dataclass
    class FinalAnswer:
        content: str


    @dataclass
    class UsageMetrics:
        input_tokens: int
        cached_tokens: int
        output_tokens: int
        estimated_cost: float
The main harness loop operates entirely on these internal types and never inspects provider-specific fields directly.
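As an illustration of a loop that sees only internal types, the sketch below drives a scripted stand-in for a provider adapter. `ScriptedAdapter`, `next_events`, `add_tool_result`, and `run_loop` are hypothetical names chosen for this example; a real adapter would call the provider where the stand-in replays canned events.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class ToolCall:
    call_id: str
    name: str
    arguments: dict


@dataclass
class FinalAnswer:
    content: str


AgentEvent = Union[ToolCall, FinalAnswer]


class ScriptedAdapter:
    """Hypothetical stand-in for a provider adapter; replays canned events."""

    def __init__(self, turns: list):
        self._turns = iter(turns)
        self.tool_results = []

    def next_events(self) -> list:
        return next(self._turns)

    def add_tool_result(self, call_id: str, result: str) -> None:
        self.tool_results.append((call_id, result))


def run_loop(adapter, tools: dict, max_steps: int = 10) -> str:
    """Run until a FinalAnswer arrives; only internal event types are inspected."""
    for _ in range(max_steps):
        for event in adapter.next_events():
            if isinstance(event, FinalAnswer):
                return event.content
            result = tools[event.name](**event.arguments)
            adapter.add_tool_result(event.call_id, result)
    raise RuntimeError("step budget exhausted before a final answer")
```

Because `run_loop` never touches provider fields, swapping providers means swapping the adapter object and nothing else.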
Hosted tools
Useful for web search, file search, code execution, image generation, computer/browser use, and remote connector calls supported by the provider. Run in provider infrastructure.
Client tools
Preferred for private business APIs, tenant-specific permissions, regulated data, financial actions, communication sends, state-changing operations, and custom audit requirements. Run in your application or sandbox.
Do not outsource business authorization to a hosted tool unless the product explicitly supports and logs the required approval policy.
- required fields explicit
- unknown fields rejected
- enums for actions
- minimum/maximum constraints
- validated IDs
- structured outputs
Then validate again in the harness before execution. Provider-side schema enforcement and harness-side validation are not redundant — they protect against different failure modes.
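A minimal sketch of harness-side validation, covering only a small subset of JSON Schema keywords (`required`, `properties`, `enum`, `minimum`, `maximum`). `validate_arguments` and `REFUND_SCHEMA` are hypothetical; a production harness would more likely use a full schema validator plus ID and permission checks.

```python
def validate_arguments(schema: dict, args: dict) -> list:
    """Return a list of validation errors; an empty list means args are valid."""
    errors = []
    properties = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in args:
            errors.append(f"missing required field: {name}")
    for name, value in args.items():
        spec = properties.get(name)
        if spec is None:
            errors.append(f"unknown field: {name}")  # reject rather than ignore
            continue
        if "enum" in spec and value not in spec["enum"]:
            errors.append(f"{name}: value not allowed")
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if is_number and "minimum" in spec and value < spec["minimum"]:
            errors.append(f"{name}: below minimum {spec['minimum']}")
        if is_number and "maximum" in spec and value > spec["maximum"]:
            errors.append(f"{name}: above maximum {spec['maximum']}")
    return errors


# Hypothetical tool schema used only for illustration.
REFUND_SCHEMA = {
    "required": ["order_id", "action", "amount_cents"],
    "properties": {
        "order_id": {"type": "string"},
        "action": {"enum": ["refund", "void"]},
        "amount_cents": {"minimum": 1, "maximum": 50_000},
    },
}
```

Returning every error at once, rather than raising on the first, lets the harness send the model a single structured result to correct from.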
Every request sends the full selected context. Simplest to reason about. The harness owns all state. Works with any provider.
Previous-response chaining
The provider stores prior state and you reference it with previous_response_id. Reduces payload size. Available with the OpenAI Responses API.
Conversation object
The provider stores a conversation and you append to it by ID. Available with some provider APIs.
Application event store
The harness stores the full operational history in a durable store. Required for audit, replay, approval workflows, and evals. Use this regardless of which provider state strategy you also employ.
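One simple way to picture such a store is an append-only JSON Lines file. The sketch below is an assumption-laden illustration: `JsonlEventStore`, the event fields, and the file layout are all hypothetical, and a real deployment would likely use a database with stronger durability and concurrency guarantees.

```python
import json
import time
from pathlib import Path


class JsonlEventStore:
    """Append-only event log with one JSON object per line."""

    def __init__(self, path):
        self.path = Path(path)

    def append(self, run_id: str, kind: str, payload: dict) -> dict:
        """Record one event and return it."""
        event = {"run_id": run_id, "kind": kind, "ts": time.time(), "payload": payload}
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(event, sort_keys=True) + "\n")
        return event

    def replay(self, run_id: str) -> list:
        """Return the events for one run in the order they were written."""
        if not self.path.exists():
            return []
        with self.path.open(encoding="utf-8") as f:
            events = [json.loads(line) for line in f if line.strip()]
        return [e for e in events if e["run_id"] == run_id]
```

Writing tool calls, tool results, approvals, and final answers through one `append` path gives audit and replay a single source of truth, independent of whatever state the provider keeps.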
Even when provider state is used, maintain an application event store for audit, replay, approvals, and evaluations.
Provider-specific message formats are the most common portability break. Tool result messages differ between Chat Completions (role: "tool" with tool_call_id) and Anthropic (tool_result content blocks). An adapter that handles both prevents the harness loop from caring about the difference.
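The difference is small enough to show side by side. This sketch builds the two tool-result shapes described above from the same internal values; the function names and the `TOOL_RESULT_BUILDERS` registry are hypothetical.

```python
def chat_completions_tool_result(call_id: str, result: str) -> dict:
    """Chat Completions: a dedicated message with role 'tool'."""
    return {"role": "tool", "tool_call_id": call_id, "content": result}


def anthropic_tool_result(call_id: str, result: str) -> dict:
    """Anthropic: a user message holding a tool_result content block."""
    return {
        "role": "user",
        "content": [
            {"type": "tool_result", "tool_use_id": call_id, "content": result}
        ],
    }


# Hypothetical registry keyed by an internal provider name.
TOOL_RESULT_BUILDERS = {
    "chat_completions": chat_completions_tool_result,
    "anthropic": anthropic_tool_result,
}


def build_tool_result(provider: str, call_id: str, result: str) -> dict:
    """Dispatch to the provider-specific shape; the harness passes the same values."""
    return TOOL_RESULT_BUILDERS[provider](call_id, result)
```

The harness calls `build_tool_result` with the same call ID and result string in both cases, and only the adapter layer knows which wire shape is produced.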
Additional pitfalls:
- assuming OpenAI-compatible means full OpenAI feature parity
- placing timestamps or request IDs in the stable prefix (breaks caching)
- passing raw provider tokens to the model context
- relying on hosted-tool authorization for regulated or financial actions
- streaming partial tool call arguments to execution logic
- failing to log cache usage fields from response metadata
- changing tool schema order between requests without versioning
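The streaming pitfall in the list above can be avoided by buffering argument fragments and parsing them only once the stream has finished. The sketch below assumes deltas shaped like Chat Completions streaming tool-call fragments (an `index`, an optional `id`, and a `function` with partial `arguments`), represented here as plain dicts; `ToolCallAccumulator` and `PendingCall` are hypothetical names.

```python
import json
from dataclasses import dataclass, field


@dataclass
class PendingCall:
    call_id: str = ""
    name: str = ""
    argument_chunks: list = field(default_factory=list)


class ToolCallAccumulator:
    """Buffer streamed tool-call fragments; nothing executes until finish()."""

    def __init__(self):
        self._pending = {}

    def add_delta(self, delta: dict) -> None:
        pending = self._pending.setdefault(delta["index"], PendingCall())
        if delta.get("id"):
            pending.call_id = delta["id"]
        function = delta.get("function") or {}
        if function.get("name"):
            pending.name = function["name"]
        if function.get("arguments"):
            pending.argument_chunks.append(function["arguments"])

    def finish(self) -> list:
        """Call only after the stream has ended; parses the complete arguments."""
        calls = []
        for index in sorted(self._pending):
            pending = self._pending[index]
            raw = "".join(pending.argument_chunks) or "{}"
            calls.append(
                {
                    "call_id": pending.call_id,
                    "name": pending.name,
                    "arguments": json.loads(raw),  # raises if the JSON is incomplete
                }
            )
        return calls
```

Because `json.loads` runs only in `finish`, a truncated stream surfaces as a parse error instead of a tool executing on half of its arguments.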