Skip to main content

Documentation Index

Fetch the complete documentation index at: https://mintlify.com/DenisSergeevitch/agents-best-practices/llms.txt

Use this file to discover all available pages before exploring further.

The core agent loop — send instructions and context, receive a response or tool call, execute the tool, return the result, repeat — is the same regardless of which provider you use. Provider differences are mostly in message shape, state handling, hosted tools, streaming events, and reasoning item formats. Designing the loop against a provider-neutral abstraction means you can switch or mix providers without rewriting your harness.

Provider-Neutral Abstraction

The provider-neutral view of the loop is:
instructions + context + tool schemas
  -> model output
  -> final response or tool call
  -> application executes tool
  -> application returns tool result
  -> repeat
The harness should implement this loop once against an internal adapter interface, and each provider gets a thin adapter that translates to and from its wire format. Internal event types — ToolCall, ToolResult, FinalAnswer — remain stable even when provider APIs change. Adapter responsibilities:
normalize input messages/items
normalize tool schemas
normalize model output into ToolCall or FinalAnswer events
normalize tool results back to provider format
handle streaming event conversion
handle provider-specific state chaining
capture token/cost/latency metadata

API Patterns by Provider

Use the Responses-style API for new OpenAI-native agent work when available. It provides typed output items, hosted tools, remote MCP support, stateful chaining options, and richer agent-like primitives.
response = client.responses.create(
    model=model,
    instructions=instructions,
    input=input_items,
    tools=visible_tools,
    store=True,
)

for item in response.output:
    if item.type == "function_call":
        result = execute_tool(item.name, item.arguments)
        next_response = client.responses.create(
            model=model,
            previous_response_id=response.id,
            input=[{
                "type": "function_call_output",
                "call_id": item.call_id,
                "output": result,
            }],
        )
Use previous_response_id for stateful chaining. The provider stores prior output items, so you do not need to resend the full message history on each turn. Use the harness for private and business tools, permission checks, durable state, and audit logs even when hosted tools are available.

Model Adapter Pattern

Write a thin adapter for each provider so the harness loop does not change when switching providers. The adapter translates between the provider’s wire format and your internal event types:
class ProviderAdapter:
    def build_request(self, context: AgentContext) -> dict:
        """Translate AgentContext to provider request shape."""
        ...

    def parse_response(self, raw: dict) -> list[AgentEvent]:
        """Normalize provider response into ToolCall or FinalAnswer events."""
        ...

    def build_tool_result(self, call_id: str, result: str) -> dict:
        """Translate a tool result into the provider's expected format."""
        ...

    def extract_usage(self, raw: dict) -> UsageMetrics:
        """Capture token counts, cache fields, and latency from response metadata."""
        ...
Internal event types should be stable:
@dataclass
class ToolCall:
    call_id: str
    name: str
    arguments: dict

@dataclass
class FinalAnswer:
    content: str

@dataclass
class UsageMetrics:
    input_tokens: int
    cached_tokens: int
    output_tokens: int
    estimated_cost: float
The main harness loop operates entirely on these internal types and never inspects provider-specific fields directly.

Hosted Tools vs. Client Tools

Hosted tools

Useful for web search, file search, code execution, image generation, computer/browser use, and remote connector calls supported by the provider. Run in provider infrastructure.

Client tools

Preferred for private business APIs, tenant-specific permissions, regulated data, financial actions, communication sends, state-changing operations, and custom audit requirements. Run in your application or sandbox.
Do not outsource business authorization to a hosted tool unless the product explicitly supports and logs the required approval policy.

Strict Schemas and Validation

Use strict function schemas where available:
required fields explicit
unknown fields rejected
enums for actions
minimum/maximum constraints
validated IDs
structured outputs
Then validate again in the harness before execution. Provider-side schema enforcement and harness-side validation are not redundant — they protect against different failure modes.

Streaming

Streaming reduces latency but adds complexity:
  • Buffer enough data to validate complete tool calls before acting on them
  • Execute only when a tool call is complete, never on partial output
  • Keep result ordering deterministic across parallel tool calls
  • Handle aborts by sending synthetic tool results if the provider requires it
  • Do not stream partial sensitive data to users before output guardrails run

State Strategies

Every request sends the full selected context. Simplest to reason about. The harness owns all state. Works with any provider.
The provider stores prior state and you reference it with previous_response_id. Reduces payload size. Available with the OpenAI Responses API.
The provider stores a conversation and you append to it by ID. Available with some provider APIs.
The harness stores the full operational history in a durable store. Required for audit, replay, approval workflows, and evals. Use this regardless of which provider state strategy you also employ.
Even when provider state is used, maintain an application event store for audit, replay, approvals, and evaluations.

Common Pitfalls

Provider-specific message formats are the most common portability break. Tool result messages differ between Chat Completions (role: "tool" with tool_call_id) and Anthropic (tool_result content blocks). An adapter that handles both prevents the harness loop from caring about the difference.
Additional pitfalls:
assuming OpenAI-compatible means full OpenAI feature parity
placing timestamps or request IDs in the stable prefix (breaks caching)
passing raw provider tokens to the model context
relying on hosted-tool authorization for regulated or financial actions
streaming partial tool call arguments to execution logic
failing to log cache usage fields from response metadata
changing tool schema order between requests without versioning

Build docs developers (and LLMs) love