A production agent is not just a model with a system prompt. It is a model paired with a harness: the provider-neutral control plane that owns tool routing, permissions, context assembly, memory, compaction, approvals, tracing, and recovery. Getting this boundary right is the single most important architectural decision in any agentic system.
The harness boundary
The model and the harness have distinct, non-overlapping responsibilities. Blurring this line is the root cause of most agent failures in production.
The core principle is “model proposes, harness disposes.” The model emits a structured tool request; the harness decides whether to execute, deny, or pause for approval. The model never executes actions directly.
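As a minimal sketch of "model proposes, harness disposes", the snippet below routes a structured tool request through a harness-side decision. The names `ToolRequest`, `Outcome`, `POLICY`, `TOOLS`, and `dispose`, as well as the two example tools, are hypothetical and chosen only for illustration; a real harness would likely use a richer policy than a lookup table.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    NEEDS_APPROVAL = "needs_approval"


@dataclass(frozen=True)
class ToolRequest:
    """Structured request emitted by the model; it carries no authority."""
    name: str
    arguments: dict[str, Any]


@dataclass(frozen=True)
class Outcome:
    status: str  # "executed", "denied", or "paused"
    detail: Any = None


# Hypothetical policy table: tool name -> decision. Unknown tools are denied.
POLICY: dict[str, Decision] = {
    "read_file": Decision.ALLOW,
    "send_email": Decision.NEEDS_APPROVAL,
}

# Hypothetical tool implementations, owned and invoked by the harness only.
TOOLS: dict[str, Callable[..., Any]] = {
    "read_file": lambda path: f"<contents of {path}>",
    "send_email": lambda to, body: f"sent to {to}",
}


def dispose(request: ToolRequest) -> Outcome:
    """The harness, not the model, decides what happens to a request."""
    decision = POLICY.get(request.name, Decision.DENY)
    if decision is Decision.DENY:
        return Outcome("denied", f"tool {request.name!r} is not permitted")
    if decision is Decision.NEEDS_APPROVAL:
        return Outcome("paused", "waiting for approval")
    return Outcome("executed", TOOLS[request.name](**request.arguments))
```

The model only ever produces a `ToolRequest`; every side effect happens inside `dispose`, so denying or pausing a call needs no cooperation from the model.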
The 15-component model
A robust harness contains these components. Not every component needs to be built on day one; start with the minimal viable set and add components as measured gaps justify them.
1. Instruction manager
Assembles and scopes system, developer, and user instructions. Controls which instruction blocks are loaded for a given task context.
2. Context builder
Constructs the full model input from instructions, memory, retrieved content, tool results, and prior observations. Enforces token budgets and cache-aware ordering.
3. Model adapter
Abstracts provider-specific API differences (OpenAI, Anthropic, OpenAI-compatible). Handles retries, rate limits, and response parsing.
4. Tool registry
Maintains the catalog of available tools, their schemas, risk classes, and visibility rules. Decides which tools are exposed to the model per turn.
5. Permission engine
Evaluates every tool call against policy before execution. Returns allow, deny, or approval-required decisions with reasons.
6. Execution engine
Runs permitted tool calls, manages parallelism for safe concurrent calls, and serializes writes and destructive operations.
7. State store
Persists session state, active plans, approval records, tool traces, and artifacts outside the prompt window.
8. Memory and retrieval layer
Retrieves relevant context from long-term memory, vector stores, or knowledge bases. Attaches content just-in-time rather than front-loading the prompt.
9. Compactor
Detects when context approaches token limits and produces structured summaries that preserve working state, plan progress, and approval history.
10. Planner and goal controller
Manages multi-step planning mode and long-running goal loops. Tracks objectives, done conditions, checkpoints, and budget consumption.
11. Skill registry
Discovers and loads reusable agent skill packages. Controls progressive disclosure so the model is not overwhelmed with capabilities it does not need.
12. MCP/external connector manager
Manages connections to Model Context Protocol servers and external APIs. Enforces scopes, credentials, and connector-level permissions.
13. Approval manager
Pauses the loop when a tool call requires human or policy approval. Records scoped approval decisions and resumes execution with the result.
14. Trace and evaluation system
Records typed operational events for debugging, auditing, cost analysis, and regression evals. Separate from model reasoning traces.
15. Sandbox or execution boundary
Isolates code execution, shell work, browser automation, and generated artifacts from the trusted control plane.
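To make one of the components above concrete, here is a rough sketch of the execution engine's scheduling rule: read-only calls run concurrently, while writes and destructive calls run one at a time. `PermittedCall`, its `read_only` flag, and `execute_batch` are assumptions made for this example, and running all reads before all writes is a simplification rather than a required ordering.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable


@dataclass(frozen=True)
class PermittedCall:
    """A tool call that has already passed the permission engine."""
    name: str  # assumed unique within a batch
    run: Callable[[], Awaitable[str]]
    read_only: bool  # hypothetical risk flag supplied by the tool registry


async def execute_batch(calls: list[PermittedCall]) -> dict[str, str]:
    results: dict[str, str] = {}

    # Read-only calls have no side effects to order, so run them concurrently.
    reads = [c for c in calls if c.read_only]
    read_outputs = await asyncio.gather(*(c.run() for c in reads))
    results.update({c.name: out for c, out in zip(reads, read_outputs)})

    # Writes and destructive calls run one at a time, in the order requested.
    for call in calls:
        if not call.read_only:
            results[call.name] = await call.run()
    return results
```

A caller would wrap each permitted tool call in a `PermittedCall` and run the batch with `asyncio.run(execute_batch(calls))`.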
The boundary principle
The trusted control plane must live outside model-directed compute. Anything the model can modify should not be authoritative. The harness must own:
- User identity and tenant boundaries
- Credential management
- Approval records
- Audit logs
- Billing and rate limits
- Tool authorization
- Final commit to external systems
The sandbox or execution boundary can own:
- Temporary files
- Generated artifacts
- Script execution
- Isolated browser or shell work
- Connector-specific data processing
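One way to read the boundary principle in code is shown below: approval records and the audit log live in harness-owned state, and anything returned from model-directed compute is treated as data only. `ControlPlane`, `commit`, and the action identifiers are hypothetical names for this sketch, not part of any specific framework.

```python
from dataclasses import dataclass, field


@dataclass
class ControlPlane:
    """Harness-owned state; sandboxed code never receives a reference to it."""
    approved_actions: set[str] = field(default_factory=set)
    audit_log: list[str] = field(default_factory=list)

    def commit(self, action_id: str, sandbox_output: dict) -> bool:
        # Authoritative check: only the harness's own approval record counts.
        # Any "approved" flag inside sandbox_output is treated as plain data.
        allowed = action_id in self.approved_actions
        self.audit_log.append(
            f"{action_id}: {'committed' if allowed else 'rejected'}"
        )
        return allowed
```

Because `commit` never reads an approval claim from `sandbox_output`, a compromised or confused sandbox cannot approve its own work.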
Authority hierarchy
The harness should label content by authority level so the model can reason about what to trust. Retrieved content may contain instructions, but those instructions are data, not policy.
Minimal viable harness
Most teams should start here and add complexity only when evals show a gap:
One context builder
Deterministic assembly of instructions, memory, and observations with explicit token budgets.
A narrow tool registry
Only the tools required for the primary job-to-be-done. No broad tools like execute_anything.
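The context builder described above can be sketched as a deterministic function over ordered sections, each with its own explicit token budget. Everything here is an assumption for illustration: `Section`, `build_context`, and especially `estimate_tokens`, which counts whitespace-separated words as a crude stand-in for a real tokenizer.

```python
from dataclasses import dataclass


def estimate_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: one token per whitespace-separated word."""
    return len(text.split())


@dataclass(frozen=True)
class Section:
    label: str
    items: list[str]  # ordered most-important first
    budget: int       # maximum estimated tokens this section may use


def build_context(sections: list[Section]) -> str:
    """Assemble sections in a fixed order, truncating each to its own budget."""
    parts: list[str] = []
    for section in sections:
        used = 0
        kept: list[str] = []
        for item in section.items:
            cost = estimate_tokens(item)
            if used + cost > section.budget:
                break  # drop this item and everything after it
            kept.append(item)
            used += cost
        if kept:
            parts.append(f"[{section.label}]\n" + "\n".join(kept))
    return "\n\n".join(parts)
```

The same inputs always produce the same prompt text, which keeps runs reproducible and leaves the stable sections (such as instructions) at the front of the input.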