These checklists are designed to be used at three points in the agent development lifecycle: during initial design to catch structural gaps early, during implementation to verify each component meets the required standard, and during pre-launch audit to confirm the system is safe for production traffic. Work through each checklist in order — earlier checklists surface design decisions that later checklists assume are already resolved. Check off items as you complete them; any unchecked item at audit time should be treated as a known gap with an explicit deferral decision, not a silent omission.Documentation Index
Fetch the complete documentation index at: https://mintlify.com/DenisSergeevitch/agents-best-practices/llms.txt
Use this file to discover all available pages before exploring further.
MVP agent blueprint checklist
MVP agent blueprint checklist
- Domain, primary user, and job-to-be-done are stated.
- MVP scope, assumptions, non-goals, and deferred capabilities are explicit.
- Autonomy level is the lowest level that still creates value.
- Core model-tool-observation loop is specified.
- Step, tool-call, time, token, and cost budgets are specified.
- Minimal typed tool registry is defined.
- Permission matrix covers read, draft, write, external, financial, destructive, and privileged actions.
- Risky actions use draft/commit separation.
- Planning mode blocks mutation until approval.
- Goal-like loop has objective, checkpoints, budget, validation, and stop rules.
- Context builder separates stable/cacheable content from volatile state.
- Memory, plans, approvals, todos, and artifacts are stored outside the prompt.
- Auto-compaction summary format and rehydration rules are defined.
- Skills are progressively disclosed and permission-bounded.
- MCP/external connectors are namespaced, scoped, and logged.
- Prompt caching and cost telemetry are included.
- Traces and evals are defined before launch.
- First rollout is limited, monitored, or shadow-mode.
Design checklist
Design checklist
- Domain and user persona defined.
- Autonomy level selected.
- Risk classes identified.
- Success and done conditions defined.
- Source-of-truth systems identified.
- Instruction hierarchy defined.
- Tool registry scoped to minimum viable tools.
- Permission matrix written.
- Draft/commit split defined for risky actions.
- Context builder designed.
- Memory and durable state plan defined.
- Compaction trigger and summary format defined.
- Planning mode criteria defined.
- Goal loop criteria and budgets defined.
- Skills and connector strategy defined.
- Observability and eval plan defined.
Tool checklist (for each tool)
Tool checklist (for each tool)
- Name is specific and domain meaningful.
- Purpose says when to use and when not to use.
- Input schema is strict.
- Output schema is structured.
- Arguments are locally validated.
- Risk class assigned.
- Side effects declared.
- Permission policy assigned.
- Timeout set.
- Result size limit set.
- Retry policy set.
- Audit policy set.
- Errors return structured observations.
- Sensitive data is redacted.
Permission checklist
Permission checklist
- Read-only tools can run automatically only inside scope.
- Draft tools are separated from commit tools.
- External sends require approval.
- Financial actions require approval and strong auth.
- Destructive actions are denied or approval-gated with recovery plan.
- Identity/access changes require approval and strong auth.
- Shell/process execution is sandboxed.
- Connector tools are namespaced and scoped.
- Approval records are persisted.
- The model cannot approve its own actions.
Context checklist
Context checklist
- Trusted instructions separated from untrusted data.
- Scoped instructions loaded only when relevant.
- Retrieved content labeled by source and trust level.
- Exact facts preserved when needed.
- Large outputs summarized or stored externally.
- Active plan and goal reattached after compaction.
- Approval state reattached after compaction.
- Loaded skills and connector state tracked.
- Secrets are not placed in context.
Planning checklist
Planning checklist
- Planning mode exists for high-risk or ambiguous tasks.
- Mutation tools are blocked during planning.
- Plan artifact is stored outside prompt.
- Plan contains objective, scope, risks, steps, validation, rollback, and done condition.
- Approval tied to exact plan version.
- Execution uses todo/checkpoints after approval.
Goal checklist
Goal checklist
- Goal has one objective.
- Done condition is measurable.
- Budget is explicit.
- Validation method exists.
- Forbidden actions are listed.
- Approval-required actions are listed.
- Progress log is durable.
- Stop rules are explicit.
Skills checklist
Skills checklist
- Skill name matches directory name.
- Skill name is lowercase with hyphens only.
SKILL.mdhas required frontmatter.- Description says when to use the skill.
- Main instructions are concise.
- Detailed material is in focused Markdown references.
- References are loaded only when needed.
- Gotchas and validation steps are included.
- Skill activation eval exists.
- Output quality eval exists.
- Skill does not silently expand permissions.
MCP/external connector checklist
MCP/external connector checklist
- Servers/connectors inventoried.
- Tools namespaced by source.
- Credentials are per-user or scoped.
- Least privilege scopes used.
- Tool descriptions truncated or reviewed.
- External descriptions treated as untrusted.
- Risk classes mapped.
- Approval required for risky calls.
- Large results filtered before model context.
- Connector calls logged.
- Auth failure and revocation handled.
Evals checklist
Evals checklist
- Happy-path tasks.
- Near-miss tasks.
- Prompt injection tasks.
- Tool misuse tasks.
- Approval bypass attempts.
- Connector failure tasks.
- Context overflow and compaction tasks.
- Conflicting instruction tasks.
- High-risk action tasks.
- Cost and latency measured.
- Regression evals added for every production incident.
Agent legibility checklist
Agent legibility checklist
- Top-level instructions are a map, not a giant manual.
- Source-of-truth documents are indexed and retrievable.
- Active and completed plans are stored as durable artifacts.
- Domain schemas, policies, and runbooks are agent-readable.
- Validation signals are accessible through approved tools.
- Logs, metrics, traces, audit events, or workflow status are queryable where relevant.
- Human feedback is converted into docs, tools, validators, or evals.
- Stale docs and obsolete tools have a cleanup process.
- Quality scorecards or known-gap trackers exist for large systems.
Prompt caching checklist
Prompt caching checklist
- Stable instructions appear before volatile runtime state.
- Tool definitions and schemas are sorted deterministically.
- Dynamic values such as timestamps and request IDs are placed near the end or omitted.
- Prompt and tool bundle versions are tracked.
- Provider cached-token fields are logged.
- Cache hit rate is monitored by session and tenant or segment.
- System prompt and tool-list hashes are logged to detect fragmentation.
- Compaction boundaries are explicit.
- Summaries are not rewritten every turn.
- Long-retention cache settings are used only when reuse justifies them.
Mechanical invariant checklist
Mechanical invariant checklist
- Repeated prompt guidance has been converted into validators where possible.
- Validator errors include model-readable remediation instructions.
- Architecture or workflow boundaries are enforced mechanically.
- Secret/PII/source-citation checks exist where relevant.
- Cost, latency, and tool-result-size budgets are enforced.
- Regression evals are added after production incidents.
Minimal provider-neutral implementation path
Use this ordered path when building a new agent harness from scratch. Each step builds on the previous one — do not skip ahead to goal loops or subagents before the base harness passes evals.Build the core loop
Add strict schemas and validation
Add permission checks
Add budgets and stop conditions
Add tracing
Add prompt-cache-aware context ordering
Add planning mode
Add context compaction
Add skills
Add external connectors
Add goal loops and subagents