Agentic Engineering: The Harness Principle Explained

A state-of-the-art language model is not, by itself, a software engineering agent. Left without structure, even the most capable model can hallucinate dependencies, leak credentials, ignore edge cases, and lose all context the moment the session ends. Agentic Engineering is the discipline of wrapping a model in a Harness: a carefully designed infrastructure of rules, context, tools, sandboxes, and deterministic guardrails that shapes raw model capability into predictable, auditable, production-grade behavior. In dbv-specs-ops, the Harness is not an accessory to the workflow; it is the workflow.

The Harness Equation

The fundamental insight of Agentic Engineering can be expressed as a single formula:

Agent = Model (10%) + Harness (90%)

The model contributes raw language understanding and code generation capacity. The Harness contributes everything that makes that capacity reliable enough to trust in production: the instructions that set its role, the context that eliminates amnesia, the tools that let it act on real systems, the sandbox that contains its side effects, and the guardrails that catch what slips through. Choosing a better model without improving the Harness yields marginal gains. Improving the Harness with any capable model yields structural gains.

The 5 Harness Components

1. Instructions & Rules

The role, directives, and hard limits the AI must follow. In dbv-specs-ops, this is docs/MASTER_PROMPT.md — a structured XML document with sections for workflow, boundaries, coding standards, and trust rules. Platform-specific activation files (.windsurfrules, CLAUDE.md, GEMINI.md) inject this context automatically at session start.

2. Static & Dynamic Context

Static context (e.g., memory.md) is loaded at every session start — high fidelity but costly in tokens. Dynamic context / Skills (skills/) are procedure packages injected into the prompt on demand, only when the task matches. This split optimizes token consumption and keeps the AI’s attention focused on what is relevant right now.
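The static/dynamic split can be sketched as a small prompt assembler. This is a minimal illustration, not the dbv-specs-ops implementation: the `Skill` class, the keyword-based `triggers`, and `build_context` are hypothetical names, and the way a real platform decides that a task "matches" a skill is an assumption here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    """Hypothetical on-demand procedure package (e.g. one folder under skills/)."""
    name: str
    triggers: tuple[str, ...]  # assumed: keywords that make the skill relevant
    body: str


def build_context(static_context: str, skills: list[Skill], task: str) -> str:
    """Always include static context; append only skills whose triggers match the task."""
    task_lower = task.lower()
    selected = [s for s in skills if any(t in task_lower for t in s.triggers)]
    parts = [static_context] + [f"## Skill: {s.name}\n{s.body}" for s in selected]
    return "\n\n".join(parts)


skills = [
    Skill("db-migration", ("migration", "schema"), "Steps for safe schema changes."),
    Skill("release-notes", ("changelog", "release"), "Steps for writing release notes."),
]
prompt = build_context(
    "# memory.md\nProject uses PostgreSQL.",
    skills,
    "Add a schema migration for orders",
)
print(prompt)
```

Only the `db-migration` skill is appended for this task, so the tokens spent on `release-notes` are saved until a task actually needs it.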

3. Tools & Integrations

MCP (Model Context Protocol) connectors allow the agent to interact with real systems: databases, Git APIs, file systems, and external services. The /spec phase explicitly evaluates whether a task warrants creating a local MCP server or new skill modules, and proposes it to the developer before proceeding.

4. Sandbox

Isolated execution environments — virtual environments (venv) for Python projects or Docker containers — where the agent runs code and tests safely. The /build phase mandates venv creation before any dependency installation, and the venv/ directory is always added to .gitignore automatically.
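The venv rule above could be automated roughly as follows. This is a sketch under assumptions: `prepare_sandbox` and `ensure_gitignore_entry` are hypothetical helper names, and the actual /build phase may perform these steps differently.

```python
import venv
from pathlib import Path


def ensure_gitignore_entry(project_dir: Path, entry: str = "venv/") -> bool:
    """Add `entry` to .gitignore unless it is already listed. Returns True if added."""
    gitignore = project_dir / ".gitignore"
    text = gitignore.read_text() if gitignore.exists() else ""
    if entry in (line.strip() for line in text.splitlines()):
        return False
    # Start on a fresh line if the existing file lacks a trailing newline.
    separator = "" if text == "" or text.endswith("\n") else "\n"
    gitignore.write_text(text + separator + entry + "\n")
    return True


def prepare_sandbox(project_dir: Path) -> Path:
    """Create project_dir/venv (with pip) and make sure Git ignores it."""
    env_dir = project_dir / "venv"
    if not env_dir.exists():
        venv.create(env_dir, with_pip=True)
    ensure_gitignore_entry(project_dir)
    return env_dir
```

Calling `prepare_sandbox(Path("."))` before any dependency installation keeps packages out of the system interpreter, and the idempotent .gitignore helper means repeated runs do not add duplicate entries.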

5. Deterministic Guardrails

Scripts and security tools that verify the agent’s output with certainty, independent of the model’s probabilistic judgment. In dbv-specs-ops, these run as the mandatory security audit inside the /code-simplify phase: linters, secret scanners, dependency validators, and input sanitization checks.

Conductor Mode vs. Orchestrator Mode

At the /plan phase, the AI implicitly classifies every incoming task into one of two execution modes based on its scope and impact. This classification happens automatically; the developer does not need to configure it.

Conductor Mode applies to small, contained changes:

Corrections, minor refactors, or isolated unit tests
Touches ≤ 2 files and < 50 lines
Proceeds with rapid, interactive iterations directly in the IDE

Orchestrator Mode applies to larger, cross-cutting work:

New features, migrations, or changes spanning > 2 files
Requires a detailed step-by-step plan in task.md
For complex tasks (> 3 files, authentication/payments/sensitive data, or > 150 estimated new lines), produces implementation_plan.md with a required YAML frontmatter containing dependencies, risks, and rollback_strategy
Waits for explicit approval before executing — the developer reviews and confirms the plan before a single line of production code is written
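The frontmatter requirement lends itself to a mechanical check. The sketch below assumes the frontmatter is a leading block delimited by `---` lines and that the three keys appear at the top level; it uses a simple line scan instead of a YAML parser, and the function name and sample plan are hypothetical.

```python
import re

REQUIRED_KEYS = ("dependencies", "risks", "rollback_strategy")


def missing_frontmatter_keys(document: str) -> list[str]:
    """Return the required keys absent from the leading '---' frontmatter block."""
    lines = document.splitlines()
    if not lines or lines[0].strip() != "---":
        return list(REQUIRED_KEYS)  # no frontmatter at all
    try:
        end = next(i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---")
    except StopIteration:
        return list(REQUIRED_KEYS)  # frontmatter never closed
    # Top-level keys only: an unindented identifier followed by a colon.
    present = {
        m.group(1)
        for line in lines[1:end]
        if (m := re.match(r"^([A-Za-z_][\w-]*)\s*:", line))
    }
    return [key for key in REQUIRED_KEYS if key not in present]


plan = """---
dependencies:
  - stripe
risks:
  - duplicate webhook delivery
---
# Implementation plan
"""
print(missing_frontmatter_keys(plan))  # ['rollback_strategy']
```

A stricter check would parse the block with a YAML library and also validate the value types; the line scan only confirms that the keys are present.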

A plan is considered complex — and therefore requires implementation_plan.md — if it meets any one of these criteria: it touches more than 3 files, it involves authentication, sensitive data, or payments, or it is estimated to produce more than 150 new lines of code.
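The thresholds above can be written down as a small decision function. This is an illustrative sketch, not the logic the AI actually runs: `TaskEstimate` and `classify` are hypothetical names, a single `new_lines` estimate stands in for both line counts, and the rule that anything outside the Conductor limits falls into Orchestrator Mode is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskEstimate:
    files_touched: int
    new_lines: int
    touches_sensitive_area: bool = False  # authentication, payments, or sensitive data
    is_new_feature_or_migration: bool = False


def classify(task: TaskEstimate) -> tuple[str, bool]:
    """Return (mode, needs_implementation_plan) using the stated thresholds."""
    within_conductor_limits = (
        task.files_touched <= 2
        and task.new_lines < 50
        and not task.is_new_feature_or_migration
    )
    # Complexity is an any-of test: one criterion is enough.
    complex_plan = (
        task.files_touched > 3
        or task.touches_sensitive_area
        or task.new_lines > 150
    )
    # Assumption: complex work is always Orchestrator, even when it is small.
    mode = "conductor" if within_conductor_limits and not complex_plan else "orchestrator"
    return mode, complex_plan


print(classify(TaskEstimate(files_touched=1, new_lines=20)))  # ('conductor', False)
print(classify(TaskEstimate(files_touched=3, new_lines=80)))  # ('orchestrator', False)
print(classify(TaskEstimate(files_touched=1, new_lines=10, touches_sensitive_area=True)))  # ('orchestrator', True)
```

The third call shows why the complexity test is evaluated separately: a ten-line change to authentication code is small by size but still requires implementation_plan.md.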

The Adversarial Architect Review

Before the AI breaks any plan into executable tasks, it must run an Adversarial Architect Review. This is a mandatory internal debate printed in the chat as a structured XML block. Its purpose is to force explicit analysis of edge cases and security failures before any code is committed. The format requires a <builder> voice proposing the plan and an <adversary> voice challenging it with domain-specific risks. Crucially, the <adversary> block must cite at least one concrete noun from docs/SPECIFICATIONS.md — generic objections about “users” or “input” without project-specific context are not sufficient.

<architect_review>
  <builder>
    Proposed plan: implement the payment webhook handler using the
    Stripe event object defined in SPECIFICATIONS.md §4.2.
  </builder>
  <adversary>
    Domain-specific risk: what happens if the StripeWebhookEvent arrives
    with a duplicate event_id that was already processed? Is there
    an idempotency check on the order_fulfillment_service, or will
    the order be fulfilled twice?
  </adversary>
  <builder>
    Resolution: add an idempotency key check against the processed_events
    table before executing fulfillment. Update SPECIFICATIONS.md §4.2
    to document this constraint explicitly.
  </builder>
</architect_review>
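Because the review is structured XML, its shape can be checked deterministically. The sketch below is one possible validator, assuming the set of specification terms has already been extracted from docs/SPECIFICATIONS.md by some other step; `validate_review` and `spec_terms` are hypothetical names.

```python
import re
import xml.etree.ElementTree as ET


def validate_review(review_xml: str, spec_terms: set[str]) -> list[str]:
    """Return a list of problems; an empty list means the review passes these checks."""
    root = ET.fromstring(review_xml)
    problems = []
    if root.tag != "architect_review":
        problems.append("root element must be <architect_review>")
    if root.find("builder") is None:
        problems.append("missing <builder> block")
    adversaries = root.findall("adversary")
    if not adversaries:
        problems.append("missing <adversary> block")
    for block in adversaries:
        # Compare identifier-like words in the objection against the spec vocabulary.
        words = set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", block.text or ""))
        if not words & spec_terms:
            problems.append("<adversary> cites no term from the specification")
    return problems


review = """<architect_review>
  <builder>Proposed plan: implement the payment webhook handler.</builder>
  <adversary>What if the StripeWebhookEvent arrives with a duplicate event_id?</adversary>
</architect_review>"""
spec_terms = {"StripeWebhookEvent", "order_fulfillment_service"}
print(validate_review(review, spec_terms))  # []
```

An objection that mentions only generic words such as "users" or "input" would fail the last check, which mirrors the rule that the adversary must ground its challenge in the project's own specification.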

If the Adversarial Review surfaces a risk that is consciously accepted rather than resolved, the AI records it immediately in memory.md under the heading ## 🏗️ Log de Decisiones Técnicas (Technical Decisions Log) before continuing. Accepted risks are never silently dropped.

Security Risks Unique to AI-Generated Code

The /code-simplify phase enforces a mandatory security audit that targets three categories of vulnerability that are disproportionately common in AI-generated code:

Credential Leakage

AI agents can inadvertently hardcode API keys, passwords, staging tokens, or other secrets directly in source files. The audit scans every file touched in the current cycle for patterns that indicate hardcoded credentials before the code reaches version control.
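A credential scan of this kind can be approximated with a few regular expressions. The patterns and the `scan_text` function below are illustrative assumptions, not the rule set used by the dbv-specs-ops audit; a dedicated secret scanner ships far more rules and tuning.

```python
import re

# Illustrative patterns only; real scanners use much larger, tuned rule sets.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(r"-----BEGIN (?:[A-Z]+ )?PRIVATE KEY-----"),
    "hardcoded_assignment": re.compile(
        r"(?i)(?:api[_-]?key|secret|password|token)\w*\s*[:=]\s*[\"'][^\"']{8,}[\"']"
    ),
}


def scan_text(text: str) -> list[tuple[int, str]]:
    """Return (line_number, rule_name) for every line that matches a pattern."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        for rule, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((number, rule))
    return findings


sample = (
    "import os\n"
    'API_KEY = "not-a-real-key-123456"\n'
    'db_password = os.environ["DB_PASSWORD"]\n'
)
print(scan_text(sample))  # [(2, 'hardcoded_assignment')]
```

The third line of the sample is not flagged because the value comes from the environment instead of a string literal, which is the pattern the audit is meant to push generated code toward.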

Dependency Hallucination / Slopsquatting

Language models can suggest external libraries or packages that do not exist in any official registry. This is not merely a runtime error: attackers can register package names that models are known to hallucinate (a technique called slopsquatting), so that installing the generated code's dependencies pulls malware into the build pipeline. Every import added during a build cycle is validated against real package registries during the security audit.

Logic Vulnerabilities

Code that compiles and passes basic tests can still introduce SQL injection paths, authorization bypasses on public endpoints, or buffer overflows. The audit reviews critical input paths and public-facing interfaces for these structural weaknesses before shipment.
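SQL injection is a useful illustration of code that "works" yet is structurally unsafe. The example below uses Python's built-in sqlite3 module with a hypothetical `users` table; it shows the class of defect the audit looks for, not a specific finding from dbv-specs-ops.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", "admin"), ("bob", "user")])


def find_user_unsafe(name: str) -> list[tuple]:
    # Vulnerable: user input is spliced directly into the SQL text.
    return conn.execute(f"SELECT name, role FROM users WHERE name = '{name}'").fetchall()


def find_user_safe(name: str) -> list[tuple]:
    # Parameterized: the driver passes `name` as data, never as SQL.
    return conn.execute("SELECT name, role FROM users WHERE name = ?", (name,)).fetchall()


payload = "nobody' OR '1'='1"
print(len(find_user_unsafe(payload)))  # 2: the injected OR clause matches every row
print(find_user_safe(payload))         # []: no user has that literal name
```

Both functions return the same result for ordinary names, so a basic test suite would pass either one; only an input-path review or an adversarial test exposes the difference.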

Slopsquatting is an active, real-world attack vector. A hallucinated package name that appears in generated code can be registered by a malicious actor within hours. Never skip the dependency validation step in /code-simplify.
