The architecture-decision-record skill teaches your AI coding agent to produce structured Architecture Decision Records that are grounded in your actual codebase. Instead of generic pros-and-cons lists, every ADR includes code evidence for affected files, migration steps, reversibility analysis, and a Well-Architected pillar impact table that only covers pillars where the decision genuinely matters.
The agent reads your IaC, application code, and configuration files before writing a single line of the ADR. Implementation effort is expressed as specific files affected and migration steps — not T-shirt sizes.
WA Pillar Impact Table
Each option is scored against the six WA pillars. Pillars with no real impact are omitted to keep the table honest. Each non-neutral score requires a reason tied to actual code or architecture patterns.
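The omission rule can be illustrated with a small sketch. The types and the `renderPillarTable` function below are hypothetical names chosen for illustration, not the skill's actual implementation; they only show one way to drop all-neutral pillars and reject non-neutral scores that lack a reason.

```typescript
// Hypothetical types: the skill's internal representation is not documented here.
type Score = "positive" | "neutral" | "tradeoff" | "negative";

interface PillarEntry {
  score: Score;
  reason?: string; // expected for every non-neutral score
}

interface PillarRow {
  pillar: string;
  // Keyed by option name; a missing option is treated as neutral.
  options: Partial<Record<string, PillarEntry>>;
}

const SYMBOL: Record<Score, string> = {
  positive: "✅",
  neutral: "➖",
  tradeoff: "⚠️",
  negative: "❌",
};

function renderPillarTable(rows: PillarRow[], optionNames: string[]): string {
  // Keep a pillar only if at least one option has a non-neutral score.
  const kept = rows.filter((row) =>
    optionNames.some((name) => (row.options[name]?.score ?? "neutral") !== "neutral")
  );

  // Every non-neutral score must carry a non-empty reason.
  for (const row of kept) {
    for (const name of optionNames) {
      const entry = row.options[name];
      if (entry && entry.score !== "neutral" && !entry.reason?.trim()) {
        throw new Error(`${row.pillar} / ${name}: non-neutral score needs a reason`);
      }
    }
  }

  const header = `| Pillar | ${optionNames.join(" | ")} |`;
  const divider = `|${"---|".repeat(optionNames.length + 1)}`;
  const body = kept.map((row) => {
    const cells = optionNames.map((name) => {
      const entry: PillarEntry = row.options[name] ?? { score: "neutral" };
      return entry.reason ? `${SYMBOL[entry.score]} ${entry.reason}` : SYMBOL[entry.score];
    });
    return `| ${row.pillar} | ${cells.join(" | ")} |`;
  });
  return [header, divider, ...body].join("\n");
}
```

Passing a Security row where every option is neutral alongside a Reliability row with real scores would yield a table containing only the Reliability row.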
Trade-off Transparency
The ADR explicitly documents what you gain, what you accept, and what could go wrong — including reversibility. Irreversible decisions get deeper options analysis.
Review Triggers
Each ADR ends with specific, measurable conditions — not vague “revisit when things change” notes — so the decision gets re-evaluated when the actual thresholds are hit.
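As a rough sketch of what "measurable" means here, a trigger can be written as a metric, a comparator, and a threshold, which makes it checkable against observed values. The `ReviewTrigger` shape and the metric names (`p99LatencyMs`, `teamSize`) are assumptions made for this example; the skill itself emits triggers as prose in the ADR.

```typescript
// Hypothetical shape for a measurable review trigger.
interface ReviewTrigger {
  description: string;
  metric: string;
  comparator: ">" | "<";
  threshold: number;
}

// Returns the triggers whose thresholds are crossed by the observed values.
function firedTriggers(
  triggers: ReviewTrigger[],
  observed: Partial<Record<string, number>>
): ReviewTrigger[] {
  return triggers.filter((t) => {
    const value = observed[t.metric];
    if (value === undefined) return false; // no data: cannot claim the threshold was hit
    return t.comparator === ">" ? value > t.threshold : value < t.threshold;
  });
}

const exampleTriggers: ReviewTrigger[] = [
  { description: "p99 latency exceeds 500ms", metric: "p99LatencyMs", comparator: ">", threshold: 500 },
  { description: "team grows beyond 8 engineers", metric: "teamSize", comparator: ">", threshold: 8 },
];

// With p99 at 620 ms and a team of 6, only the latency trigger fires.
const fired = firedTriggers(exampleTriggers, { p99LatencyMs: 620, teamSize: 6 });
console.log(fired.map((t) => t.description));
```

A note such as "revisit when things change" cannot be expressed in this shape at all, which is the point of requiring thresholds.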
Every ADR produced by the skill follows this structure:
1. Context
   Problem statement, current state with file paths and code references, constraints derived from codebase analysis (not assumptions), and decision drivers ordered by priority.
2. Options evaluated
   For each option: how it works, pros, cons, files affected (listed by path), migration steps from current state, and effort estimate with basis. One option is marked Chosen; others are marked Rejected with a clear primary reason and the future condition under which they would become the better choice.
3. Well-Architected impact
   A pillar impact table using ✅ Positive, ➖ Neutral, ⚠️ Trade-off, ❌ Negative — with only non-neutral pillars shown. Each entry explains why, grounded in the codebase’s reality and citing the specific AWS service or code path that creates the benefit or risk.
4. Trade-offs
   Explicit statements of what you gain (with evidence), what you accept (with justification), and a risk table covering likelihood, impact, and concrete mitigations.
5. Implementation
   Step-by-step migration path with affected files named, a specific rollback plan (not “revert the change”), and verification criteria — metrics or tests that confirm the decision is working.
6. Review triggers
   Specific, measurable thresholds: “p99 latency exceeds 500ms”, “team grows beyond 8 engineers”, “re-evaluate after 6 months of production data”.
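The rules for the options section (exactly one Chosen option, and every Rejected option carrying both a primary reason and a future condition) lend themselves to a mechanical check. The sketch below is illustrative only: `AdrOption` and `validateOptions` are hypothetical names, and the skill does not necessarily validate its output this way.

```typescript
// Hypothetical model of one entry in the "Options evaluated" section.
interface AdrOption {
  name: string;
  status: "chosen" | "rejected";
  rejectionReason?: string; // primary reason, expected on rejected options
  wouldChooseIf?: string;   // future condition, expected on rejected options
}

// Returns a list of human-readable problems; an empty list means the section is well formed.
function validateOptions(options: AdrOption[]): string[] {
  const problems: string[] = [];
  const chosenCount = options.filter((o) => o.status === "chosen").length;
  if (chosenCount !== 1) {
    problems.push(`expected exactly one chosen option, found ${chosenCount}`);
  }
  for (const o of options) {
    if (o.status !== "rejected") continue;
    if (!o.rejectionReason) problems.push(`${o.name}: missing primary rejection reason`);
    if (!o.wouldChooseIf) problems.push(`${o.name}: missing future condition`);
  }
  return problems;
}
```

Running a check like this over a draft ADR is one way to catch a rejected option that was dismissed without saying when it would become the better choice.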
The skill’s output for a real architectural decision looks like this. Notice how the pillar table omits pillars that are genuinely neutral:
# ADR-012: Event pipeline for order processing — SQS vs Kinesis Data Streams

## Status
Accepted

## Date
2025-06-01

## Context

### Problem Statement
Order events are currently published directly to the processor Lambda, creating tight coupling
and no replay capability. We need a durable, ordered event pipeline between order-service and
fulfillment-service.

### Current State
- `src/order-service/handlers/create-order.ts:87` — direct Lambda.invoke() to fulfillment-service
- `infrastructure/order-stack.ts:44` — no queue or stream configured
- `src/fulfillment-service/` — expects synchronous invocation, no consumer group logic

### Constraints
- Fulfillment service requires strict per-customer ordering — Evidence: `ARCHITECTURE.md:23`
- Team has no Kinesis operational experience — Evidence: `team-skills.md`
- Current throughput: ~50 orders/min peak — Evidence: `monitoring/dashboards.json:metrics`

### Decision Drivers
1. Per-customer ordering guarantee — required by fulfillment service contract
2. Operational simplicity — team onboarding to AWS queuing for the first time
3. Replay capability — at least 7-day retention for incident recovery

## Decision
Use Amazon SQS FIFO queues with message group IDs mapped to customer IDs.

## Options Evaluated

### Option 1: SQS FIFO ← Chosen
- **How it works**: FIFO queue with `MessageGroupId = customerId` gives per-customer ordering; standard SQS retry and DLQ handling for failures
- **Pros**: team-familiar pattern, no shard management, built-in DLQ, scales automatically
- **Cons**: max 300 TPS per API action (3,000 with batching); no sub-second reprocessing
- **Files affected**: `infrastructure/order-stack.ts`, `src/order-service/handlers/create-order.ts`, `src/fulfillment-service/handlers/process-order.ts` (3 files)
- **Migration**: add SQS FIFO construct → update publisher to sendMessage → convert consumer to SQS event source mapping → remove direct Lambda.invoke
- **Effort**: ~3 days

### Option 2: Kinesis Data Streams — Rejected
- **Primary rejection reason**: shard management complexity for a team with no Kinesis experience, and current 50 orders/min throughput doesn't justify the operational overhead
- **Would choose this if**: throughput exceeds 5,000 orders/min or replay latency under 100ms becomes a hard requirement

## Well-Architected Impact
| Pillar | Option 1 (SQS FIFO) | Option 2 (Kinesis) |
|--------|---------------------|--------------------|
| Reliability | ✅ Built-in DLQ, at-least-once delivery, automatic scaling | ⚠️ Shard iterator management; data loss on unhandled consumer failures |
| Operational Excellence | ✅ Team-familiar; CloudWatch metrics out of the box | ❌ Shard monitoring, resharding operations, enhanced fan-out complexity |
| Cost Optimization | ✅ Pay-per-request, no idle capacity | ⚠️ Hourly per-shard charge even at low throughput (~$10.95/shard/month) |

## Trade-offs

### What We Gain
- Per-customer ordering without application-level sequencing logic — matters because fulfillment service relies on order state being applied in sequence (ARCHITECTURE.md:23)
- DLQ + CloudWatch alarm = automatic failure surfacing with zero additional instrumentation

### What We Accept
- 300 TPS ceiling on the FIFO queue API — acceptable at current 50 orders/min peak; review trigger set at 3,000 orders/min to allow headroom

### Risks
| Risk | Likelihood | Impact | Mitigation |
|------|-----------|--------|------------|
| FIFO TPS limit hit unexpectedly | Low | High — orders queued | Alarm at 200 TPS; pre-tested Kinesis migration path |
| MessageGroupId cardinality too high | Low | Medium — queue lag | Cap at customerId; monitor ApproximateAgeOfOldestMessage |

## Implementation

### Migration Path
1. Add `SqsFifoQueue` construct in `infrastructure/order-stack.ts:44`
2. Update `create-order.ts:87`: replace `Lambda.invoke()` with `sqs.sendMessage({MessageGroupId: customerId})`
3. Update `fulfillment-service/handler`: add SQS event source mapping, remove synchronous handler
4. Deploy with feature flag; validate via `integration-tests/order-pipeline.test.ts`
5. Remove old Lambda.invoke path after 2-week parallel run

### Rollback Plan
Re-enable `DIRECT_INVOKE` feature flag in SSM Parameter Store (`/payments/features/direct-invoke`);
fulfillment handler retains synchronous path for 30 days post-migration.

### Verification
- `ApproximateNumberOfMessagesNotVisible` stays < 10 under load test
- `NumberOfMessagesFailed` alarm threshold: 0 for 5 minutes → PagerDuty

## Review Triggers
- Orders/min sustained > 3,000 for 24 hours (CloudWatch metric)
- Team grows beyond 10 engineers and a dedicated platform team forms
- Re-evaluate 6 months after go-live with production latency data
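The headroom claim in the example ADR can be checked with a little arithmetic. The sketch below assumes one unbatched send per order, so that orders per minute convert directly to API calls per second; the function names are hypothetical and the 300 TPS figure is the ceiling quoted in the ADR itself.

```typescript
// Per-API-action ceiling quoted in the example ADR (unbatched FIFO queue).
const FIFO_UNBATCHED_TPS = 300;

// Assumption: one SendMessage call per order, no batching.
function ordersPerMinToTps(ordersPerMin: number): number {
  return ordersPerMin / 60;
}

// Fraction of the ceiling consumed at a given order rate.
function ceilingUtilization(
  ordersPerMin: number,
  ceilingTps: number = FIFO_UNBATCHED_TPS
): number {
  return ordersPerMinToTps(ordersPerMin) / ceilingTps;
}

// Current peak (50 orders/min) uses roughly 0.28% of the ceiling.
console.log((ceilingUtilization(50) * 100).toFixed(2) + "%");
// The review trigger (3,000 orders/min) is 50 TPS, roughly 16.7% of the ceiling.
console.log((ceilingUtilization(3000) * 100).toFixed(1) + "%");
```

Under that assumption the review trigger fires at about one sixth of the ceiling, which is the headroom the "What We Accept" entry refers to.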
The skill was evaluated with an automated LLM-as-judge framework that uses paired comparison (same prompt, with and without skill context) and Claude Opus 4.8.
| | Baseline | With skill | Delta |
|---|---|---|---|
| Score | 81% | 100% | +19% |
ADRs show the highest improvement among pillar-specific skills because a bare agent typically produces generic pros-and-cons lists. The skill adds code-evidenced implementation effort, concrete review triggers, and WA pillar scoring — all missing from baseline output.
If you invoke the skill without being in a codebase, the agent still produces a full ADR but marks implementation sections as “Verify against code.” The WA pillar analysis remains valid; the file-level impact analysis requires code access to be precise.