When this skill fires
The skill description reads: “Use when code is slow, resource-heavy, or needs optimization — before making any changes, after profiling reveals bottlenecks, or when designing performance-sensitive systems.” Specific triggers:
- Code is measurably slow or resource-heavy
- Profiling has revealed specific bottlenecks
- Designing a system with known performance requirements
- Reviewing code for performance before production
When NOT to use it
- You “feel like” code might be slow but haven’t measured
- Premature optimization during initial implementation
- Micro-optimizations that won’t move the needle
Don’t optimize what you haven’t profiled. The bottleneck is almost never where you think it is.
What it does
The skill follows a six-step cycle: establish a baseline, profile to find the bottleneck, form a hypothesis, apply one targeted change, measure again, and document the result. It never skips the measurement steps.
How it works
Measure baseline
Before touching anything, establish a benchmark. Record:
- Current performance numbers (time, memory, CPU)
- Test conditions (data size, concurrency, hardware)
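A baseline can be captured with a small timing helper. This is a minimal sketch, not part of the skill itself; `measure_baseline` is a hypothetical name, and the stats recorded (mean, p95, run count) stand in for whatever numbers and test conditions matter for your workload:

```python
import statistics
import time

def measure_baseline(fn, *args, runs=10):
    """Run fn repeatedly and record timing stats (hypothetical helper)."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(*args)
        samples.append((time.perf_counter() - start) * 1000)  # milliseconds
    return {
        "mean_ms": statistics.mean(samples),
        "p95_ms": sorted(samples)[int(len(samples) * 0.95) - 1],
        "runs": runs,  # part of the recorded test conditions
    }

# Record the baseline under known conditions before changing anything
baseline = measure_baseline(lambda: sum(i * i for i in range(100_000)))
print(baseline)
```

Write the resulting numbers down alongside the test conditions (data size, concurrency, hardware) so the post-fix measurement compares like with like.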
Profile to find the bottleneck
Use profiling tools appropriate to your stack:
The bottleneck is the one place where optimization actually matters. Don’t optimize anything else.
| Stack | Tools |
|---|---|
| Python | cProfile, py-spy, memory_profiler |
| Node.js | --prof, clinic.js, Chrome DevTools |
| Go | pprof |
| Generic | timing instrumentation, APM tools |
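For the Python row above, a minimal `cProfile` session looks like this. The `slow_path` function is an invented stand-in for whatever code you suspect; sorting by cumulative time surfaces the functions where the program actually spends its time:

```python
import cProfile
import io
import pstats

def slow_path():
    """Stand-in for the code under investigation."""
    total = 0
    for i in range(1000):
        total += sum(range(i))  # the hidden quadratic work
    return total

profiler = cProfile.Profile()
profiler.enable()
slow_path()
profiler.disable()

# Report the top 5 functions by cumulative time
buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(5)
report = buf.getvalue()
print(report)
```

The hot function should dominate the cumulative-time column; if no single entry stands out, you have not found the bottleneck yet.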
Form a hypothesis
State explicitly: “I believe X is slow because Y.” If you can’t explain why something is slow, you don’t understand the problem well enough to fix it.
Apply a targeted fix
Change ONE thing at a time. Common high-impact areas:
| Area | Look for |
|---|---|
| Database | N+1 queries, full table scans, missing indexes |
| Caching | Repeated expensive computations with same inputs |
| Network | Chatty APIs, large payloads, synchronous chains |
| Algorithms | Nested loops over large collections (O(n²) → O(n log n)) |
| Memory | Objects created in tight loops, large in-memory datasets |
| I/O | Synchronous blocking, missing batching |
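The database row (N+1 queries) is common enough to warrant a sketch. The snippet below simulates round trips with a counter instead of a real database; `fetch_user` and `fetch_users_bulk` are hypothetical names illustrating the per-item versus batched access pattern:

```python
# Simulated data layer: each function call counts as one DB round trip
USERS = {1: "alice", 2: "bob", 3: "carol"}
query_count = 0

def fetch_user(user_id):
    global query_count
    query_count += 1  # one query per call
    return USERS[user_id]

def fetch_users_bulk(user_ids):
    global query_count
    query_count += 1  # a single IN (...) style query for all ids
    return {uid: USERS[uid] for uid in user_ids}

orders = [{"id": n, "user_id": uid} for n, uid in enumerate([1, 2, 3, 1, 2])]

# N+1 pattern: one lookup per order
query_count = 0
names_n_plus_1 = [fetch_user(o["user_id"]) for o in orders]
n_plus_1_queries = query_count

# Batched pattern: one lookup for the whole result set
query_count = 0
users = fetch_users_bulk({o["user_id"] for o in orders})
names_batched = [users[o["user_id"]] for o in orders]
batched_queries = query_count

print(n_plus_1_queries, batched_queries)
```

The batched version returns identical results while issuing one query instead of one per order; with a real database the gap widens with result-set size and network latency.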
Measure again
Compare to baseline. Did it improve? If not: revert and try something else. A fix without measurement is not a fix.
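The keep-or-revert decision can be made mechanical. This hedged sketch treats small deltas as measurement noise; the 5% threshold is an arbitrary illustration, not a recommendation:

```python
def improved(baseline_ms, new_ms, noise_threshold=0.05):
    """Keep the change only if it beats the baseline by more than the noise band."""
    return new_ms < baseline_ms * (1 - noise_threshold)

# A large win is kept; a change inside the noise band is reverted
assert improved(3100, 95)
assert not improved(3100, 3050)
```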
Red flags
| Thought | Reality |
|---|---|
| “This looks slow” | Measure it. Looks are deceiving. |
| “I’ll optimize as I go” | Premature optimization obscures intent. Measure first. |
| “I fixed the bottleneck” | Did you measure? A fix without measurement is not confirmed. |
| “This is the obvious bottleneck” | Profile anyway. You’re probably wrong. |
Example scenario
Your API endpoint that lists orders is taking 3 seconds per request. The performance-optimization skill fires. The agent:
- Baseline: Records 3,100ms average response time with 1,000 orders in the database, measured with `wrk -t2 -c10 -d10s http://localhost:3000/api/orders`
- Profile: Adds timing instrumentation to the route handler. Finds: DB query takes 50ms, serialization takes 40ms, but a loop calling `getUser(orderId)` for each order takes 2,900ms
- Hypothesis: “I believe the loop is slow because it makes one database query per order (N+1 problem)”
- Fix: Replace the per-order user lookup loop with a single JOIN query — one change only
- Measure: New average: 95ms. 97% improvement.
- Document: “Eliminated N+1 query in `/api/orders`. Before: 3,100ms (1,000 sequential user lookups). After: 95ms (single JOIN). Baseline: 2026-03-17, `wrk` with 10 concurrent connections.”
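The JOIN fix from the scenario can be reproduced in miniature with an in-memory SQLite database. The schema and data below are invented for illustration; the point is that one JOIN returns the same rows as the per-order lookup loop:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER REFERENCES users(id));
    INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO orders VALUES (10, 1), (11, 2), (12, 1);
""")

# Before (N+1): one query for the orders, then one query per order for its user
orders = conn.execute("SELECT id, user_id FROM orders ORDER BY id").fetchall()
before = [
    (oid, conn.execute("SELECT name FROM users WHERE id = ?", (uid,)).fetchone()[0])
    for oid, uid in orders
]

# After: a single JOIN fetches the same rows in one round trip
after = conn.execute("""
    SELECT orders.id, users.name
    FROM orders JOIN users ON users.id = orders.user_id
    ORDER BY orders.id
""").fetchall()

print(before == after)
```

With 1,000 orders the loop issues 1,001 queries while the JOIN issues one, which is where the 3,100ms → 95ms improvement in the scenario comes from.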
Related skills
Systematic debugging
Performance problems are bugs. Root cause investigation applies before any optimization.
Verification before completion
Measure after the fix before claiming the optimization is complete.