Before-and-after examples for all four Karpathy principles

This page walks through real code examples for each of the four Karpathy guidelines. Every example shows an actual user request, the problematic behavior LLMs typically produce, and what the correct response looks like — so you can calibrate expectations and spot the patterns in your own workflows.

The overcomplicated examples aren’t obviously wrong — they follow design patterns and best practices. The problem is timing: they add complexity before it is needed, which makes code harder to understand, introduces more bugs, takes longer to implement, and is harder to test.

Think Before Coding

Example 1: Hidden assumptions

User request: “Add a feature to export user data”

What LLMs do wrong: silent assumptions

def export_users(format='json'):
    """Export all users to a file."""
    users = User.query.all()

    if format == 'json':
        with open('users.json', 'w') as f:
            json.dump([u.to_dict() for u in users], f)
    elif format == 'csv':
        with open('users.csv', 'w') as f:
            writer = csv.DictWriter(f, fieldnames=['id', 'email', 'name'])
            writer.writeheader()
            writer.writerows([u.to_dict() for u in users])

    return f"Exported {len(users)} users"

Problems with this approach:

Assumed the export should cover ALL users, ignoring pagination and privacy implications
Assumed a file output location without asking
Assumed which fields to include
Assumed CSV fieldnames without checking the actual data structure

What should happen — surface assumptions first

Before implementing, I need to clarify:

1. **Scope**: Export all users or filtered subset? (privacy implications)
2. **Delivery**: By "export", did you mean:
   - Download file in browser?
   - Background job with email notification?
   - API endpoint returning data?
3. **Fields**: Which user fields? (some might be sensitive)
4. **Volume**: How many users typically? (affects approach)

Simplest approach: Add an API endpoint that returns paginated JSON.
Would need more info for file-based exports.

What's your preference?
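If the user confirms the "simplest approach" from the reply above, the result could look roughly like this. It is a minimal sketch under stated assumptions, not this page's own implementation: `export_users_page`, its parameters, and the plain list of dicts standing in for the user table are all hypothetical.

```python
import json


def export_users_page(users, page=1, per_page=2, fields=("id", "name")):
    """Return one page of users as a JSON string, limited to the allowed fields.

    `users` is a plain list of dicts standing in for a real query result
    (an assumption made for illustration only).
    """
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be >= 1")

    start = (page - 1) * per_page
    chunk = users[start:start + per_page]
    total_pages = (len(users) + per_page - 1) // per_page  # ceiling division

    payload = {
        "page": page,
        "total_pages": total_pages,
        # Only explicitly listed fields leave the system, so sensitive
        # columns such as email are excluded unless someone asks for them.
        "users": [{field: user[field] for field in fields} for user in chunk],
    }
    return json.dumps(payload)
```

The scope, fields, and volume questions from the reply each map to one explicit parameter here, so nothing is assumed silently.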

Example 2: Multiple interpretations

User request: “Make the search faster”

What LLMs do wrong: pick silently

# Adds caching, database indexes, and async processing
# without asking which "faster" matters
@lru_cache(maxsize=1000)
async def search(query: str) -> List[Result]:
    # ... 200 lines of optimization code

What should happen — present interpretations

"Make the search faster" could mean:

1. **Faster response time** (< 100ms instead of 500ms)
   - Add database indexes
   - Cache frequent queries
   - Estimated effort: 2 hours

2. **Handle more concurrent searches** (throughput)
   - Add async/await
   - Connection pooling
   - Estimated effort: 4 hours

3. **Faster perceived speed** (UX)
   - Show partial results immediately
   - Progressive loading
   - Estimated effort: 3 hours

Current search takes ~500ms for typical queries.
Which aspect matters most for your use case?
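A figure such as "~500ms for typical queries" should come from measurement, not a guess. The sketch below shows one way to get a baseline before choosing an optimization. `measure_latency_ms` and its parameters are hypothetical names, and `search_fn` stands for whatever search function the codebase really has.

```python
import statistics
import time


def measure_latency_ms(search_fn, queries, repeats=5):
    """Return the median wall-clock latency of search_fn in milliseconds."""
    samples = []
    for query in queries:
        for _ in range(repeats):
            start = time.perf_counter()
            search_fn(query)
            samples.append((time.perf_counter() - start) * 1000)
    if not samples:
        raise ValueError("need at least one query and one repeat")
    # The median is less sensitive to a single slow outlier than the mean.
    return statistics.median(samples)
```

Running the same measurement after a change gives a concrete before-and-after number for interpretation 1. Throughput and perceived speed need different measurements.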

Simplicity First

Example 1: Over-abstraction

User request: “Add a function to calculate discount”

What LLMs do wrong: overengineered

from abc import ABC, abstractmethod
from enum import Enum
from typing import Protocol, Union
from dataclasses import dataclass

class DiscountStrategy(ABC):
    @abstractmethod
    def calculate(self, amount: float) -> float:
        pass

class PercentageDiscount(DiscountStrategy):
    def __init__(self, percentage: float):
        self.percentage = percentage

    def calculate(self, amount: float) -> float:
        return amount * (self.percentage / 100)

class FixedDiscount(DiscountStrategy):
    def __init__(self, fixed_amount: float):
        self.fixed_amount = fixed_amount

    def calculate(self, amount: float) -> float:
        return min(self.fixed_amount, amount)

@dataclass
class DiscountConfig:
    strategy: DiscountStrategy
    min_purchase: float = 0.0
    max_discount: float = float('inf')

class DiscountCalculator:
    def __init__(self, config: DiscountConfig):
        self.config = config

    def apply_discount(self, amount: float) -> float:
        if amount < self.config.min_purchase:
            return 0
        discount = self.config.strategy.calculate(amount)
        return min(discount, self.config.max_discount)

# Usage requires 30+ lines of setup for a simple calculation

Problems with this approach:

Strategy pattern, abstract base classes, and dataclasses added for a single-use calculation
30+ lines of setup overhead before you can call anything
No multiple discount types exist in the codebase yet — this complexity has no payoff

What should happen — just a function

def calculate_discount(amount: float, percent: float) -> float:
    """Calculate discount amount. percent should be 0-100."""
    return amount * (percent / 100)

# Usage
discount = calculate_discount(100.0, 10.0)  # $10 off

When to add complexity: only when you actually need multiple discount types. If that requirement comes later, refactor then.
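If a second discount type did show up later, the refactor can still stay small. The following is a sketch of that hypothetical future step, not something the request above calls for. `percentage_discount` and `fixed_discount` are illustrative names.

```python
def percentage_discount(amount: float, percent: float) -> float:
    """Discount as a percentage of the amount. percent should be 0-100."""
    return amount * (percent / 100)


def fixed_discount(amount: float, fixed_amount: float) -> float:
    """Flat discount, capped so it never exceeds the purchase amount."""
    return min(fixed_amount, amount)
```

Two plain functions cover two discount types. A strategy hierarchy only starts to pay off once callers need to pick a type at runtime.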

Example 2: Speculative features

User request: “Save user preferences to database”

What LLMs do wrong: unnecessary features

class PreferenceManager:
    def __init__(self, db, cache=None, validator=None):
        self.db = db
        self.cache = cache or InMemoryCache()
        self.validator = validator or DefaultValidator()

    def save(self, user_id: int, prefs: dict,
             merge: bool = True,
             validate: bool = True,
             notify: bool = False) -> bool:
        """Save preferences with optional merging, validation, and notifications."""

        if validate:
            errors = self.validator.validate(prefs)
            if errors:
                raise ValidationError(errors)

        existing = self.get(user_id) if merge else {}

        if merge:
            existing.update(prefs)
            prefs = existing

        self.db.execute(
            "UPDATE users SET preferences = ? WHERE id = ?",
            (json.dumps(prefs), user_id)
        )

        if self.cache:
            self.cache.set(f"prefs:{user_id}", prefs)

        if notify:
            self.notify_preference_change(user_id, prefs)

        return True

    def notify_preference_change(self, user_id: int, prefs: dict):
        # Another 30 lines for a feature nobody asked for
        pass

Problems with this approach:

Caching, validation, merging, and notifications were never requested
A class with three optional constructor dependencies for a single DB write
notify_preference_change is a stub for a feature that does not exist

What should happen — exactly what was asked

import json

def save_preferences(db, user_id: int, preferences: dict):
    """Save user preferences to database."""
    db.execute(
        "UPDATE users SET preferences = ? WHERE id = ?",
        (json.dumps(preferences), user_id)
    )

Add caching when performance matters, validation when bad data appears, merging when the requirement actually emerges.
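To check that the short version does what was asked, it can be run against an in-memory SQLite database. This is a sketch under assumptions: the page does not say which database or schema is in use, so the `users` table layout and the `demo` helper are hypothetical. The `?` placeholders match `sqlite3`'s parameter style.

```python
import json
import sqlite3


def save_preferences(db, user_id: int, preferences: dict):
    """Save user preferences to database."""
    db.execute(
        "UPDATE users SET preferences = ? WHERE id = ?",
        (json.dumps(preferences), user_id)
    )


def demo():
    """Write preferences for one user and read them back."""
    db = sqlite3.connect(":memory:")
    # Hypothetical minimal schema, just enough to exercise the UPDATE.
    db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, preferences TEXT)")
    db.execute("INSERT INTO users (id, preferences) VALUES (1, '{}')")

    save_preferences(db, 1, {"theme": "dark"})

    row = db.execute("SELECT preferences FROM users WHERE id = 1").fetchone()
    db.close()
    return json.loads(row[0])
```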

Surgical Changes

Example 1: Drive-by refactoring

User request: “Fix the bug where empty emails crash the validator”

What LLMs do wrong: too much

  def validate_user(user_data):
-     # Check email format
-     if not user_data.get('email'):
+     """Validate user data."""
+     email = user_data.get('email', '').strip()
+
+     # Validate email
+     if not email:
          raise ValueError("Email required")
-
-     # Basic email validation
-     if '@' not in user_data['email']:
+     if '@' not in email or '.' not in email.split('@')[1]:
          raise ValueError("Invalid email")

-     # Check username
-     if not user_data.get('username'):
+     # Validate username
+     username = user_data.get('username', '').strip()
+     if not username:
          raise ValueError("Username required")
+     if len(username) < 3:
+         raise ValueError("Username too short")
+     if not username.isalnum():
+         raise ValueError("Username must be alphanumeric")

      return True

Problems with this approach:

“Improved” email validation beyond the specific bug fix
Added username length and format validation that nobody requested
Changed existing comments
Added a docstring

What should happen — surgical

  def validate_user(user_data):
      # Check email format
-     if not user_data.get('email'):
+     email = user_data.get('email', '')
+     if not email or not email.strip():
          raise ValueError("Email required")

      # Basic email validation
-     if '@' not in user_data['email']:
+     if '@' not in email:
          raise ValueError("Invalid email")

      # Check username
      if not user_data.get('username'):
          raise ValueError("Username required")

      return True

Only the lines that fix empty email handling changed.
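A small regression check can confirm that the surgical version covers the reported crash and nothing more. The block below repeats the patched `validate_user` from the diff so it runs on its own. `error_for` is a hypothetical helper added for illustration.

```python
def validate_user(user_data):
    # Check email format
    email = user_data.get('email', '')
    if not email or not email.strip():
        raise ValueError("Email required")

    # Basic email validation
    if '@' not in email:
        raise ValueError("Invalid email")

    # Check username
    if not user_data.get('username'):
        raise ValueError("Username required")

    return True


def error_for(user_data):
    """Return the ValueError message validate_user raises, or None if valid."""
    try:
        validate_user(user_data)
    except ValueError as exc:
        return str(exc)
    return None


# Missing, empty, whitespace-only, and None emails all give a clean error.
for bad_email in ({}, {'email': ''}, {'email': '   '}, {'email': None}):
    assert error_for({**bad_email, 'username': 'sam'}) == "Email required"

# Behavior the bug report did not mention is unchanged.
assert error_for({'email': 'no-at-sign', 'username': 'sam'}) == "Invalid email"
assert error_for({'email': 'sam@example.com', 'username': 'sam'}) is None
```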

Example 2: Style drift

User request: “Add logging to the upload function”

What LLMs do wrong: reformat everything

- def upload_file(file_path, destination):
+ def upload_file(file_path: str, destination: str) -> bool:
+     """Upload file to destination with logging."""
+     logger.info(f"Uploading {file_path} to {destination}")
+
      try:
-         with open(file_path, 'rb') as f:
-             data = f.read()
-
-         response = requests.post(destination, files={'file': data})
-
-         if response.status_code == 200:
-             return True
-         else:
-             return False
+         with open(file_path, "rb") as f:
+             data = f.read()
+         response = requests.post(destination, files={"file": data})
+         success = response.status_code == 200
+
+         if success:
+             logger.info(f"Upload successful: {file_path}")
+         else:
+             logger.error(f"Upload failed: {file_path}, status={response.status_code}")
+
+         return success
      except Exception as e:
-         print(f"Error: {e}")
+         logger.exception(f"Upload error for {file_path}: {e}")
          return False

Problems with this approach:

Changed quote style from '' to ""
Added type hints that were not requested
Added a docstring
Reformatted whitespace between statements
Restructured the boolean return logic

What should happen — match existing style

+ import logging
+
+ logger = logging.getLogger(__name__)
+
  def upload_file(file_path, destination):
+     logger.info(f'Starting upload: {file_path}')
      try:
          with open(file_path, 'rb') as f:
              data = f.read()

          response = requests.post(destination, files={'file': data})

          if response.status_code == 200:
+             logger.info(f'Upload successful: {file_path}')
              return True
          else:
+             logger.error(f'Upload failed: {file_path}, status={response.status_code}')
              return False
      except Exception as e:
-         print(f"Error: {e}")
+         logger.exception(f'Upload error: {file_path}')
          return False

Single quotes, no type hints, existing boolean pattern, and spacing style all preserved.
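The minimal version calls `logger.exception` without interpolating the exception into the message. That works because `logger.exception` logs at ERROR level and attaches the active exception to the log record itself. The sketch below demonstrates this with the standard `logging` module. `ListHandler`, `capture_exception_log`, the logger name, and the file name are hypothetical and exist only for this demonstration.

```python
import logging


class ListHandler(logging.Handler):
    """Collects log records in a list so they can be inspected."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


def capture_exception_log():
    """Log from inside an except block and return the resulting record."""
    logger = logging.getLogger('upload_demo')
    logger.setLevel(logging.DEBUG)
    logger.propagate = False  # keep the demo output out of the root logger
    handler = ListHandler()
    logger.addHandler(handler)
    file_path = 'report.pdf'
    try:
        try:
            raise OSError('disk unavailable')
        except Exception:
            logger.exception(f'Upload error: {file_path}')
    finally:
        logger.removeHandler(handler)
    return handler.records[0]
```

The returned record has level `ERROR` and carries the `OSError` in its `exc_info`, so the traceback reaches the log even though the message text never mentions it.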

Goal-Driven Execution

Example 1: Vague vs. verifiable goals

User request: “Fix the authentication system”

What LLMs do wrong: vague approach

I'll fix the authentication system by:
1. Reviewing the code
2. Identifying issues
3. Making improvements
4. Testing the changes

[Proceeds to make changes without clear success criteria]

What should happen — define verifiable goals

To "fix authentication," I need to define success criteria.
What specific issue are we solving?

If the issue is "users stay logged in after password change":

Plan:
1. Write test: Change password → verify old session invalidated
   Verify: Test fails (reproduces bug)

2. Implement: Invalidate sessions on password change
   Verify: Test passes

3. Check edge cases: Multiple active sessions, concurrent changes
   Verify: Additional tests pass

4. Verify no regression: Existing auth tests still pass
   Verify: Full test suite green

Current test coverage for auth: [shows coverage]
What's the specific auth issue you're seeing?
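Steps 1 to 3 of the plan above could be written as an executable test. The sketch below uses a toy in-memory `AuthService` so it can run on its own. The class, its methods, and the plain-text password storage are hypothetical stand-ins for illustration and do not reflect any real authentication system.

```python
import secrets


class AuthService:
    """Hypothetical in-memory auth service, used only to illustrate the test."""

    def __init__(self):
        self._passwords = {}  # toy storage; a real system would hash passwords
        self._sessions = {}   # session token -> username

    def register(self, username, password):
        self._passwords[username] = password

    def login(self, username, password):
        if self._passwords.get(username) != password:
            raise PermissionError("bad credentials")
        token = secrets.token_hex(16)
        self._sessions[token] = username
        return token

    def is_session_valid(self, token):
        return token in self._sessions

    def change_password(self, username, new_password):
        self._passwords[username] = new_password
        # The fix under test: drop every session that belongs to this user.
        self._sessions = {
            token: user for token, user in self._sessions.items()
            if user != username
        }


def test_password_change_invalidates_old_sessions():
    auth = AuthService()
    auth.register("alice", "old-secret")
    auth.register("bob", "bob-secret")
    first = auth.login("alice", "old-secret")
    second = auth.login("alice", "old-secret")  # edge case: multiple sessions
    other = auth.login("bob", "bob-secret")

    auth.change_password("alice", "new-secret")

    assert not auth.is_session_valid(first)
    assert not auth.is_session_valid(second)
    assert auth.is_session_valid(other)  # other users are unaffected
    assert auth.is_session_valid(auth.login("alice", "new-secret"))
```

With the session-clearing lines in `change_password` removed, the first two assertions fail, which is the "test fails (reproduces bug)" state from step 1.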

Example 2: Multi-step with verification

User request: “Add rate limiting to the API”

What LLMs do wrong: all at once

# Implements full rate limiting with Redis, multiple strategies,
# configuration system, and monitoring in one 300-line commit
# No clear verification steps

What should happen — incremental with verification

Plan for rate limiting:

1. Add basic in-memory rate limiting (single endpoint)
   Verify:
   - Test: 100 requests → first 10 succeed, rest get 429
   - Manual: curl endpoint 11 times, see rate limit error

2. Extract to middleware (apply to all endpoints)
   Verify:
   - Test: Rate limits apply to /users and /posts
   - Existing endpoint tests still pass

3. Add Redis backend (for multi-server)
   Verify:
   - Test: Rate limit persists across app restarts
   - Test: Two app instances share rate limit counter

4. Add configuration (rates per endpoint)
   Verify:
   - Test: /search allows 10/min, /users allows 100/min
   - Config file parsed correctly

Each step is independently verifiable and deployable.
Start with step 1?
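Step 1 of the plan is small enough to sketch in full. The following is one possible in-memory limiter, assuming a fixed-window algorithm and a limit of 10 requests per window. The plan names neither, so both are assumptions, as are the class and method names. The injectable `clock` exists so the test does not have to sleep.

```python
import time


class FixedWindowRateLimiter:
    """Allow at most `limit` requests per client in each time window."""

    def __init__(self, limit=10, window_seconds=60.0, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock
        self._windows = {}  # client id -> (window start time, request count)

    def check(self, client_id):
        """Return the HTTP status for this request: 200 or 429."""
        now = self.clock()
        start, count = self._windows.get(client_id, (now, 0))
        if now - start >= self.window_seconds:
            start, count = now, 0  # the old window expired, start a new one
        if count >= self.limit:
            self._windows[client_id] = (start, count)
            return 429
        self._windows[client_id] = (start, count + 1)
        return 200


def test_first_ten_succeed_rest_get_429():
    fake_time = [0.0]
    limiter = FixedWindowRateLimiter(limit=10, clock=lambda: fake_time[0])

    statuses = [limiter.check("client-a") for _ in range(100)]
    assert statuses[:10] == [200] * 10
    assert statuses[10:] == [429] * 90

    fake_time[0] = 60.0  # move past the window without sleeping
    assert limiter.check("client-a") == 200
```

State lives in a single process here, which is why step 3 of the plan moves the counters to a shared backend.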

Example 3: Test-first verification

User request: “The sorting breaks when there are duplicate scores”

What LLMs do wrong: fix without reproducing

# Immediately changes sort logic without confirming the bug
def sort_scores(scores):
    return sorted(scores, key=lambda x: (-x['score'], x['name']))

What should happen — reproduce first

# 1. First, write a test that reproduces the issue
def test_sort_with_duplicate_scores():
    """Test sorting when multiple items have same score."""
    # Tied entries are deliberately out of name order in the input
    scores = [
        {'name': 'Bob', 'score': 100},
        {'name': 'Alice', 'score': 100},
        {'name': 'Charlie', 'score': 90},
    ]

    result = sort_scores(scores)

    # The bug: tied scores have no defined order, so the result
    # depends on input order. Assert the full expected order,
    # not just the scores.
    assert [r['name'] for r in result] == ['Alice', 'Bob', 'Charlie']

# Verify: Test fails against the current implementation (reproduces bug)

# 2. Now fix by adding a deterministic tie-breaker
def sort_scores(scores):
    """Sort by score descending, then name ascending for ties."""
    return sorted(scores, key=lambda x: (-x['score'], x['name']))

# Verify: Test passes consistently
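One way to verify that the result is consistent without rerunning the test many times is to sort every permutation of the input and count the distinct outcomes. This is a sketch: `orderings_seen` and `sort_by_score_only` are hypothetical helpers, and the score-only sort is only a guess at what a buggy version might look like, since the original implementation is not shown.

```python
from itertools import permutations


def sort_scores(scores):
    """Sort by score descending, then name ascending for ties."""
    return sorted(scores, key=lambda x: (-x['score'], x['name']))


def sort_by_score_only(scores):
    """Hypothetical buggy version: ties keep whatever order the input had."""
    return sorted(scores, key=lambda x: -x['score'])


def orderings_seen(sort_fn, scores):
    """Sort every permutation of the input and collect the distinct name orders."""
    return {
        tuple(item['name'] for item in sort_fn(list(perm)))
        for perm in permutations(scores)
    }


scores = [
    {'name': 'Alice', 'score': 100},
    {'name': 'Bob', 'score': 100},
    {'name': 'Charlie', 'score': 90},
]

# With the tie-breaker there is exactly one possible result.
assert orderings_seen(sort_scores, scores) == {('Alice', 'Bob', 'Charlie')}

# Without it, the output depends on how the input happened to be ordered.
assert len(orderings_seen(sort_by_score_only, scores)) == 2
```

The check assumes names are unique. Two entries with the same score and the same name would still come out in input order.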

Anti-patterns summary

Principle	Anti-pattern	Fix
Think Before Coding	Silently assumes file format, fields, scope	List assumptions explicitly, ask for clarification
Simplicity First	Strategy pattern for single discount calculation	One function until complexity is actually needed
Surgical Changes	Reformats quotes, adds type hints while fixing bug	Only change lines that fix the reported issue
Goal-Driven Execution	“I’ll review and improve the code”	“Write test for bug X → make it pass → verify no regressions”
