Presets

YC-Bench includes five calibrated presets of increasing difficulty, alongside the default benchmark configuration. Each preset overrides a targeted set of parameters to create a specific challenge for LLM agents.

Available Presets

Tutorial

1-year horizon • Tests basic loop execution

Forgiving environment for testing basic CLI discovery and the accept → assign → dispatch → resume loop.

Easy

1-year horizon • Tests throughput awareness

Single-domain tasks with moderate deadlines. Tests whether agents understand that parallelism dilutes throughput.

Medium

1-year horizon • Tests domain specialization

Prestige ladder active, 2-domain tasks. Agents must specialize in a few domains to unlock higher-reward tiers.

Hard

1-year horizon • Tests capacity planning

Tight deadlines, heavy penalties, limited runway. Requires precise ETA calculation and conservative task acceptance.

Nightmare

1-year horizon • Tests sustained perfect play

Razor-thin margins, aggressive compounding, steep prestige requirements. One mistake cascades into bankruptcy.

Preset Comparison

The table below highlights the key differences between presets:

Parameter	Tutorial	Easy	Medium	Hard	Nightmare	Default
Starting Funds	$250,000	$200,000	$150,000	$100,000	$80,000	$150,000
Horizon	1 year	1 year	1 year	1 year	1 year	3 years
Prestige Mode	Constant 1	Tri(1,4,1)	Tri(1,7,3)	Tri(1,8,4)	Tri(1,10,5)	Tri(1,10,4)
Domain Count	Constant 1	Constant 1	Tri(1,3,2)	Tri(1,3,2)	Tri(1,3,2)	Tri(1,3,2)
Required Qty	Tri(300,1200,600)	Tri(500,2000,1000)	Tri(700,3000,1500)	Tri(1000,4000,2000)	Tri(1200,5000,2500)	Tri(800,4000,2000)
Deadline (qty/day)	50	100	150	220	220	200
Fail Penalty	0.3×	0.8×	1.0×	1.4×	2.0×	1.4×
Cancel Penalty	0.5×	1.2×	1.5×	2.0×	2.5×	2.0×
Salary Bump %	0%	0.5%	1%	1%	2%	1%
Reward Scale	0.2	0.3	0.45	0.55	0.7	0.55

Notation: Tri(low, high, mode) = triangular distribution with minimum low, maximum high, and most likely value mode. See Parameters for details.
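
To make the Tri(low, high, mode) notation concrete, here is a minimal Python sketch that samples from one of the distributions in the table. It uses the standard library's random.triangular, which happens to take its arguments in the same (low, high, mode) order. The helper name sample_tri is hypothetical, and whether YC-Bench rounds or clamps sampled values is not covered here.

```python
import random

def sample_tri(low: float, high: float, mode: float, rng: random.Random) -> float:
    """Draw one value from Tri(low, high, mode) as written in the table."""
    # random.triangular takes (low, high, mode), the same order as the notation.
    return rng.triangular(low, high, mode)

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Medium preset's required-prestige distribution from the table: Tri(1, 7, 3)
samples = [sample_tri(1, 7, 3, rng) for _ in range(10_000)]
mean = sum(samples) / len(samples)

# The theoretical mean of a triangular distribution is (low + high + mode) / 3.
print(f"sample mean {mean:.2f}, theoretical mean {(1 + 7 + 3) / 3:.2f}")
```

Every sample falls between low and high, and values cluster around mode, which is why the TOML comments describe each preset by where "most" tasks land.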

What Each Preset Tests

Tutorial

Key Question: Can the agent execute the basic loop?

# From tutorial.toml
[world.dist.required_prestige]
type = "constant"
value = 1        # ALL tasks accessible immediately

[world.dist.domain_count]
type = "constant"
value = 1        # Single-domain only

deadline_qty_per_day = 50.0  # Very generous deadlines

Economics:

Starting runway: ~16 months with 10 employees
Monthly payroll: ~$15K
Mode task: 1 domain × 600 units, 12-day deadline (600 units ÷ 50 units/day)
A single mid-tier employee can finish in 7.4 days

Tests:

Does the agent discover the CLI commands?
Does it call sim resume to advance time?
Can it read JSON output and act on it?

Easy

Key Question: Does the agent understand throughput dilution?

# From easy.toml
[world.dist.required_prestige]
type = "triangular"
low  = 1
high = 4
mode = 1        # Almost all tasks accessible at prestige-1

deadline_qty_per_day = 100.0  # Moderate deadlines

Economics:

Starting runway: ~7.8 months with 10 employees
Monthly payroll: ~$32K
Mode task: 1 domain × 1000 units, 10-day deadline
Team throughput on 1 task: 230 units/day → 3 days
On 4 parallel tasks: 57 units/day → 12 days (FAIL)

Tests:

Does the agent understand that parallel tasks split employee rates?
Does it keep ≤2 tasks active at a time?
Can it sequence tasks rather than batch?
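
The dilution effect can be sketched in a few lines of Python. This assumes the team's rate is divided evenly across active tasks, which is the simplest reading of "parallel tasks split employee rates"; the function name per_task_rate is hypothetical and the simulator's actual allocation may differ.

```python
def per_task_rate(team_units_per_day: float, active_tasks: int) -> float:
    """Units/day each task receives if the team's rate is split evenly (assumption)."""
    if active_tasks < 1:
        raise ValueError("active_tasks must be at least 1")
    return team_units_per_day / active_tasks

TEAM_RATE = 230.0  # units/day on a single task, from the Easy economics above

for n in (1, 2, 4):
    print(f"{n} active task(s): {per_task_rate(TEAM_RATE, n):.1f} units/day each")
```

With four active tasks each one receives 57.5 units/day, in line with the 57 units/day figure above. Total output is unchanged, but every individual task moves four times more slowly toward its own deadline.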

Medium

Key Question: Can the agent climb the prestige ladder strategically?

# From medium.toml
[world.dist.required_prestige]
type = "triangular"
low  = 1
high = 7
mode = 3        # Most tasks need prestige 2–4

[world.dist.domain_count]
type = "triangular"
low  = 1
high = 3
mode = 2        # Most tasks need 2 domains

reward_prestige_scale = 0.45  # Climbing prestige doubles income

Economics:

Starting runway: ~7.8 months with 10 employees
Monthly payroll: ~$32K
Mode task: 2 domains × 1500 units, 10-day deadline
Prestige-1 reward: ~$30K
Prestige-4 reward: $30K × 2.35 = ~$70K

Tests:

Does the agent understand prestige gates market access?
Does it specialize in 2–3 domains rather than spreading thin?
Can it handle 2-domain task assignments effectively?
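
The 2.35× multiplier in the economics above is consistent with a linear rule, 1 + reward_prestige_scale × (prestige − 1). The sketch below assumes that rule; it is inferred from the figures on this page rather than taken from the simulator source, and prestige_reward is a hypothetical helper.

```python
def prestige_reward(base_reward: float, prestige: int, scale: float) -> float:
    """Reward under an assumed linear multiplier: 1 + scale * (prestige - 1)."""
    return base_reward * (1 + scale * (prestige - 1))

BASE = 30_000  # approximate prestige-1 reward in dollars, from the list above

# Medium preset: reward_prestige_scale = 0.45, task gated at prestige 4
medium_p4 = round(prestige_reward(BASE, 4, 0.45))
print(f"Medium, prestige 4: ${medium_p4:,}")
```

Under the same assumption, a scale of 0.7 at prestige 5 gives a 3.8× multiplier, which agrees with the ~$114K figure quoted for Nightmare.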

Hard

Key Question: Can the agent compute ETAs and never overcommit?

# From hard.toml
initial_funds_cents = 10_000_000  # $100,000 — tight runway

deadline_qty_per_day = 220.0      # Tight deadlines

penalty_fail_multiplier   = 1.4   # Mistakes cost real prestige
penalty_cancel_multiplier = 2.0

salary_bump_pct = 0.01            # Noticeable compounding

Economics:

Starting runway: ~5.4 months with 10 employees
Monthly payroll: ~$46K
Mode task: 2 domains × 2000 units, 9-day deadline
Split 4+3 employees: finishes in 8.7 days (just fits!)
Dispatching a second task splits all rates → both tasks miss

Tests:

Can the agent estimate completion time vs. deadline?
Does it understand that new dispatches degrade existing tasks?
Can it manage cash flow with 5.4-month runway?
Does it resist “tempting” high-reward tasks it can’t finish?
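
A minimal sketch of the check an agent could run before dispatching. It assumes the deadline is the stated quantity divided by deadline_qty_per_day (2000 ÷ 220 ≈ 9 days, as in the economics above) and that a second dispatch halves every rate. Both function names are hypothetical.

```python
def deadline_days(required_qty: float, deadline_qty_per_day: float) -> float:
    """Deadline length implied by the task size and the preset's deadline rate."""
    return required_qty / deadline_qty_per_day

def fits(eta_days: float, required_qty: float, deadline_qty_per_day: float) -> bool:
    """True if the estimated completion time is within the deadline."""
    return eta_days <= deadline_days(required_qty, deadline_qty_per_day)

QTY, RATE = 2000, 220.0    # Hard mode task and deadline_qty_per_day
solo_eta = 8.7             # days for a 4+3 employee split, from the list above
shared_eta = solo_eta * 2  # assumption: a second task halves every rate

print("one task fits:", fits(solo_eta, QTY, RATE))
print("two tasks fit:", fits(shared_eta, QTY, RATE))
```

The single task clears its deadline with well under half a day to spare, so any dilution pushes it past the limit.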

Nightmare

Key Question: Can the agent sustain perfect play for an entire year?

# From nightmare.toml
initial_funds_cents = 8_000_000   # $80,000 — razor-thin runway

[world.dist.required_prestige]
type = "triangular"
low  = 1
high = 10
mode = 5        # Most tasks need prestige 4–6

penalty_fail_multiplier   = 2.0   # Catastrophic penalties
penalty_cancel_multiplier = 2.5

salary_bump_pct = 0.02            # Aggressive compounding
reward_prestige_scale = 0.7       # Steep reward curve

Economics:

Starting runway: ~4.8 months with 10 employees
Monthly payroll: ~$52K initially, grows 30–50% over the year
Mode task: 2 domains × 2500 units, 11-day deadline
Revenue at prestige-1: ~$30K (net: -$22K/month)
Revenue at prestige-5: ~$114K (now profitable)
The race: Climb to prestige 5 before month 5 or die

Tests:

Can the agent survive a 4.8-month clock to profitability?
Does it plan a prestige climb path across 2–3 domains?
Can it handle 3-domain assignments without throughput collapse?
Does it account for salary growth in long-term planning?
Can it resist every temptation to over-accept?
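
To get a feel for the salary compounding, the sketch below asks how many 2% bumps it takes for payroll to grow by 30–50%. It assumes each bump multiplies the whole payroll by (1 + salary_bump_pct); when bumps are triggered and which employees they affect is not specified on this page, so treat this as rough arithmetic. Both helpers are hypothetical.

```python
import math

def bumps_for_growth(growth: float, bump_pct: float) -> float:
    """Compounded bumps needed for payroll to grow by `growth` (0.3 means 30%)."""
    return math.log(1 + growth) / math.log(1 + bump_pct)

def payroll_after(initial: float, bump_pct: float, bumps: int) -> float:
    """Monthly payroll after `bumps` compounded increases (uniform-bump assumption)."""
    return initial * (1 + bump_pct) ** bumps

BUMP = 0.02  # Nightmare salary_bump_pct

low = bumps_for_growth(0.30, BUMP)
high = bumps_for_growth(0.50, BUMP)
print(f"30-50% growth takes roughly {low:.1f} to {high:.1f} bumps")
print(f"payroll after 15 bumps: ${payroll_after(52_000, BUMP, 15):,.0f}")
```

An agent that plans against the initial ~$52K payroll for the whole year will underestimate its costs by a wide margin.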

Specifying a Preset

Use the --config flag to specify a preset:

yc-bench run --config tutorial

yc-bench run --config nightmare

Available preset names:

tutorial
easy
medium
hard
nightmare
default (the 3-year hardened benchmark)

If no --config is specified, YC-Bench uses the default preset, which is the canonical 3-year benchmark configuration.
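
For scripting runs across presets, a small Python sketch can build the command lines shown above. It uses only the --config flag documented here; any other flags your setup needs are omitted, and run_command is a hypothetical helper.

```python
import shlex

PRESETS = ["tutorial", "easy", "medium", "hard", "nightmare", "default"]

def run_command(preset: str) -> list[str]:
    """Build the argument list for one preset, rejecting unknown names."""
    if preset not in PRESETS:
        raise ValueError(f"unknown preset: {preset!r}")
    return ["yc-bench", "run", "--config", preset]

for preset in PRESETS:
    # shlex.join renders the list as a copy-pasteable shell command.
    print(shlex.join(run_command(preset)))
```

To execute a command rather than print it, pass the list to subprocess.run(cmd, check=True).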

Preset Inheritance

All presets use extends = "default" and override only specific parameters. This means:

Every parameter not explicitly overridden inherits from default.toml
Parameters like num_employees, num_market_tasks, and salary tier distributions are consistent across presets
You can reconstruct the full effective configuration by reading the preset's own file alongside default.toml

Example: The tutorial preset only overrides 13 parameters but inherits 50+ others from default.
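
The inheritance rule can be pictured as a recursive merge of the preset's overrides onto the default values. The sketch below is an illustration under that assumption, not the loader YC-Bench uses. The values come from the comparison table, while the dictionary nesting and the resolve function are hypothetical.

```python
from copy import deepcopy

def resolve(base: dict, override: dict) -> dict:
    """Return base with override applied; nested tables are merged key by key."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = resolve(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged

# Default and Hard values from the comparison table (layout is illustrative).
default = {
    "initial_funds_cents": 15_000_000,
    "deadline_qty_per_day": 200.0,
    "salary_bump_pct": 0.01,
    "dist": {"required_prestige": {"type": "triangular", "low": 1, "high": 10, "mode": 4}},
}
hard = {
    "initial_funds_cents": 10_000_000,
    "deadline_qty_per_day": 220.0,
    "dist": {"required_prestige": {"high": 8}},
}

effective = resolve(default, hard)
print(effective)
```

Keys the preset never mentions, such as salary_bump_pct here, keep their default values in the result.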

Next Steps

Parameters

Tuning
