YC-Bench includes five carefully calibrated presets that test progressively harder scenarios. Each preset adjusts dozens of parameters to create specific challenges for LLM agents.
Available Presets
Tutorial
1-year horizon • Tests basic loop execution
Forgiving environment for testing basic CLI discovery and the accept → assign → dispatch → resume loop.
Easy
1-year horizon • Tests throughput awareness
Single-domain tasks with moderate deadlines. Tests whether agents understand that parallelism dilutes throughput.
Medium
1-year horizon • Tests domain specialization
Prestige ladder active, 2-domain tasks. Agents must specialize in a few domains to unlock higher-reward tiers.
Hard
1-year horizon • Tests capacity planning
Tight deadlines, heavy penalties, limited runway. Requires precise ETA calculation and conservative task acceptance.
Nightmare
1-year horizon • Tests sustained perfect play
Razor-thin margins, aggressive compounding, steep prestige requirements. One mistake cascades into bankruptcy.
Preset Comparison
The table below highlights the key differences between presets:

| Parameter | Tutorial | Easy | Medium | Hard | Nightmare | Default |
|---|---|---|---|---|---|---|
| Starting Funds | $250,000 | $200,000 | $150,000 | $100,000 | $80,000 | $150,000 |
| Horizon | 1 year | 1 year | 1 year | 1 year | 1 year | 3 years |
| Prestige Mode | Constant 1 | Tri(1,4,1) | Tri(1,7,3) | Tri(1,8,4) | Tri(1,10,5) | Tri(1,10,4) |
| Domain Count | Constant 1 | Constant 1 | Tri(1,3,2) | Tri(1,3,2) | Tri(1,3,2) | Tri(1,3,2) |
| Required Qty | Tri(300,1200,600) | Tri(500,2000,1000) | Tri(700,3000,1500) | Tri(1000,4000,2000) | Tri(1200,5000,2500) | Tri(800,4000,2000) |
| Deadline (qty/day) | 50 | 100 | 150 | 220 | 220 | 200 |
| Fail Penalty | 0.3× | 0.8× | 1.0× | 1.4× | 2.0× | 1.4× |
| Cancel Penalty | 0.5× | 1.2× | 1.5× | 2.0× | 2.5× | 2.0× |
| Salary Bump % | 0% | 0.5% | 1% | 1% | 2% | 1% |
| Reward Scale | 0.2 | 0.3 | 0.45 | 0.55 | 0.7 | 0.55 |
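
The `Tri(low, high, mode)` entries in the table can be explored with Python's standard library, whose `random.triangular` happens to take its arguments in the same order. The sketch below samples the Medium preset's Required Qty, `Tri(700, 3000, 1500)`. It is only an illustration: the helper name `sample_tri` is hypothetical, and whether the benchmark rounds or otherwise post-processes its draws is not stated here.

```python
import random


def sample_tri(low: float, high: float, mode: float, rng: random.Random) -> float:
    """Draw one value from Tri(low, high, mode), using the table's argument order."""
    return rng.triangular(low, high, mode)


rng = random.Random(0)  # fixed seed so repeated runs give the same draws

# Required Qty for the Medium preset, as listed in the table: Tri(700, 3000, 1500)
samples = [sample_tri(700, 3000, 1500, rng) for _ in range(10_000)]

# Every draw stays inside [low, high].
print(min(samples) >= 700 and max(samples) <= 3000)

# The mean of a triangular distribution is (low + high + mode) / 3,
# so the sample mean should land close to that value.
print(round(sum(samples) / len(samples)), round((700 + 3000 + 1500) / 3))
```

Because the mean sits above the mode whenever the distribution is skewed toward `high`, a "mode task" is smaller than the average task an agent should expect under these presets.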
Notation: Tri(low, high, mode) = triangular distribution with minimum low, maximum high, and most likely value mode. See Parameters for details.

What Each Preset Tests
Tutorial
Key Question: Can the agent execute the basic loop?
- Starting runway: ~16 months with 10 employees
- Monthly payroll: ~$15K
- Mode task: 1 domain × 600 units, 7-day deadline
- A single mid-tier employee can finish in 7.4 days
- Does the agent discover the CLI commands?
- Does it call `sim resume` to advance time?
- Can it read JSON output and act on it?
Easy
Key Question: Does the agent understand throughput dilution?
- Starting runway: ~7.8 months with 10 employees
- Monthly payroll: ~$32K
- Mode task: 1 domain × 1000 units, 10-day deadline
- Team throughput on 1 task: 230 units/day → 3 days
- On 4 parallel tasks: 57 units/day → 12 days (FAIL)
- Does the agent understand that parallel tasks split employee rates?
- Does it keep ≤2 tasks active at a time?
- Can it sequence tasks rather than batch?
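
The dilution effect can be sketched in a few lines. This is a simplified model, not the simulator's actual logic: it assumes the team's throughput is split evenly across all active tasks, and the function names are hypothetical. The 230 units/day team rate, the 1000-unit mode task, and the 10-day deadline are the Easy figures quoted above.

```python
def per_task_rate(team_rate: float, active_tasks: int) -> float:
    """Units/day each task receives when the team is split evenly (assumed model)."""
    if active_tasks < 1:
        raise ValueError("active_tasks must be at least 1")
    return team_rate / active_tasks


def eta_days(required_qty: float, team_rate: float, active_tasks: int) -> float:
    """Days needed to finish one task at the diluted per-task rate."""
    return required_qty / per_task_rate(team_rate, active_tasks)


def meets_deadline(required_qty: float, deadline_days: float,
                   team_rate: float, active_tasks: int) -> bool:
    """True if the task finishes on or before its deadline."""
    return eta_days(required_qty, team_rate, active_tasks) <= deadline_days


TEAM_RATE = 230.0  # units/day for the whole team on a single task

for n in (1, 2, 4):
    # Print the diluted rate and whether a 1000-unit, 10-day task still fits.
    print(n, per_task_rate(TEAM_RATE, n), meets_deadline(1000, 10, TEAM_RATE, n))
```

Under this even-split assumption, one or two active tasks still fit the deadline while four do not, which is why sequencing beats batching in this preset.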
Medium
Key Question: Can the agent climb the prestige ladder strategically?
- Starting runway: ~7.8 months with 10 employees
- Monthly payroll: ~$32K
- Mode task: 2 domains × 1500 units, 10-day deadline
- Prestige-1 reward: ~$30K
- Prestige-4 reward: ~$70K
- Does the agent understand prestige gates market access?
- Does it specialize in 2–3 domains rather than spreading thin?
- Can it handle 2-domain task assignments effectively?
Hard
Key Question: Can the agent compute ETAs and never overcommit?
- Starting runway: ~5.4 months with 10 employees
- Monthly payroll: ~$46K
- Mode task: 2 domains × 2000 units, 9-day deadline
- Split 4+3 employees: finishes in 8.7 days (just fits!)
- Dispatching a second task splits all rates → both tasks miss
- Can the agent estimate completion time vs. deadline?
- Does it understand that new dispatches degrade existing tasks?
- Can it manage cash flow with 5.4-month runway?
- Does it resist “tempting” high-reward tasks it can’t finish?
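
One way an agent might check a dispatch before committing is sketched below. It rests on loud assumptions: throughput is split evenly across all active tasks, multi-domain assignment is ignored, and the 230 units/day team rate is borrowed for illustration rather than taken from the Hard preset. `ActiveTask` and `safe_to_dispatch` are hypothetical names, not part of the YC-Bench CLI.

```python
from dataclasses import dataclass


@dataclass
class ActiveTask:
    remaining_qty: float  # units of work still to do
    days_left: float      # days until the deadline


def safe_to_dispatch(active: list[ActiveTask], candidate: ActiveTask,
                     team_rate: float) -> bool:
    """True only if every task, including the candidate, still meets its
    deadline once the team rate is split evenly across all of them."""
    tasks = active + [candidate]
    rate = team_rate / len(tasks)  # assumed even split
    return all(t.remaining_qty / rate <= t.days_left for t in tasks)


TEAM_RATE = 230.0  # units/day, an illustrative assumption

first = ActiveTask(remaining_qty=2000, days_left=9)
second = ActiveTask(remaining_qty=2000, days_left=9)

print(safe_to_dispatch([], first, TEAM_RATE))        # a single task on its own
print(safe_to_dispatch([first], second, TEAM_RATE))  # adding a second task
```

The check evaluates existing tasks as well as the new one, because the cost of a new dispatch falls on work already in flight.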
Nightmare
Key Question: Can the agent sustain perfect play for an entire year?
- Starting runway: ~4.8 months with 10 employees
- Monthly payroll: ~$52K initially, grows 30–50% over the year
- Mode task: 2 domains × 2500 units, 11-day deadline
- Revenue at prestige-1: ~22K/month)
- Revenue at prestige-5: ~$114K (now profitable)
- The race: Climb to prestige 5 before month 5 or die
- Can the agent survive a 4.8-month clock to profitability?
- Does it plan a prestige climb path across 2–3 domains?
- Can it handle 3-domain assignments without throughput collapse?
- Does it account for salary growth in long-term planning?
- Can it resist every temptation to over-accept?
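
To get a feel for how a 2% salary bump compounds into 30–50% payroll growth, the sketch below treats each bump as a multiplicative raise on the whole payroll. That is an assumption made for illustration: what triggers a bump, and which employees receive it, is not described in this section. Both function names are hypothetical.

```python
import math


def payroll_after_bumps(initial_payroll: float, bump_pct: float, n_bumps: int) -> float:
    """Payroll after n compounding bumps of bump_pct percent each (assumed model)."""
    return initial_payroll * (1 + bump_pct / 100) ** n_bumps


def bumps_to_reach(growth_factor: float, bump_pct: float) -> int:
    """Smallest number of bumps whose compounded growth reaches growth_factor."""
    return math.ceil(math.log(growth_factor) / math.log(1 + bump_pct / 100))


# Nightmare figures from above: ~$52K starting payroll, 2% salary bump.
for target in (1.3, 1.5):
    n = bumps_to_reach(target, 2.0)
    print(target, n, round(payroll_after_bumps(52_000, 2.0, n)))
```

Under this model the quoted growth range corresponds to roughly 14 to 21 bumps over the year, so a plan that ignores salary growth underestimates late-year burn.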
Specifying a Preset
Use the `--config` flag to specify a preset. The available preset names are:
- `tutorial`
- `easy`
- `medium`
- `hard`
- `nightmare`
- `default` (the 3-year hardened benchmark)
Preset Inheritance
All presets use `extends = "default"` and override only specific parameters. This means:
- Every parameter not explicitly overridden inherits from `default.toml`
- Parameters like `num_employees`, `num_market_tasks`, and salary tier distributions are consistent across presets
- You can inspect the full effective configuration by examining both files

For example, the `tutorial` preset overrides only 13 parameters and inherits 50+ others from `default`.
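
The inheritance rule can be illustrated with a small recursive merge. This is a sketch of the general "child overrides parent" pattern, not YC-Bench's actual loader. Apart from `num_employees`, the key names (`horizon_years`, `salary_bump_pct`) are hypothetical stand-ins; the values come from the comparison table.

```python
def merge_config(base: dict, overrides: dict) -> dict:
    """Return base with overrides applied; nested tables are merged key by key."""
    merged = dict(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged


# Hypothetical excerpts of the parsed TOML files.
default = {"num_employees": 10, "horizon_years": 3, "salary_bump_pct": 1.0}
tutorial = {"extends": "default", "horizon_years": 1, "salary_bump_pct": 0.0}

# Drop the "extends" directive itself, then layer the preset over the default.
overrides = {k: v for k, v in tutorial.items() if k != "extends"}
effective = merge_config(default, overrides)
print(effective)
```

Printing the merged result is a quick way to see the full effective configuration without reading both files side by side.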
Next Steps
Parameters
Complete reference of all tunable parameters in default.toml
Tuning
Learn how to create your own custom presets