SkyDiscover uses a layered architecture: a Runner orchestrates a discovery controller, an evaluator, and a program database, and each layer can be extended or replaced on its own.
High-Level Architecture
┌──────────────────────────────────────────────────────────────┐
│                            Runner                            │
│     (Orchestrates lifecycle, checkpointing, monitoring)      │
└────────────────┬────────────────────────────────┬────────────┘
                 │                                │
        ┌────────▼────────┐              ┌────────▼─────────┐
        │    Discovery    │              │    Evaluator     │
        │   Controller    │              │                  │
        └────────┬────────┘              └──────────────────┘
                 │
        ┌────────▼────────┐
        │     Program     │
        │    Database     │
        └─────────────────┘
Core Classes
Runner (skydiscover/runner.py:24)
Top-level entry point that manages the complete discovery run:
class Runner:
    def __init__(
        self,
        evaluation_file: str,
        initial_program_path: Optional[str] = None,
        config_path: Optional[str] = None,
        config: Optional[Config] = None,
        output_dir: Optional[str] = None,
    ) -> None: ...
Responsibilities:
Load configuration and initial program
Create database and discovery controller
Run the discovery loop with checkpointing
Manage monitor server and human feedback
Save best program and generate reports
Key Methods:
Runner.run()
async def run(
    self,
    iterations: Optional[int] = None,
    checkpoint_path: Optional[str] = None,
) -> Optional[Program]:
    """
    Main discovery loop:
    1. Initialize or resume from checkpoint
    2. Add initial program if starting fresh
    3. Run discovery_controller.run_discovery()
    4. Save checkpoints and best program
    """
DiscoveryController (skydiscover/search/default_discovery_controller.py)
Executes the core sample → prompt → LLM → evaluate → add loop:
class DiscoveryController:
    def __init__(self, controller_input: DiscoveryControllerInput):
        self.config = controller_input.config
        self.database = controller_input.database
        # Sub-components are built from the matching config sections
        self.evaluator = Evaluator(self.config.evaluator)
        self.llms = LLMManager(self.config.llm)
        self.prompt_builder = PromptBuilder(self.config.prompt)
Key Loop:
async def run_discovery(
    self,
    start_iteration: int,
    max_iterations: int,
    checkpoint_callback=None,
):
    for iteration in range(start_iteration, start_iteration + max_iterations):
        if self.shutdown_event.is_set():
            break
        # Core iteration
        result = await self._run_iteration(iteration)
        if result.error:
            continue
        # Store to database
        self._process_iteration_result(
            result, iteration, checkpoint_callback
        )
Single Iteration (_run_iteration):
1. Sample: Call database.sample() to get parent and context programs
2. Build Prompts: Use PromptBuilder to create system/user messages with code and feedback
3. Generate: Call llms.generate() to get a new program from the LLM
4. Evaluate: Run evaluator.evaluate_program() to score the program
5. Return Result: Package program, metrics, and parent info into SerializableResult
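The order of operations in a single iteration can be sketched end to end with stand-in components. Everything below (`FakeDatabase`, `fake_llm`, `fake_evaluate`, `IterationResult`) is a hypothetical placeholder rather than SkyDiscover's API; only the sequence of steps follows the description above.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class IterationResult:
    """Hypothetical stand-in for SerializableResult."""
    solution: Optional[str] = None
    metrics: Dict[str, Any] = field(default_factory=dict)
    parent_solution: Optional[str] = None
    error: Optional[str] = None


class FakeDatabase:
    """Hypothetical database: the best-scoring solution becomes the parent."""

    def __init__(self) -> None:
        self.programs: List[Tuple[str, float]] = [("x = 1", 1.0), ("x = 3", 3.0)]

    def sample(self) -> Tuple[str, List[str]]:
        ranked = sorted(self.programs, key=lambda p: p[1], reverse=True)
        parent = ranked[0][0]
        context = [solution for solution, _ in ranked[1:]]
        return parent, context


async def fake_llm(system: str, user: str) -> str:
    """Placeholder for an LLM call; always proposes the same program."""
    return "x = 5"


async def fake_evaluate(solution: str) -> Dict[str, Any]:
    """Placeholder evaluator: the score is the value assigned to x."""
    namespace: Dict[str, Any] = {}
    exec(solution, namespace)
    return {"combined_score": float(namespace["x"])}


async def run_iteration(db: FakeDatabase) -> IterationResult:
    try:
        parent, context = db.sample()                    # 1. Sample
        system = "Improve the program."                  # 2. Build prompts
        user = f"Parent:\n{parent}\n\nContext:\n" + "\n".join(context)
        child = await fake_llm(system, user)             # 3. Generate
        metrics = await fake_evaluate(child)             # 4. Evaluate
        return IterationResult(child, metrics, parent)   # 5. Return result
    except Exception as exc:
        # Failures are reported through `error` instead of being raised
        return IterationResult(error=str(exc))


result = asyncio.run(run_iteration(FakeDatabase()))
```

Returning errors as data rather than raising mirrors the `if result.error: continue` check in the loop shown earlier, so one failed iteration does not end the run.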
Advanced algorithms can override run_discovery() to implement:
Acceptance gating (GEPA)
Island migration (AdaEvolve)
Strategy co-evolution (EvoX)
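As an illustration of what overriding `run_discovery()` can look like, here is a minimal sketch of an acceptance gate. `BaseController` is a hypothetical stand-in for `DiscoveryController`, and the gating rule (keep a child only if it scores at least as well as the best stored candidate) is an assumption chosen for brevity, not GEPA's actual rule.

```python
import asyncio
from typing import List, Tuple

Candidate = Tuple[str, float]  # (solution, combined_score)


class BaseController:
    """Hypothetical stand-in for DiscoveryController: stores every result."""

    def __init__(self, proposals: List[Candidate]) -> None:
        self.accepted: List[Candidate] = [("seed", 0.5)]
        self._proposals = proposals

    async def _run_iteration(self, iteration: int) -> Candidate:
        # Pretend this is the output of the LLM plus the evaluator
        return self._proposals[iteration]

    async def run_discovery(self, start_iteration: int, max_iterations: int) -> None:
        for iteration in range(start_iteration, start_iteration + max_iterations):
            self.accepted.append(await self._run_iteration(iteration))


class GatedController(BaseController):
    """Stores a child only if it matches or beats the best score so far."""

    async def run_discovery(self, start_iteration: int, max_iterations: int) -> None:
        for iteration in range(start_iteration, start_iteration + max_iterations):
            child = await self._run_iteration(iteration)
            best_score = max(score for _, score in self.accepted)
            if child[1] >= best_score:  # the acceptance gate
                self.accepted.append(child)


controller = GatedController([("a", 0.4), ("b", 0.7), ("c", 0.6), ("d", 0.9)])
asyncio.run(controller.run_discovery(start_iteration=0, max_iterations=4))
```

Only the loop body changes between the two classes; `_run_iteration` is reused unchanged, which is why overriding the loop is enough to change the search behavior.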
ProgramDatabase (skydiscover/search/base_database.py:75)
Abstract base class for program storage and sampling:
class ProgramDatabase(ABC):
    def __init__(self, name: str, config: DatabaseConfig, **kwargs):
        self.programs: Dict[str, Program] = {}  # All programs
        self.best_program_id: Optional[str] = None
        self.last_iteration: int = 0

    @abstractmethod
    def add(self, program: Program, iteration: Optional[int] = None) -> str:
        """Store a program and update best tracking."""
        ...

    @abstractmethod
    def sample(
        self, num_context_programs: Optional[int] = 4
    ) -> Tuple[Program, List[Program]]:
        """Select parent and context programs for next iteration."""
        ...
Provided Methods:
| Method | Purpose |
| --- | --- |
| get_best_program() | Return highest-scoring program |
| get_top_programs(n) | Return top N by score |
| _update_best_program(program) | Update best tracking (call in add()) |
| save(path, iteration) | Checkpoint to disk |
| load(path) | Restore from checkpoint |
| get_statistics() | Return population stats for prompts |
| log_prompt(...) | Store prompt and response for debugging |
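A minimal sketch of a custom database, assuming the interface described above. So that the example runs on its own, `Program` and `ProgramDatabase` are redefined here as simplified local stand-ins (without the `config` argument), and `GreedyDatabase` is a hypothetical strategy, not one shipped with SkyDiscover.

```python
import uuid
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class Program:
    """Simplified stand-in for SkyDiscover's Program."""
    solution: str
    metrics: Dict[str, Any] = field(default_factory=dict)
    id: str = field(default_factory=lambda: str(uuid.uuid4()))


class ProgramDatabase(ABC):
    """Simplified stand-in for the abstract base class."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.programs: Dict[str, Program] = {}
        self.best_program_id: Optional[str] = None
        self.last_iteration: int = 0

    @staticmethod
    def _score(program: Program) -> float:
        return float(program.metrics.get("combined_score", float("-inf")))

    def _update_best_program(self, program: Program) -> None:
        best = self.programs.get(self.best_program_id or "")
        if best is None or self._score(program) > self._score(best):
            self.best_program_id = program.id

    def get_top_programs(self, n: int) -> List[Program]:
        return sorted(self.programs.values(), key=self._score, reverse=True)[:n]

    @abstractmethod
    def add(self, program: Program, iteration: Optional[int] = None) -> str: ...

    @abstractmethod
    def sample(
        self, num_context_programs: Optional[int] = 4
    ) -> Tuple[Program, List[Program]]: ...


class GreedyDatabase(ProgramDatabase):
    """Hypothetical strategy: best program is the parent, runners-up are context."""

    def add(self, program: Program, iteration: Optional[int] = None) -> str:
        self.programs[program.id] = program
        if iteration is not None:
            self.last_iteration = max(self.last_iteration, iteration)
        self._update_best_program(program)  # keep best tracking current
        return program.id

    def sample(
        self, num_context_programs: Optional[int] = 4
    ) -> Tuple[Program, List[Program]]:
        if not self.programs:
            raise ValueError("cannot sample from an empty database")
        ranked = self.get_top_programs((num_context_programs or 0) + 1)
        return ranked[0], ranked[1:]


db = GreedyDatabase("greedy")
for i, score in enumerate([0.2, 0.9, 0.5]):
    db.add(Program(solution=f"v{i}", metrics={"combined_score": score}), iteration=i)
parent, context = db.sample(num_context_programs=1)
```

Calling `_update_best_program()` inside `add()` is what keeps `best_program_id` in step with the stored programs.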
Evaluator (skydiscover/evaluation/evaluator.py:21)
Runs user-provided evaluation functions with timeout and retry:
class Evaluator:
    def __init__(
        self,
        config: EvaluatorConfig,
        llm_judge: Optional[LLMJudge] = None,
        max_concurrent: int = 4,
    ):
        self.evaluation_file = config.evaluation_file
        self.program_suffix = config.file_suffix
        self._load_evaluation_function()
Evaluation Flow:
async def evaluate_program(
    self, program_solution: str, program_id: str = ""
) -> EvaluationResult:
    loop = asyncio.get_running_loop()
    # 1. Write program to temp file (deleted when the with-block exits,
    #    so the evaluation runs inside the block)
    with tempfile.NamedTemporaryFile(suffix=self.program_suffix) as f:
        f.write(program_solution.encode("utf-8"))
        f.flush()  # make the contents visible to the evaluation function
        temp_path = f.name
        # 2. Run evaluation function with timeout
        result = await asyncio.wait_for(
            loop.run_in_executor(None, self.evaluate_function, temp_path),
            timeout=self.config.timeout,
        )
    # 3. Normalize to EvaluationResult
    eval_result = self._normalize_result(result)
    # 4. Optional: add LLM judge feedback
    if self.llm_judge:
        llm_result = await self.llm_judge.evaluate(program_solution)
        eval_result.metrics.update(llm_result.metrics)
    return eval_result
Cascade Evaluation:
For expensive evaluations, use a two-stage cascade:
Stage 1: Fast validation (e.g., syntax check, basic tests)
Threshold Check: Only proceed if the stage 1 score exceeds the threshold
Stage 2: Full evaluation (e.g., comprehensive benchmarks)
Define evaluate_stage1() and evaluate_stage2() in your evaluator file.
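A minimal sketch of such an evaluator file follows. It assumes that both stage functions take the program path and return a dict containing `combined_score`; those signatures are not confirmed here. `run_cascade` and `score_source` are hypothetical helpers that only illustrate the threshold check and are not SkyDiscover's internal code.

```python
import ast
import os
import tempfile
from typing import Any, Callable, Dict

Metrics = Dict[str, Any]
Stage = Callable[[str], Metrics]


def evaluate_stage1(program_path: str) -> Metrics:
    """Cheap gate: does the file parse as Python?"""
    with open(program_path, encoding="utf-8") as f:
        source = f.read()
    try:
        ast.parse(source)
    except SyntaxError:
        return {"combined_score": 0.0}
    return {"combined_score": 1.0}


def evaluate_stage2(program_path: str) -> Metrics:
    """Expensive stage: execute the program and check its `solve` function."""
    namespace: Dict[str, Any] = {}
    with open(program_path, encoding="utf-8") as f:
        exec(compile(f.read(), program_path, "exec"), namespace)
    correct = namespace["solve"](2, 3) == 5
    return {"combined_score": 1.0 if correct else 0.0}


def run_cascade(program_path: str, stage1: Stage, stage2: Stage, threshold: float) -> Metrics:
    """Hypothetical driver: run stage 2 only if stage 1 exceeds the threshold."""
    first = stage1(program_path)
    if first["combined_score"] <= threshold:
        return first  # rejected early, stage 2 never runs
    return stage2(program_path)


def score_source(source: str, threshold: float = 0.7) -> Metrics:
    """Write the source to a temporary file and run the cascade on it."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "candidate.py")
        with open(path, "w", encoding="utf-8") as f:
            f.write(source)
        return run_cascade(path, evaluate_stage1, evaluate_stage2, threshold)


good = score_source("def solve(a, b):\n    return a + b\n")
broken = score_source("def solve(a, b) return a + b\n")
```

The second candidate has a syntax error, so it is scored by stage 1 alone and the costlier stage is skipped.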
Data Flow
Program Object
The Program dataclass (skydiscover/search/base_database.py:23) carries all information about a candidate:
@dataclass
class Program:
    # Identity
    id: str                                        # UUID
    solution: str                                  # Source code or prompt text
    language: str = "python"
    # Performance
    metrics: Dict[str, Any]                        # {"combined_score": float, ...}
    # Lineage
    iteration_found: int = 0
    parent_id: Optional[str] = None
    other_context_ids: Optional[List[str]] = None
    # Metadata
    artifacts: Dict[str, Any]                      # Feedback for LLM
    metadata: Dict[str, Any]                       # Algorithm-specific data
    prompts: Optional[Dict]                        # Prompt used to generate
    generation: int = 0                            # Distance from initial program
    timestamp: float
Evaluation Result
The evaluator returns metrics and optional artifacts:
@dataclass
class EvaluationResult:
    metrics: Dict[str, Any]    # Numeric scores
    artifacts: Dict[str, Any]  # Feedback text, error messages, etc.
Metrics:
combined_score: Primary optimization target (required)
Any additional numeric metrics (e.g., accuracy, latency, cost)
Artifacts:
Error messages or debugging info
Textual feedback injected into the next LLM prompt
Test results or performance breakdowns
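The contract above (a required numeric `combined_score`, plus free-form artifacts) can be sketched as follows. `normalize_result` and `feedback_text` are hypothetical helpers written for illustration; how SkyDiscover's own `_normalize_result` behaves, and which return shapes it accepts, is not specified here.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Union


@dataclass
class EvaluationResult:
    metrics: Dict[str, Any] = field(default_factory=dict)
    artifacts: Dict[str, Any] = field(default_factory=dict)


def normalize_result(raw: Union[EvaluationResult, Dict[str, Any]]) -> EvaluationResult:
    """Hypothetical normalizer: accept a result object or a flat metrics dict."""
    if isinstance(raw, EvaluationResult):
        result = raw
    elif isinstance(raw, dict):
        result = EvaluationResult(metrics=dict(raw))
    else:
        raise TypeError(f"unsupported result type: {type(raw).__name__}")
    score = result.metrics.get("combined_score")
    # bool is a subclass of int, so it is rejected explicitly
    if isinstance(score, bool) or not isinstance(score, (int, float)):
        raise ValueError("metrics must contain a numeric 'combined_score'")
    return result


def feedback_text(result: EvaluationResult) -> str:
    """Render artifacts as plain text that could be appended to a prompt."""
    return "\n".join(f"{key}: {value}" for key, value in sorted(result.artifacts.items()))


normalized = normalize_result({"combined_score": 0.8, "latency": 12})
feedback = feedback_text(
    EvaluationResult({"combined_score": 0.0}, {"stderr": "boom", "hint": "check bounds"})
)
```

Validating `combined_score` at this boundary means a malformed evaluator fails loudly on the first program instead of silently skewing the ranking.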
Configuration
SkyDiscover uses a typed configuration system:
max_iterations: 100
llm:
  models:
    - name: "gpt-4"
      weight: 1.0
search:
  type: "adaevolve"
  database:
    num_islands: 5
    migration_interval: 10
evaluator:
  timeout: 300
  cascade_evaluation: true
  cascade_thresholds: [0.7]
prompt:
  system_message: "You are an expert at optimizing algorithms."
monitor:
  enabled: true
  port: 8080
See Configuration Guide for all options.
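To illustrate what a typed configuration buys you, here is a minimal sketch that maps a parsed config dict onto dataclasses and rejects unknown keys. The class layout, the default values, and the `load_config` helper are assumptions made for this example; it models only a subset of the sections shown above and is not SkyDiscover's actual `Config`.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Dict, List


@dataclass
class EvaluatorConfig:
    timeout: int = 300  # placeholder default
    cascade_evaluation: bool = False
    cascade_thresholds: List[float] = field(default_factory=list)


@dataclass
class MonitorConfig:
    enabled: bool = False
    port: int = 8080  # placeholder default


@dataclass
class Config:
    max_iterations: int = 100
    evaluator: EvaluatorConfig = field(default_factory=EvaluatorConfig)
    monitor: MonitorConfig = field(default_factory=MonitorConfig)


def _build(cls, data: Dict[str, Any]):
    """Instantiate a dataclass from a dict, rejecting unknown keys."""
    known = {f.name for f in fields(cls)}
    unknown = set(data) - known
    if unknown:
        raise KeyError(f"unknown {cls.__name__} keys: {sorted(unknown)}")
    return cls(**data)


def load_config(raw: Dict[str, Any]) -> Config:
    """Hypothetical loader: build nested sections first, then the root."""
    raw = dict(raw)  # do not mutate the caller's dict
    evaluator = _build(EvaluatorConfig, raw.pop("evaluator", {}))
    monitor = _build(MonitorConfig, raw.pop("monitor", {}))
    return _build(Config, {**raw, "evaluator": evaluator, "monitor": monitor})


config = load_config({
    "max_iterations": 100,
    "evaluator": {"timeout": 300, "cascade_evaluation": True, "cascade_thresholds": [0.7]},
    "monitor": {"enabled": True, "port": 8080},
})
```

With this shape, a misspelled key such as `timout` raises immediately at load time rather than being ignored.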
Registry System
Search algorithms are registered at import time in skydiscover/search/route.py:36:
# Simple algorithms (use default controller)
register_database("topk", TopKDatabase)
register_database("best_of_n", BestOfNDatabase)
register_database("beam_search", BeamSearchDatabase)

# Advanced algorithms (custom controller)
register_database("adaevolve", AdaEvolveDatabase)
register_controller("adaevolve", AdaEvolveController)
register_database("gepa_native", GEPANativeDatabase)
register_controller("gepa_native", GEPANativeController)
register_controller("evox", CoEvolutionController)
register_database("evox_meta", SearchStrategyDatabase)
The --search flag maps to these registrations at runtime.
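A registry of this kind can be sketched as two dictionaries plus a lookup with a default-controller fallback. The `resolve` function, the `DefaultController` name, and the empty placeholder classes are hypothetical. The sketch also assumes every search type has a database registered under the same name, which the `evox` and `evox_meta` entries above suggest is not always true in the real code.

```python
from typing import Dict, Tuple

_DATABASES: Dict[str, type] = {}
_CONTROLLERS: Dict[str, type] = {}


# Empty placeholder classes so the sketch is self-contained
class DefaultController: ...
class TopKDatabase: ...
class AdaEvolveDatabase: ...
class AdaEvolveController(DefaultController): ...


def register_database(name: str, cls: type) -> None:
    _DATABASES[name] = cls


def register_controller(name: str, cls: type) -> None:
    _CONTROLLERS[name] = cls


def resolve(search_type: str) -> Tuple[type, type]:
    """Return (database class, controller class) for a search type.

    Algorithms without a registered controller get DefaultController.
    """
    try:
        database_cls = _DATABASES[search_type]
    except KeyError:
        known = ", ".join(sorted(_DATABASES))
        raise ValueError(f"unknown search type {search_type!r}; known: {known}") from None
    return database_cls, _CONTROLLERS.get(search_type, DefaultController)


register_database("topk", TopKDatabase)
register_database("adaevolve", AdaEvolveDatabase)
register_controller("adaevolve", AdaEvolveController)

simple = resolve("topk")
advanced = resolve("adaevolve")
```

Keeping databases and controllers in separate tables is what lets simple algorithms register one class and still run under a shared controller.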
Checkpointing
Checkpoints save full state for resumption:
checkpoints/
  checkpoint_50/
    programs.json            # All programs with metrics
    best_program.py          # Best solution code
    best_program_info.json   # Metadata about best
    prompts/                 # Optional prompt logs
      <program_id>.json
Resume with:
skydiscover-run evaluator.py --checkpoint checkpoints/checkpoint_50
Monitoring
The optional monitor provides real-time visibility:
# In Runner._start_monitor()
from skydiscover.extras.monitor import MonitorServer

server = MonitorServer(host="0.0.0.0", port=8080)
server.start()

# Callback pushes programs to frontend
def monitor_callback(program: Program, iteration: int):
    server.push_event({
        "type": "program_added",
        "program_id": program.id,
        "score": program.metrics.get("combined_score"),
        "iteration": iteration,
    })
Access at http://localhost:8080/ during runs.
Extension Points
Custom Search Algorithm: Subclass ProgramDatabase and implement add() and sample(). See skydiscover/search/README.md:17
Custom Controller: Subclass DiscoveryController and override run_discovery(). See skydiscover/search/README.md:80
Custom Context Builder: Control prompt construction logic. See skydiscover/context_builder/README.md
Custom Evaluator: Define an evaluate(program_path) function. See Evaluators
Search Algorithms: Learn about available algorithms
Evaluators: Write effective evaluation functions