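SkyDiscover splits a discovery run into layers: the Runner manages the run lifecycle, the DiscoveryController drives the search loop, the ProgramDatabase stores and samples candidate programs, and the Evaluator scores them. Search algorithms plug in by replacing the database, the controller, or both.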

High-Level Architecture

┌─────────────────────────────────────────────────────────────┐
│                         Runner                               │
│  (Orchestrates lifecycle, checkpointing, monitoring)         │
└────────────────┬────────────────────────────────┬────────────┘
                 │                                │
        ┌────────▼────────┐              ┌────────▼─────────┐
        │ Discovery       │              │   Evaluator      │
        │ Controller      │              │                  │
        └────────┬────────┘              └──────────────────┘
                 │
        ┌────────▼────────┐
        │ Program         │
        │ Database        │
        └─────────────────┘

Core Classes

Runner (skydiscover/runner.py:24)

Top-level entry point that manages the complete discovery run:
class Runner:
    def __init__(
        self,
        evaluation_file: str,
        initial_program_path: Optional[str] = None,
        config_path: Optional[str] = None,
        config: Optional[Config] = None,
        output_dir: Optional[str] = None,
    ) -> None: ...
Responsibilities:
  • Load configuration and initial program
  • Create database and discovery controller
  • Run the discovery loop with checkpointing
  • Manage monitor server and human feedback
  • Save best program and generate reports
Key Methods:
async def run(
    self,
    iterations: Optional[int] = None,
    checkpoint_path: Optional[str] = None,
) -> Optional[Program]:
    """
    Main discovery loop:
    1. Initialize or resume from checkpoint
    2. Add initial program if starting fresh
    3. Run discovery_controller.run_discovery()
    4. Save checkpoints and best program
    """

DiscoveryController (skydiscover/search/default_discovery_controller.py)

Executes the core sample → prompt → LLM → evaluate → add loop:
class DiscoveryController:
    def __init__(self, controller_input: DiscoveryControllerInput):
        config = controller_input.config
        self.config = config
        self.database = controller_input.database
        self.evaluator = Evaluator(config.evaluator)
        self.llms = LLMManager(config.llm)
        self.prompt_builder = PromptBuilder(config.prompt)
Key Loop:
async def run_discovery(
    self,
    start_iteration: int,
    max_iterations: int,
    checkpoint_callback=None,
):
    for iteration in range(start_iteration, start_iteration + max_iterations):
        if self.shutdown_event.is_set():
            break

        # Core iteration
        result = await self._run_iteration(iteration)

        if result.error:
            continue

        # Store to database
        self._process_iteration_result(
            result, iteration, checkpoint_callback
        )
Single Iteration (_run_iteration):
  1. Sample: call database.sample() to get the parent and context programs.
  2. Build Prompts: use PromptBuilder to create system/user messages containing code and feedback.
  3. Generate: call llms.generate() to get a new program from the LLM.
  4. Evaluate: run evaluator.evaluate_program() to score the program.
  5. Return Result: package the program, metrics, and parent info into a SerializableResult.
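To make the five steps concrete, here is a self-contained toy version of the iteration loop. Everything in it is hypothetical: ToyDatabase, build_prompt, fake_llm and evaluate stand in for ProgramDatabase, PromptBuilder, LLMManager and Evaluator, and the "program" is just a number that the fake LLM perturbs. It illustrates the data flow only, not SkyDiscover's real classes.

```python
import asyncio
import random
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class ToyProgram:
    id: str
    solution: str
    metrics: Dict[str, float] = field(default_factory=dict)
    parent_id: Optional[str] = None


class ToyDatabase:
    """Stand-in for ProgramDatabase: stores every program, samples the best."""

    def __init__(self) -> None:
        self.programs: Dict[str, ToyProgram] = {}

    def add(self, program: ToyProgram) -> str:
        self.programs[program.id] = program
        return program.id

    def sample(self, k: int = 2) -> Tuple[ToyProgram, List[ToyProgram]]:
        ranked = sorted(self.programs.values(),
                        key=lambda p: p.metrics["combined_score"], reverse=True)
        return ranked[0], ranked[1:1 + k]  # parent, context programs


def build_prompt(parent: ToyProgram, context: List[ToyProgram]) -> str:
    lines = [f"Improve: {parent.solution} (score {parent.metrics['combined_score']:.3f})"]
    lines += [f"Also tried: {p.solution}" for p in context]
    return "\n".join(lines)


async def fake_llm(prompt: str, rng: random.Random) -> str:
    # Pretend LLM: reads the parent's number from the prompt and perturbs it
    current = float(prompt.splitlines()[0].split()[1])
    return str(current + rng.uniform(-1.0, 1.0))


def evaluate(solution: str) -> Dict[str, float]:
    x = float(solution)
    return {"combined_score": -(x - 3.0) ** 2}  # best possible score 0 at x = 3


async def run_iteration(db: ToyDatabase, iteration: int,
                        rng: random.Random) -> ToyProgram:
    parent, context = db.sample()               # 1. Sample
    prompt = build_prompt(parent, context)      # 2. Build prompts
    solution = await fake_llm(prompt, rng)      # 3. Generate
    metrics = evaluate(solution)                # 4. Evaluate
    return ToyProgram(id=f"it{iteration}",      # 5. Return result
                      solution=solution, metrics=metrics,
                      parent_id=parent.id)


async def main(iterations: int = 50, seed: int = 0) -> ToyDatabase:
    rng = random.Random(seed)
    db = ToyDatabase()
    db.add(ToyProgram(id="init", solution="0.0", metrics=evaluate("0.0")))
    for i in range(1, iterations + 1):
        db.add(await run_iteration(db, i, rng))  # controller stores the result
    return db


db = asyncio.run(main())
best = max(db.programs.values(), key=lambda p: p.metrics["combined_score"])
print(f"best x = {best.solution}, score = {best.metrics['combined_score']:.4f}")
```

Because the toy database always samples the current best program as the parent, this behaves like greedy hill climbing; the real algorithms differ mainly in how sample() and add() are implemented.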
Advanced algorithms can override run_discovery() to implement:
  • Acceptance gating (GEPA)
  • Island migration (AdaEvolve)
  • Strategy co-evolution (EvoX)
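A minimal sketch of what acceptance gating means inside such an override, written as a standalone function rather than a DiscoveryController subclass. The rule used here (store a child only if it scores strictly better than its parent) is an illustrative assumption, not GEPA's actual criterion, and gated_discovery is a hypothetical name.

```python
from typing import Callable, List, Tuple

Candidate = Tuple[float, float]  # hypothetical: (solution value, combined_score)


def gated_discovery(
    initial: float,
    propose: Callable[[float, int], float],
    score: Callable[[float], float],
    iterations: int,
) -> Tuple[List[Candidate], int]:
    """Discovery loop with an acceptance gate in front of the database.

    A child enters the population only if it beats its parent; rejected
    children are counted but never stored.
    """
    population: List[Candidate] = [(initial, score(initial))]
    rejected = 0
    for iteration in range(iterations):
        parent_value, parent_score = max(population, key=lambda c: c[1])
        child_value = propose(parent_value, iteration)
        child_score = score(child_value)
        if child_score > parent_score:  # the acceptance gate
            population.append((child_value, child_score))
        else:
            rejected += 1
    return population, rejected


# Proposals alternate between stepping up and stepping down; the optimum is 3.
population, rejected = gated_discovery(
    0.0,
    lambda x, i: x + (1.0 if i % 2 == 0 else -1.0),
    lambda x: -abs(x - 3.0),
    iterations=8,
)
print([v for v, _ in population], rejected)  # [0.0, 1.0, 2.0, 3.0] 5
```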

ProgramDatabase (skydiscover/search/base_database.py:75)

Abstract base class for program storage and sampling:
class ProgramDatabase(ABC):
    def __init__(self, name: str, config: DatabaseConfig, **kwargs):
        self.programs: Dict[str, Program] = {}  # All programs
        self.best_program_id: Optional[str] = None
        self.last_iteration: int = 0

    @abstractmethod
    def add(self, program: Program, iteration: Optional[int] = None) -> str:
        """Store a program and update best tracking."""
        ...

    @abstractmethod
    def sample(
        self, num_context_programs: Optional[int] = 4
    ) -> Tuple[Program, List[Program]]:
        """Select parent and context programs for next iteration."""
        ...
Provided Methods:
  • get_best_program(): return the highest-scoring program
  • get_top_programs(n): return the top n programs by score
  • _update_best_program(program): update best-program tracking (call it from add())
  • save(path, iteration): write a checkpoint to disk
  • load(path): restore from a checkpoint
  • get_statistics(): return population statistics for use in prompts
  • log_prompt(...): store a prompt and its response for debugging
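The add()/sample() contract can be illustrated with a small top-k style database. This is a self-contained sketch: MiniProgram and TopKSketch are hypothetical stand-ins that mimic the interface above without subclassing the real ProgramDatabase, and choosing the parent uniformly from the k best programs is an assumed policy, not necessarily what TopKDatabase does.

```python
import random
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class MiniProgram:
    """Trimmed-down stand-in for Program."""
    solution: str
    metrics: Dict[str, float]
    id: str = field(default_factory=lambda: str(uuid.uuid4()))


class TopKSketch:
    """Stores every program; samples a parent from the k best."""

    def __init__(self, k: int = 3, seed: int = 0) -> None:
        self.k = k
        self.programs: Dict[str, MiniProgram] = {}
        self.best_program_id: Optional[str] = None
        self.last_iteration = 0
        self._rng = random.Random(seed)

    def _score(self, program: MiniProgram) -> float:
        return program.metrics.get("combined_score", float("-inf"))

    def _update_best_program(self, program: MiniProgram) -> None:
        best = self.programs.get(self.best_program_id) if self.best_program_id else None
        if best is None or self._score(program) > self._score(best):
            self.best_program_id = program.id

    def add(self, program: MiniProgram, iteration: Optional[int] = None) -> str:
        self.programs[program.id] = program
        if iteration is not None:
            self.last_iteration = max(self.last_iteration, iteration)
        self._update_best_program(program)  # keep best tracking current
        return program.id

    def get_top_programs(self, n: int) -> List[MiniProgram]:
        return sorted(self.programs.values(), key=self._score, reverse=True)[:n]

    def sample(
        self, num_context_programs: Optional[int] = 4
    ) -> Tuple[MiniProgram, List[MiniProgram]]:
        parent = self._rng.choice(self.get_top_programs(self.k))
        ranked = self.get_top_programs(len(self.programs))
        others = [p for p in ranked if p.id != parent.id]  # never repeat the parent
        return parent, others[: num_context_programs or 0]


db = TopKSketch(k=2)
for i, score in enumerate([0.1, 0.9, 0.5, 0.7]):
    db.add(MiniProgram(solution=f"v{i}", metrics={"combined_score": score}), iteration=i)
parent, context = db.sample(num_context_programs=2)
print(db.programs[db.best_program_id].solution)  # v1
print(parent.solution in {"v1", "v3"}, len(context))  # True 2
```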

Evaluator (skydiscover/evaluation/evaluator.py:21)

Runs user-provided evaluation functions with timeout and retry:
class Evaluator:
    def __init__(
        self,
        config: EvaluatorConfig,
        llm_judge: Optional[LLMJudge] = None,
        max_concurrent: int = 4,
    ):
        self.config = config
        self.llm_judge = llm_judge
        self.evaluation_file = config.evaluation_file
        self.program_suffix = config.file_suffix
        self._load_evaluation_function()
Evaluation Flow:
import asyncio
import os
import tempfile

async def evaluate_program(
    self, program_solution: str, program_id: str = ""
) -> EvaluationResult:
    # 1. Write program to a temp file that survives the with-block
    with tempfile.NamedTemporaryFile(
        suffix=self.program_suffix, delete=False
    ) as f:
        f.write(program_solution.encode("utf-8"))
        temp_path = f.name

    try:
        # 2. Run the synchronous evaluation function in a worker thread, with timeout
        loop = asyncio.get_running_loop()
        result = await asyncio.wait_for(
            loop.run_in_executor(None, self.evaluate_function, temp_path),
            timeout=self.config.timeout,
        )
    finally:
        os.unlink(temp_path)  # remove the temp file even on timeout or error

    # 3. Normalize to EvaluationResult
    eval_result = self._normalize_result(result)

    # 4. Optional: add LLM judge feedback
    if self.llm_judge:
        llm_result = await self.llm_judge.evaluate(program_solution)
        eval_result.metrics.update(llm_result.metrics)

    return eval_result
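On the user side, the evaluation file supplies the function that evaluate_program runs in a worker thread. The sketch below is hypothetical: it assumes the candidate program defines a solve(xs) function that should sort a list, and it assumes a plain dict of numeric metrics containing combined_score is an acceptable return value for _normalize_result. The usage section mirrors step 1 above by writing each candidate to a temp file first.

```python
import importlib.util
import os
import tempfile
from typing import Dict


def evaluate(program_path: str) -> Dict[str, float]:
    """Score a candidate that is expected to define solve(xs) -> sorted list."""
    spec = importlib.util.spec_from_file_location("candidate", program_path)
    module = importlib.util.module_from_spec(spec)
    try:
        spec.loader.exec_module(module)  # runs the candidate's top-level code
        cases = [[3, 1, 2], [], [5, 5, 1], [9, -2, 0, 4]]
        passed = sum(module.solve(list(c)) == sorted(c) for c in cases)
    except Exception:
        # Syntax errors, missing solve(), crashes: score the candidate as 0
        return {"combined_score": 0.0, "pass_rate": 0.0}
    pass_rate = passed / len(cases)
    return {"combined_score": pass_rate, "pass_rate": pass_rate}


candidates = {
    "correct": "def solve(xs):\n    return sorted(xs)\n",
    "identity": "def solve(xs):\n    return xs\n",
    "broken": "def solve(xs)\n",
}
scores = {}
for name, source in candidates.items():
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(source)
        path = f.name
    try:
        scores[name] = evaluate(path)["combined_score"]
    finally:
        os.unlink(path)
print(scores)  # {'correct': 1.0, 'identity': 0.25, 'broken': 0.0}
```

Returning a graded score (0.25 for a partial solution) rather than pass/fail gives the search a gradient to follow.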
Cascade Evaluation: For expensive evaluations, use a two-stage cascade:
  1. Stage 1: Fast validation (e.g., syntax check, basic tests)
  2. Threshold Check: Only proceed if stage 1 score exceeds threshold
  3. Stage 2: Full evaluation (e.g., comprehensive benchmarks)
Define evaluate_stage1() and evaluate_stage2() in your evaluator file.
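The gating logic of a cascade can be sketched as follows. run_cascade is a hypothetical helper, not SkyDiscover's internal function; stage1 and stage2 play the roles of evaluate_stage1() and evaluate_stage2(), and the thresholds list corresponds to cascade_thresholds in the configuration. Following the text above, a candidate proceeds only if its score strictly exceeds the threshold; whether SkyDiscover uses a strict comparison is an assumption.

```python
from typing import Callable, Dict, List

Metrics = Dict[str, float]


def run_cascade(
    program_path: str,
    stages: List[Callable[[str], Metrics]],
    thresholds: List[float],
) -> Metrics:
    """Run stages in order; stop when a stage's score does not exceed its threshold.

    thresholds[i] gates the move from stages[i] to stages[i + 1]. The metrics of
    the last stage that actually ran are returned.
    """
    metrics: Metrics = {}
    for i, stage in enumerate(stages):
        metrics = stage(program_path)
        is_last = i == len(stages) - 1
        if not is_last and metrics.get("combined_score", 0.0) <= thresholds[i]:
            break  # not promising enough to pay for the next, costlier stage
    return metrics


calls: List[str] = []


def stage1(path: str) -> Metrics:  # cheap check
    calls.append("stage1")
    return {"combined_score": 0.5 if "weak" in path else 0.9}


def stage2(path: str) -> Metrics:  # expensive benchmark
    calls.append("stage2")
    return {"combined_score": 0.95, "benchmark_time": 1.2}


weak = run_cascade("weak.py", [stage1, stage2], thresholds=[0.7])
strong = run_cascade("strong.py", [stage1, stage2], thresholds=[0.7])
print(weak, strong, calls)
```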

Data Flow

Program Object

The Program dataclass (skydiscover/search/base_database.py:23) carries all information about a candidate:
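import time
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

@dataclass
class Program:
    # Identity
    id: str                          # UUID
    solution: str                    # Source code or prompt text
    language: str = "python"

    # Performance
    metrics: Dict[str, Any] = field(default_factory=dict)  # {"combined_score": float, ...}

    # Lineage
    iteration_found: int = 0
    parent_id: Optional[str] = None
    other_context_ids: Optional[List[str]] = None

    # Metadata (fields after a default must also have defaults)
    artifacts: Dict[str, Any] = field(default_factory=dict)  # Feedback for LLM
    metadata: Dict[str, Any] = field(default_factory=dict)   # Algorithm-specific data
    prompts: Optional[Dict] = None                          # Prompt used to generate
    generation: int = 0                                     # Distance from initial program
    timestamp: float = field(default_factory=time.time)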
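The lineage fields let you reconstruct how a program evolved. The sketch below assumes, based on the "Distance from initial program" comment, that a child's generation is its parent's generation plus one; Node, make_child and lineage are hypothetical names holding only the lineage-related fields of Program.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    """Stand-in holding only Program's lineage fields."""
    id: str
    parent_id: Optional[str] = None
    generation: int = 0


def make_child(parent: Node, child_id: str) -> Node:
    # A child sits one step further from the initial program than its parent
    return Node(id=child_id, parent_id=parent.id, generation=parent.generation + 1)


def lineage(programs: Dict[str, Node], program_id: str) -> List[str]:
    """Follow parent_id links back to the initial program; return root first."""
    chain: List[str] = []
    current: Optional[str] = program_id
    while current is not None:
        chain.append(current)
        current = programs[current].parent_id
    return list(reversed(chain))


root = Node("a")
b = make_child(root, "b")
c = make_child(b, "c")
programs = {p.id: p for p in (root, b, c)}
print(lineage(programs, "c"), c.generation)  # ['a', 'b', 'c'] 2
```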

Evaluation Result

The evaluator returns metrics and optional artifacts:
@dataclass
class EvaluationResult:
    metrics: Dict[str, Any]    # Numeric scores
    artifacts: Dict[str, Any]  # Feedback text, error messages, etc.
Metrics:
  • combined_score: Primary optimization target (required)
  • Any additional numeric metrics (e.g., accuracy, latency, cost)
Artifacts:
  • Error messages or debugging info
  • Textual feedback injected into next LLM prompt
  • Test results or performance breakdowns
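As a sketch of how metrics and artifacts might be turned into feedback for the next prompt: format_feedback is a hypothetical helper (SkyDiscover's actual prompt formatting is done by PromptBuilder and may look different), and defaulting artifacts to an empty dict is a choice made for this example only.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class EvaluationResult:
    metrics: Dict[str, Any]
    artifacts: Dict[str, Any] = field(default_factory=dict)


def format_feedback(result: EvaluationResult) -> str:
    """Render combined_score first, then other metrics, then artifact text."""
    lines = [f"combined_score: {result.metrics['combined_score']:.4f}"]
    lines += [
        f"{name}: {value}"
        for name, value in sorted(result.metrics.items())
        if name != "combined_score"
    ]
    for name, text in sorted(result.artifacts.items()):
        lines.append(f"[{name}]\n{text}")  # e.g. error output the LLM should fix
    return "\n".join(lines)


r = EvaluationResult(
    metrics={"combined_score": 0.75, "latency_ms": 12.5},
    artifacts={"stderr": "test_empty failed: IndexError"},
)
text = format_feedback(r)
print(text)
```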

Configuration

SkyDiscover loads a YAML configuration file into typed config objects. A representative file:
max_iterations: 100

llm:
  models:
    - name: "gpt-4"
      weight: 1.0

search:
  type: "adaevolve"
  database:
    num_islands: 5
    migration_interval: 10

evaluator:
  timeout: 300
  cascade_evaluation: true
  cascade_thresholds: [0.7]

prompt:
  system_message: "You are an expert at optimizing algorithms."

monitor:
  enabled: true
  port: 8080
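A typed configuration layer usually means mapping the parsed YAML (a nested dict, as returned by a parser such as PyYAML's yaml.safe_load) onto dataclasses. The classes below are hypothetical, cover only a few of the fields above, and are not SkyDiscover's actual Config; they show why typing helps: a misspelled key fails immediately instead of being silently ignored.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ModelConfig:
    name: str
    weight: float = 1.0


@dataclass
class EvaluatorSettings:
    timeout: int = 300
    cascade_evaluation: bool = False
    cascade_thresholds: List[float] = field(default_factory=list)


@dataclass
class SketchConfig:
    max_iterations: int = 100
    models: List[ModelConfig] = field(default_factory=list)
    evaluator: EvaluatorSettings = field(default_factory=EvaluatorSettings)


def from_dict(raw: Dict[str, Any]) -> SketchConfig:
    """Map a parsed YAML document (plain dict) onto typed dataclasses."""
    return SketchConfig(
        max_iterations=int(raw.get("max_iterations", 100)),
        models=[ModelConfig(**m) for m in raw.get("llm", {}).get("models", [])],
        evaluator=EvaluatorSettings(**raw.get("evaluator", {})),
    )


# What a YAML parser would return for part of the file above
raw = {
    "max_iterations": 100,
    "llm": {"models": [{"name": "gpt-4", "weight": 1.0}]},
    "evaluator": {"timeout": 300, "cascade_evaluation": True,
                  "cascade_thresholds": [0.7]},
}
cfg = from_dict(raw)

try:
    from_dict({"evaluator": {"timout": 10}})  # misspelled key
    typo_caught = False
except TypeError:
    typo_caught = True
print(cfg.models[0].name, cfg.evaluator.cascade_thresholds, typo_caught)
```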
See Configuration Guide for all options.

Registry System

Search algorithms are registered at import time in skydiscover/search/route.py:36:
# Simple algorithms (use default controller)
register_database("topk", TopKDatabase)
register_database("best_of_n", BestOfNDatabase)
register_database("beam_search", BeamSearchDatabase)

# Advanced algorithms (custom controller)
register_database("adaevolve", AdaEvolveDatabase)
register_controller("adaevolve", AdaEvolveController)

register_database("gepa_native", GEPANativeDatabase)
register_controller("gepa_native", GEPANativeController)

register_controller("evox", CoEvolutionController)
register_database("evox_meta", SearchStrategyDatabase)
The --search flag maps to these registrations at runtime.

Checkpointing

Checkpoints save full state for resumption:
checkpoints/
  checkpoint_50/
    programs.json          # All programs with metrics
    best_program.py        # Best solution code
    best_program_info.json # Metadata about best
    prompts/               # Optional prompt logs
      <program_id>.json
Resume with:
skydiscover-run evaluator.py --checkpoint checkpoints/checkpoint_50
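A minimal sketch of writing and reading the checkpoint layout above with the standard library. save_checkpoint and load_checkpoint are hypothetical; the fields stored in best_program_info.json and the rule that a resumed run starts at the iteration after the saved one are assumptions for illustration, and the optional prompts/ directory is omitted.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List


def save_checkpoint(root: Path, iteration: int, programs: List[Dict[str, Any]]) -> Path:
    """Write programs.json, best_program.py and best_program_info.json."""
    ckpt = root / f"checkpoint_{iteration}"
    ckpt.mkdir(parents=True, exist_ok=True)
    (ckpt / "programs.json").write_text(json.dumps(programs, indent=2))
    best = max(programs, key=lambda p: p["metrics"]["combined_score"])
    (ckpt / "best_program.py").write_text(best["solution"])
    info = {"id": best["id"], "metrics": best["metrics"], "iteration": iteration}
    (ckpt / "best_program_info.json").write_text(json.dumps(info, indent=2))
    return ckpt


def load_checkpoint(ckpt: Path) -> Dict[str, Any]:
    """Read state back; resume from the iteration after the saved one."""
    programs = json.loads((ckpt / "programs.json").read_text())
    info = json.loads((ckpt / "best_program_info.json").read_text())
    return {"programs": programs, "best_id": info["id"],
            "start_iteration": info["iteration"] + 1}


with tempfile.TemporaryDirectory() as tmp:
    programs = [
        {"id": "p0", "solution": "x = 0\n", "metrics": {"combined_score": 0.2}},
        {"id": "p1", "solution": "x = 1\n", "metrics": {"combined_score": 0.8}},
    ]
    ckpt = save_checkpoint(Path(tmp) / "checkpoints", 50, programs)
    state = load_checkpoint(ckpt)
    best_code = (ckpt / "best_program.py").read_text()
    ckpt_name = ckpt.name
print(ckpt_name, state["best_id"], state["start_iteration"])  # checkpoint_50 p1 51
```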

Monitoring

The optional monitor provides real-time visibility:
# In Runner._start_monitor()
from skydiscover.extras.monitor import MonitorServer

server = MonitorServer(host="0.0.0.0", port=8080)
server.start()

# Callback pushes programs to frontend
def monitor_callback(program: Program, iteration: int):
    server.push_event({
        "type": "program_added",
        "program_id": program.id,
        "score": program.metrics.get("combined_score"),
        "iteration": iteration,
    })
Access at http://localhost:8080/ during runs.
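The callback above only hands events to the server. A common way to keep that hand-off cheap is to queue serialized events and let a separate sender drain them. EventBuffer below is a hypothetical in-memory stand-in used to illustrate that pattern; it is not MonitorServer's API.

```python
import json
import queue
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class StubProgram:
    id: str
    metrics: Dict[str, Any] = field(default_factory=dict)


class EventBuffer:
    """Queues JSON events so the discovery loop never waits on the network."""

    def __init__(self) -> None:
        self._events: "queue.Queue[str]" = queue.Queue()  # thread-safe FIFO

    def push_event(self, event: Dict[str, Any]) -> None:
        self._events.put(json.dumps(event))

    def drain(self) -> List[Dict[str, Any]]:
        out = []
        while not self._events.empty():
            out.append(json.loads(self._events.get_nowait()))
        return out


def make_monitor_callback(server: EventBuffer) -> Callable[[StubProgram, int], None]:
    def monitor_callback(program: StubProgram, iteration: int) -> None:
        server.push_event({
            "type": "program_added",
            "program_id": program.id,
            "score": program.metrics.get("combined_score"),  # None if missing
            "iteration": iteration,
        })
    return monitor_callback


buf = EventBuffer()
callback = make_monitor_callback(buf)
callback(StubProgram("p1", {"combined_score": 0.4}), 1)
callback(StubProgram("p2"), 2)
events = buf.drain()
print(events)
```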

Extension Points

Custom Search Algorithm

Subclass ProgramDatabase and implement add() + sample()
See skydiscover/search/README.md:17

Custom Controller

Subclass DiscoveryController and override run_discovery()
See skydiscover/search/README.md:80

Custom Context Builder

Control prompt construction logic
See skydiscover/context_builder/README.md

Custom Evaluator

Define evaluate(program_path) function
See Evaluators

See also:
  • Search Algorithms: learn about the available algorithms
  • Evaluators: write effective evaluation functions
