SkyDiscover is a modular framework for AI-driven scientific and algorithmic discovery. It provides a unified interface for implementing, running, and comparing discovery algorithms across diverse optimization tasks.

What is SkyDiscover?

SkyDiscover enables you to use large language models (LLMs) to automatically discover and optimize:
  • Algorithms: Sorting, scheduling, routing, packing problems
  • Mathematical solutions: Geometric optimization, inequality proofs
  • System configurations: GPU kernels, cloud scheduling, load balancing
  • Prompts: Optimizing LLM prompts for specific tasks
  • Creative content: AI image generation
SkyDiscover has been validated across 200+ optimization tasks, with its flagship algorithms AdaEvolve and EvoX achieving state-of-the-art results comparable to DeepMind’s AlphaEvolve.

Core Components

SkyDiscover consists of four primary components that work together:

1. Initial Program (Optional)

The starting point for optimization. It can be:
  • A baseline solution to improve upon
  • Omitted entirely (the LLM generates a solution from scratch)
  • Annotated with EVOLVE-BLOCK markers to specify which regions may be mutated
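As an illustration, an initial program with one mutable region might look like the sketch below. The `# EVOLVE-BLOCK-START` / `# EVOLVE-BLOCK-END` comment syntax is an assumption based on OpenEvolve-style conventions; the Evolution Blocks page is the place to confirm the exact markers SkyDiscover expects. The task and the names `pack` and `run` are hypothetical.

```python
# initial_program.py: a hypothetical baseline for a small bin-packing task.

# EVOLVE-BLOCK-START  (assumed marker syntax: the region the LLM may rewrite)
def pack(items, capacity):
    """First-fit baseline: put each item into the first bin with room."""
    bins = []
    for item in items:
        for b in bins:
            if sum(b) + item <= capacity:
                b.append(item)
                break
        else:
            # No existing bin had room, so open a new one.
            bins.append([item])
    return bins
# EVOLVE-BLOCK-END


def run():
    """Fixed harness outside the block; it stays unchanged across iterations."""
    return pack([4, 8, 1, 4, 2, 1], capacity=10)
```

Keeping the harness outside the marked region means the evaluator can always call the same entry point, whatever the LLM does inside the block.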

2. Search Algorithm

Determines which programs to evolve and how to evolve them. Options include:
  • AdaEvolve: Multi-island adaptive search with UCB selection
  • EvoX: Self-evolving search that co-adapts its own strategy
  • Top-K: Simple refinement of top-performing solutions
  • Beam Search: Breadth-first exploration of solution space
  • Best-of-N: Multiple variants from the same parent
See Search Algorithms for details.
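The AdaEvolve entry above mentions UCB selection. The sketch below shows the textbook UCB1 rule applied to choosing which island to sample next. It is a generic illustration of the technique, not SkyDiscover's implementation; `Island`, `ucb1_pick`, and the exploration constant `c` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class Island:
    name: str
    pulls: int = 0              # how many times this island was selected
    total_reward: float = 0.0   # sum of rewards observed on this island


def ucb1_pick(islands, c=math.sqrt(2)):
    """Return the island with the highest UCB1 score."""
    # Try every island once before trusting the statistics.
    for isl in islands:
        if isl.pulls == 0:
            return isl
    total = sum(isl.pulls for isl in islands)

    def score(isl):
        mean = isl.total_reward / isl.pulls
        bonus = c * math.sqrt(math.log(total) / isl.pulls)
        return mean + bonus

    return max(islands, key=score)


islands = [
    Island("a", pulls=10, total_reward=5.0),
    Island("b", pulls=2, total_reward=0.8),
]
chosen = ucb1_pick(islands)  # "b": lower mean, but a larger exploration bonus
```

The bonus term shrinks as an island is sampled more often, so rarely visited islands keep getting occasional attention even when their average reward is lower.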

3. Evaluator

A Python function that scores candidate programs:
def evaluate(program_path):
    # run_and_grade is a placeholder for your own scoring logic
    score = run_and_grade(program_path)
    return {
        "combined_score": score,  # Primary optimization target
        "artifacts": {             # Optional feedback for LLM
            "feedback": "Off by one in loop boundary",
        },
    }
See Evaluators for examples.
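The snippet above leaves `run_and_grade` undefined. One way to fill it in is sketched below for a toy sorting task. It assumes the candidate file exposes a function named `solve`; that name, the `CASES` list, and the `load_candidate` helper are hypothetical, and only the returned dictionary shape follows the example above.

```python
import importlib.util


def load_candidate(program_path):
    """Import the candidate file as a throwaway module."""
    spec = importlib.util.spec_from_file_location("candidate", program_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


# Hypothetical test cases: (input list, expected sorted output).
CASES = [([3, 1, 2], [1, 2, 3]), ([], []), ([5, 5, 1], [1, 5, 5])]


def evaluate(program_path):
    try:
        candidate = load_candidate(program_path)
        passed = sum(candidate.solve(list(inp)) == out for inp, out in CASES)
    except Exception as exc:
        # A crashing candidate scores zero but still returns feedback.
        return {
            "combined_score": 0.0,
            "artifacts": {"feedback": f"{type(exc).__name__}: {exc}"},
        }
    return {
        "combined_score": passed / len(CASES),
        "artifacts": {"feedback": f"{passed}/{len(CASES)} cases passed"},
    }
```

Returning a score and feedback for failing candidates, rather than raising, lets the search continue and gives the LLM something concrete to correct on the next attempt.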

4. LLM (Language Model)

Generates program mutations based on:
  • Parent program
  • Context programs (high-performing examples)
  • Evaluation feedback from previous attempts
  • Population statistics
SkyDiscover supports any LiteLLM-compatible model, including models from OpenAI, Anthropic, and Google as well as local models.
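To make the four inputs listed above concrete, here is a minimal sketch of how a mutation prompt could be assembled from them. The function `build_prompt`, its argument shapes, and the section headings are all hypothetical; SkyDiscover's actual prompt templates may look quite different.

```python
def build_prompt(parent_code, context_programs, feedback, stats):
    """Assemble a mutation prompt from the four inputs listed above.

    context_programs is a list of (code, score) pairs;
    stats is a dict with 'best' and 'size' keys (both hypothetical).
    """
    context = "\n\n".join(
        f"# score={score:.3f}\n{code}" for code, score in context_programs
    )
    return (
        "Improve the parent program below.\n\n"
        f"## Parent\n{parent_code}\n\n"
        f"## High-scoring examples\n{context}\n\n"
        f"## Feedback from the last evaluation\n{feedback}\n\n"
        f"## Population\nbest={stats['best']:.3f}, size={stats['size']}\n"
    )


prompt = build_prompt(
    parent_code="def f(): pass",
    context_programs=[("def g(): pass", 0.9)],
    feedback="Off by one in loop boundary",
    stats={"best": 0.9, "size": 4},
)
```

The resulting string would then be sent to whichever model is configured.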

The Discovery Loop

SkyDiscover runs this cycle for each iteration:
1. Sample: The search algorithm selects a parent program and context programs from the database.
2. Prompt: Build prompts with the parent code, context examples, feedback, and population stats.
3. Generate: The LLM creates a new program variant.
4. Evaluate: Run the evaluator to score the program.
5. Add: Store the program and its metrics in the database.
6. Adapt: The search algorithm updates its strategy based on the results.
This loop repeats for the configured number of iterations (typically 50-200).
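The six steps can be sketched as a short loop. This is a toy under stated assumptions, not SkyDiscover's code: a "program" is just a number, `mutate` stands in for both prompt building and the LLM call, and the `explore` knob with its update rule is a hypothetical stand-in for the Adapt step.

```python
import random


def discovery_loop(initial, evaluate, mutate, iterations=50, k=3, seed=0):
    rng = random.Random(seed)
    database = [(initial, evaluate(initial))]  # (program, score) pairs
    explore = 0.2                              # hypothetical strategy knob
    for _ in range(iterations):
        ranked = sorted(database, key=lambda p: p[1], reverse=True)
        # 1. Sample: usually the best program, sometimes a random one.
        parent = rng.choice(ranked) if rng.random() < explore else ranked[0]
        context = ranked[:k]
        # 2-3. Prompt and Generate: `mutate` replaces the prompt + LLM call.
        child = mutate(parent[0], context, rng)
        # 4. Evaluate
        score = evaluate(child)
        # 5. Add
        database.append((child, score))
        # 6. Adapt: explore less after an improvement, more after a miss.
        if score > ranked[0][1]:
            explore = max(0.05, explore * 0.9)
        else:
            explore = min(0.5, explore * 1.1)
    return max(database, key=lambda p: p[1])


# Toy task: find x that maximizes -(x - 3)^2, starting from x = 0.
best_x, best_score = discovery_loop(
    initial=0.0,
    evaluate=lambda x: -(x - 3.0) ** 2,
    mutate=lambda x, context, rng: x + rng.gauss(0, 0.5),
)
```

Because every candidate is stored, the returned best score can never be worse than the initial program's score.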

Key Design Principles

Modularity

Every component is swappable:
  • Try different search algorithms without changing your problem
  • Use the same evaluator across multiple algorithms
  • Switch LLM providers seamlessly

Fairness

All algorithms run with:
  • Same evaluation budget
  • Same LLM calls per iteration
  • Standardized prompt templates
  • Reproducible checkpointing
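One simple way to hold the evaluation budget constant across algorithms is to wrap the evaluator in a call counter. The sketch below is an illustration of that idea only; the class name `BudgetedEvaluator` is hypothetical and is not taken from SkyDiscover.

```python
class BudgetedEvaluator:
    """Wrap an evaluator so every algorithm gets the same number of calls."""

    def __init__(self, evaluate, budget):
        self.evaluate = evaluate
        self.budget = budget
        self.calls = 0

    def __call__(self, program_path):
        if self.calls >= self.budget:
            raise RuntimeError("evaluation budget exhausted")
        self.calls += 1
        return self.evaluate(program_path)


# Each algorithm under comparison would receive a fresh wrapper
# with the same budget.
limited = BudgetedEvaluator(lambda path: {"combined_score": 1.0}, budget=2)
first = limited("a.py")
second = limited("b.py")
```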

Extensibility

Easy to add new:
  • Search algorithms (see skydiscover/search/README.md:29)
  • Benchmarks (see benchmarks/README.md)
  • Context builders for custom prompt strategies

What Makes SkyDiscover Different?

Adaptive Algorithms

AdaEvolve and EvoX dynamically adjust search intensity based on progress, unlike the fixed strategies used in other frameworks

200+ Benchmarks

Comprehensive evaluation across math, systems, algorithms, and reasoning tasks

Native Implementations

Built-in versions of OpenEvolve and GEPA for fair comparison without external dependencies

Real-time Monitoring

Live dashboard with scatter plots, code diffs, and human feedback integration

Performance Highlights

Across ~200 optimization benchmarks:
  • Frontier-CS: 34% median score improvement over OpenEvolve, GEPA, and ShinkaEvolve
  • Math + Systems: Matches or exceeds AlphaEvolve and human SOTA on 12/14 tasks
  • Real-world impact:
    • 41% lower cross-cloud transfer cost
    • 14% better GPU load balance for MoE serving
    • 29% lower KV-cache pressure via GPU model placement

Next Steps

Architecture

Deep dive into SkyDiscover’s internal architecture

Search Algorithms

Learn about available search algorithms

Evaluators

Write effective evaluation functions

Evolution Blocks

Control what code gets evolved
