Overview

The HypothesisGenerator analyzes experiment results and generates ranked, testable hypotheses to guide the next iteration of the experiment loop. It balances exploration and exploitation strategies and provides confidence scores for each hypothesis.

Key Features

  • Generates 2-3 ranked, testable hypotheses from analysis results
  • Balances exploration (new approaches) vs exploitation (refining what works)
  • Provides confidence scores (0.0-1.0) and priorities (1-3)
  • Suggests concrete model choices and hyperparameters
  • Maintains conversation context for Thought Signature continuity
  • Intelligent fallback when Gemini is unavailable

Class Definition

class HypothesisGenerator:
    """Generates ranked hypotheses for the next experiment iteration using Gemini 3.

    Key features:
    - Generates 2-3 testable hypotheses from analysis results
    - Balances exploration vs exploitation strategies
    - Provides confidence scores and priorities
    - Suggests concrete model choices and parameters
    - Maintains conversation context for Thought Signature continuity
    """

    def __init__(self, gemini_client: GeminiClient):
        """Initialize the hypothesis generator.

        Args:
            gemini_client: Shared GeminiClient instance for API calls.
        """

Constructor

gemini_client
GeminiClient
required
Shared GeminiClient instance for API calls. Sharing the same client across cognitive components preserves conversation history.

Methods

generate

Generate hypotheses for the next experiment iteration.
def generate(
    self,
    analysis: AnalysisResult,
    state: ExperimentState,
) -> HypothesisSet:
    """Generate hypotheses for the next experiment iteration.

    Args:
        analysis: Analysis of the most recent experiment.
        state: Current experiment state with history.

    Returns:
        HypothesisSet with ranked hypotheses.
    """

Parameters

analysis
AnalysisResult
required
Analysis result from the most recent experiment, containing:
  • experiment_name (str): Name of analyzed experiment
  • iteration (int): Iteration number
  • success (bool): Whether experiment succeeded
  • primary_metric (Optional[MetricComparison]): Metric comparison data
  • trend_pattern (TrendPattern): Detected performance trend
  • key_observations (list[str]): Actionable insights
  • reasoning (str): Analysis reasoning
state
ExperimentState
required
Current experiment state containing:
  • experiments (list[ExperimentResult]): Historical results
  • config (Config): Configuration including task type, constraints, max iterations
  • current_iteration (int): Current iteration number
  • iterations_without_improvement (int): Plateau counter

Returns

HypothesisSet
HypothesisSet
Set of ranked hypotheses containing:
  • iteration (int): Iteration number these hypotheses are for
  • analysis_summary (str): Brief summary of what the analysis tells us
  • hypotheses (list[Hypothesis]): 2-3 ranked hypotheses (see below)
  • exploration_vs_exploitation (str): Strategy - "explore", "exploit", or "balanced"
  • reasoning (str): Overall reasoning for these hypotheses

get_generation_count

Get the number of hypothesis generations performed.
def get_generation_count(self) -> int:
    """Get the number of hypothesis generations performed."""

Returns

count
int
Total number of hypothesis sets generated by this instance.

Data Structures

Hypothesis

A single testable hypothesis with suggested implementation.
@dataclass
class Hypothesis:
    hypothesis_id: str
    statement: str
    rationale: str
    suggested_model: Optional[str]
    suggested_params: dict
    confidence_score: float  # 0.0-1.0
    priority: int  # 1-3, where 1 is highest
Fields:
hypothesis_id
str
Unique identifier for the hypothesis (e.g., "h1", "h2").
statement
str
Clear, testable hypothesis statement describing what to test.
rationale
str
Explanation of why this hypothesis is worth testing based on the analysis.
suggested_model
Optional[str]
Recommended model class name (e.g., "RandomForestRegressor").
suggested_params
dict
Recommended hyperparameters for the suggested model.
confidence_score
float
Confidence in this hypothesis (0.0-1.0):
  • 0.8-1.0: Strong evidence supports this direction
  • 0.5-0.7: Moderate evidence, reasonable next step
  • 0.2-0.4: Speculative but potentially high-value
priority
int
Priority ranking (1-3):
  • 1: Highest priority - implement this first
  • 2: Secondary option
  • 3: Exploratory/backup option

HypothesisSet

@dataclass
class HypothesisSet:
    iteration: int
    analysis_summary: str
    hypotheses: list[Hypothesis]
    exploration_vs_exploitation: str  # "explore" | "exploit" | "balanced"
    reasoning: str
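For illustration, a populated HypothesisSet might look like the following. The stand-in dataclasses mirror the definitions above so the snippet is self-contained, and all field values are made up for the example:

```python
from dataclasses import dataclass
from typing import Optional

# Minimal stand-ins for the dataclasses documented above.
@dataclass
class Hypothesis:
    hypothesis_id: str
    statement: str
    rationale: str
    suggested_model: Optional[str]
    suggested_params: dict
    confidence_score: float  # 0.0-1.0
    priority: int  # 1-3, where 1 is highest

@dataclass
class HypothesisSet:
    iteration: int
    analysis_summary: str
    hypotheses: list
    exploration_vs_exploitation: str  # "explore" | "exploit" | "balanced"
    reasoning: str

hset = HypothesisSet(
    iteration=4,
    analysis_summary="RMSE improved 8% after switching to gradient boosting.",
    hypotheses=[
        Hypothesis(
            hypothesis_id="h1",
            statement="Increasing n_estimators to 500 will further reduce RMSE.",
            rationale="RMSE was still dropping as trees were added.",
            suggested_model="GradientBoostingRegressor",
            suggested_params={"n_estimators": 500, "learning_rate": 0.05},
            confidence_score=0.75,
            priority=1,
        ),
    ],
    exploration_vs_exploitation="exploit",
    reasoning="Recent trend shows consistent improvement; refine the winner.",
)

# Lowest priority number = highest-ranked hypothesis.
top = min(hset.hypotheses, key=lambda h: h.priority)
print(top.hypothesis_id)
```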

Exploration vs Exploitation Strategy

The generator automatically selects a strategy based on the analysis:

Explore

Try fundamentally different approaches when:
  • Performance has plateaued for multiple iterations
  • Current iteration is > 70% of max_iterations
  • Trend pattern indicates a local optimum

Exploit

Refine promising approaches when:
  • Trend pattern shows consistent improvement
  • Early iterations (≤5) with positive results
  • Recent experiments show clear winning direction

Balanced

Mix both strategies (default) for:
  • Normal iteration flow
  • Mixed results across iterations
  • Moderate performance improvements
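The selection rules above can be sketched as a simple heuristic. This is an illustrative reconstruction, not the generator's actual implementation; the plateau threshold and the trend-pattern string values are assumptions:

```python
def choose_strategy(
    trend_pattern: str,
    current_iteration: int,
    max_iterations: int,
    iterations_without_improvement: int,
) -> str:
    """Pick "explore", "exploit", or "balanced" per the documented rules."""
    # Explore: plateau, late in the run (> 70% of budget), or local optimum.
    if (
        iterations_without_improvement >= 3  # assumed plateau threshold
        or current_iteration > 0.7 * max_iterations
        or trend_pattern == "local_optimum"
    ):
        return "explore"
    # Exploit: consistent improvement, or early iterations with positive results.
    if trend_pattern == "improving" or current_iteration <= 5:
        return "exploit"
    # Balanced: default for mixed or moderate results.
    return "balanced"

print(choose_strategy("improving", 3, 10, 0))  # exploit
print(choose_strategy("plateau", 8, 10, 4))    # explore
print(choose_strategy("mixed", 6, 10, 1))      # balanced
```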

Confidence Scoring Guidelines

| Range   | Interpretation                           | Example                                          |
|---------|------------------------------------------|--------------------------------------------------|
| 0.8-1.0 | Strong evidence from multiple iterations | "XGBoost improved RMSE by 15% in last 2 trials"  |
| 0.5-0.7 | Moderate evidence, reasonable step       | "Gradient boosting hasn't been tried yet"        |
| 0.2-0.4 | Speculative, potentially high-value      | "Neural network might capture non-linearities"   |
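When filtering or labeling hypotheses, the ranges above map naturally to a small helper. This is a hypothetical convenience function, not part of the module's API; the label for scores below 0.2 is an assumption, since that range is not documented:

```python
def interpret_confidence(score: float) -> str:
    """Map a confidence score to its documented interpretation."""
    if score >= 0.8:
        return "strong evidence"
    if score >= 0.5:
        return "moderate evidence"
    if score >= 0.2:
        return "speculative"
    return "very low confidence"  # below the documented ranges (assumption)

print(interpret_confidence(0.85))  # strong evidence
print(interpret_confidence(0.3))   # speculative
```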

System Prompt

The generator uses a comprehensive system prompt that guides Gemini to:
  • Generate 2-3 ranked hypotheses based on analysis
  • Make each hypothesis specific and testable in a single experiment
  • Balance exploration and exploitation
  • Reference specific metric values and patterns
  • Suggest concrete model choices and hyperparameters
  • Include at least one “safe” refinement and one “exploratory” option

Usage Examples

Basic Hypothesis Generation

from src.cognitive.gemini_client import GeminiClient
from src.cognitive.hypothesis_generator import HypothesisGenerator

# Initialize
client = GeminiClient()
generator = HypothesisGenerator(gemini_client=client)

# Generate hypotheses from analysis
hypotheses = generator.generate(
    analysis=analysis_result,
    state=experiment_state
)

print(f"Strategy: {hypotheses.exploration_vs_exploitation}")
print(f"\nHypotheses ({len(hypotheses.hypotheses)}):")
for h in hypotheses.hypotheses:
    print(f"\n[{h.priority}] {h.hypothesis_id} (confidence: {h.confidence_score:.2f})")
    print(f"  Statement: {h.statement}")
    print(f"  Model: {h.suggested_model}")
    print(f"  Params: {h.suggested_params}")

Selecting Top Hypothesis

hypotheses = generator.generate(analysis, state)

# Get highest priority hypothesis
top_hypothesis = min(hypotheses.hypotheses, key=lambda h: h.priority)

print(f"Top hypothesis: {top_hypothesis.statement}")
print(f"Rationale: {top_hypothesis.rationale}")
print(f"Suggested model: {top_hypothesis.suggested_model}")
print(f"Confidence: {top_hypothesis.confidence_score:.2%}")

# Use in experiment design (gate on whether a concrete model was suggested)
if top_hypothesis.suggested_model:
    next_spec = designer.design_experiment(
        data_profile=state.data_profile,
        previous_results=state.experiments,
        task_type=state.config.task_type.value,
        iteration=state.current_iteration + 1
    )

Strategy-Based Selection

hypotheses = generator.generate(analysis, state)

if hypotheses.exploration_vs_exploitation == "explore":
    print("Exploration mode: trying diverse approaches")
    # Pick the most speculative hypothesis
    exploratory = max(hypotheses.hypotheses, key=lambda h: h.priority)
    print(f"Trying: {exploratory.statement}")
    
elif hypotheses.exploration_vs_exploitation == "exploit":
    print("Exploitation mode: refining best approach")
    # Pick the highest confidence hypothesis
    safe_bet = max(hypotheses.hypotheses, key=lambda h: h.confidence_score)
    print(f"Refining: {safe_bet.statement}")
    
else:
    print("Balanced mode: trying top-ranked hypothesis")
    top = min(hypotheses.hypotheses, key=lambda h: h.priority)
    print(f"Next: {top.statement}")

Integration with Full Pipeline

from src.cognitive.gemini_client import GeminiClient
from src.cognitive.experiment_designer import ExperimentDesigner
from src.cognitive.results_analyzer import ResultsAnalyzer
from src.cognitive.hypothesis_generator import HypothesisGenerator

# Shared client for conversation continuity
client = GeminiClient()

# Initialize cognitive components
designer = ExperimentDesigner(client)
analyzer = ResultsAnalyzer(client)
generator = HypothesisGenerator(client)

# Experiment loop
for iteration in range(1, state.config.max_iterations + 1):
    # Design experiment
    spec = designer.design_experiment(
        data_profile=state.data_profile,
        previous_results=state.experiments,
        task_type=state.config.task_type.value,
        iteration=iteration
    )
    
    # Execute and analyze
    result = execute_experiment(spec)
    analysis = analyzer.analyze(result, state)
    
    # Generate hypotheses for next iteration
    hypotheses = generator.generate(analysis, state)
    
    # Log insights
    print(f"\n=== Iteration {iteration} ===")
    print(f"Analysis trend: {analysis.trend_pattern.value}")
    print(f"Next strategy: {hypotheses.exploration_vs_exploitation}")
    print(f"Top hypothesis: {hypotheses.hypotheses[0].statement}")
    
    # Update state
    state.add_experiment(result)
    state.add_analysis(analysis)
    state.add_hypotheses(hypotheses)
    
    # Early stopping based on strategy
    if hypotheses.exploration_vs_exploitation == "explore":
        if state.iterations_without_improvement >= 5:
            print("Stopping: Plateau detected")
            break

print(f"\nGenerated {generator.get_generation_count()} hypothesis sets")

Handling Failed Experiments

# When experiment fails, generator provides safe retry hypothesis
failed_analysis = AnalysisResult(
    experiment_name="failed_xgb",
    iteration=3,
    success=False,
    # ... other fields
)

hypotheses = generator.generate(failed_analysis, state)

# Returns a simple, robust hypothesis
print(f"Hypotheses: {len(hypotheses.hypotheses)}")
print(f"Statement: {hypotheses.hypotheses[0].statement}")
print(f"Reasoning: {hypotheses.reasoning}")
# Output: "Previous experiment failed. Falling back to a robust default."

Custom Hypothesis Filtering

hypotheses = generator.generate(analysis, state)

# Filter by confidence threshold
high_confidence = [
    h for h in hypotheses.hypotheses
    if h.confidence_score >= 0.7
]

print(f"High-confidence hypotheses: {len(high_confidence)}")
for h in high_confidence:
    print(f"  - {h.statement} ({h.confidence_score:.2f})")

# Filter by model type preference
tree_based = [
    h for h in hypotheses.hypotheses
    if h.suggested_model and 'Forest' in h.suggested_model
]

print(f"\nTree-based suggestions: {len(tree_based)}")

Tracking Hypothesis History

class HypothesisTracker:
    def __init__(self):
        self.history = []
    
    def track(self, hypotheses: HypothesisSet, chosen_idx: int = 0):
        self.history.append({
            'iteration': hypotheses.iteration,
            'strategy': hypotheses.exploration_vs_exploitation,
            'chosen': hypotheses.hypotheses[chosen_idx].statement,
            'confidence': hypotheses.hypotheses[chosen_idx].confidence_score,
        })
    
    def summary(self):
        print("Hypothesis History:")
        for entry in self.history:
            print(f"  Iter {entry['iteration']}: {entry['chosen']}")
            print(f"    Strategy: {entry['strategy']}, Confidence: {entry['confidence']:.2f}")

# Usage
tracker = HypothesisTracker()

for iteration in range(1, 6):
    # ... execute, analyze
    hypotheses = generator.generate(analysis, state)
    tracker.track(hypotheses, chosen_idx=0)

tracker.summary()

Fallback Behavior

# When Gemini fails, generator provides deterministic fallbacks
try:
    hypotheses = generator.generate(analysis, state)
    
    # Check if fallback was used
    if "fallback" in hypotheses.analysis_summary.lower():
        print("Using fallback hypotheses (Gemini unavailable)")
        print(f"Reasoning: {hypotheses.reasoning}")
        
        # Fallback rotates through model types based on iteration
        for h in hypotheses.hypotheses:
            print(f"  - {h.suggested_model}: {h.statement}")
except Exception as e:
    print(f"Hypothesis generation failed: {e}")

Integration Pattern

The HypothesisGenerator fits between ResultsAnalyzer and ExperimentDesigner in the cognitive loop:
1. ExperimentDesigner → Design experiment spec
2. CodeGenerator → Generate and execute code
3. ResultsAnalyzer → Analyze experiment results
4. HypothesisGenerator → Generate next hypotheses  ← YOU ARE HERE
5. (Back to ExperimentDesigner with new insights)
The hypotheses inform the next design iteration by:
  • Suggesting specific model types and parameters
  • Indicating whether to explore or exploit
  • Providing confidence-weighted options
  • Connecting choices to observed patterns
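One way to carry these signals into the next design step is to flatten the top hypothesis into a plain dict of overrides for whatever interface your designer exposes. The function and key names here are assumptions for illustration, not part of the documented API:

```python
from types import SimpleNamespace

def hypothesis_to_overrides(h) -> dict:
    """Flatten a Hypothesis into design-time overrides for the next iteration."""
    overrides = dict(h.suggested_params)  # copy so the hypothesis stays untouched
    if h.suggested_model:
        overrides["model"] = h.suggested_model
    return overrides

# Stand-in hypothesis object; any object with these attributes works.
h = SimpleNamespace(
    suggested_model="RandomForestRegressor",
    suggested_params={"n_estimators": 300, "max_depth": 12},
)
print(hypothesis_to_overrides(h))
# {'n_estimators': 300, 'max_depth': 12, 'model': 'RandomForestRegressor'}
```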
