Overview

The HypothesisGenerator analyzes experiment results and generates ranked, testable hypotheses to guide the next iteration of the experiment loop. It balances exploration and exploitation strategies and provides confidence scores for each hypothesis.

Key Features

  • Generates 2-3 ranked, testable hypotheses from analysis results
  • Balances exploration (new approaches) vs exploitation (refining what works)
  • Provides confidence scores (0.0-1.0) and priorities (1-3)
  • Suggests concrete model choices and hyperparameters
  • Maintains conversation context for Thought Signature continuity
  • Intelligent fallback when Gemini is unavailable

Class Definition

class HypothesisGenerator:
    """Generates ranked hypotheses for the next experiment iteration using Gemini 3.

    Key features:
    - Generates 2-3 testable hypotheses from analysis results
    - Balances exploration vs exploitation strategies
    - Provides confidence scores and priorities
    - Suggests concrete model choices and parameters
    - Maintains conversation context for Thought Signature continuity
    """

    def __init__(self, gemini_client: GeminiClient):
        """Initialize the hypothesis generator.

        Args:
            gemini_client: Shared GeminiClient instance for API calls.
        """

Constructor

gemini_client
GeminiClient
required
Shared GeminiClient instance for API calls. Sharing the same client across cognitive components preserves conversation history.

Methods

generate

Generate hypotheses for the next experiment iteration.
def generate(
    self,
    analysis: AnalysisResult,
    state: ExperimentState,
) -> HypothesisSet:
    """Generate hypotheses for the next experiment iteration.

    Args:
        analysis: Analysis of the most recent experiment.
        state: Current experiment state with history.

    Returns:
        HypothesisSet with ranked hypotheses.
    """

Parameters

analysis
AnalysisResult
required
Analysis result from the most recent experiment, containing:
  • experiment_name (str): Name of analyzed experiment
  • iteration (int): Iteration number
  • success (bool): Whether experiment succeeded
  • primary_metric (Optional[MetricComparison]): Metric comparison data
  • trend_pattern (TrendPattern): Detected performance trend
  • key_observations (list[str]): Actionable insights
  • reasoning (str): Analysis reasoning
state
ExperimentState
required
Current experiment state containing:
  • experiments (list[ExperimentResult]): Historical results
  • config (Config): Configuration including task type, constraints, max iterations
  • current_iteration (int): Current iteration number
  • iterations_without_improvement (int): Plateau counter

Returns

HypothesisSet
HypothesisSet
Set of ranked hypotheses containing:
  • iteration (int): Iteration number these hypotheses are for
  • analysis_summary (str): Brief summary of what the analysis tells us
  • hypotheses (list[Hypothesis]): 2-3 ranked hypotheses (see below)
  • exploration_vs_exploitation (str): Strategy - "explore", "exploit", or "balanced"
  • reasoning (str): Overall reasoning for these hypotheses

get_generation_count

Get the number of hypothesis generations performed.
def get_generation_count(self) -> int:
    """Get the number of hypothesis generations performed."""

Returns

count
int
Total number of hypothesis sets generated by this instance.

Data Structures

Hypothesis

A single testable hypothesis with suggested implementation.
@dataclass
class Hypothesis:
    hypothesis_id: str
    statement: str
    rationale: str
    suggested_model: Optional[str]
    suggested_params: dict
    confidence_score: float  # 0.0-1.0
    priority: int  # 1-3, where 1 is highest
Fields:
hypothesis_id
str
Unique identifier for the hypothesis (e.g., "h1", "h2").
statement
str
Clear, testable hypothesis statement describing what to test.
rationale
str
Explanation of why this hypothesis is worth testing based on the analysis.
suggested_model
Optional[str]
Recommended model class name (e.g., "RandomForestRegressor").
suggested_params
dict
Recommended hyperparameters for the suggested model.
confidence_score
float
Confidence in this hypothesis (0.0-1.0):
  • 0.8-1.0: Strong evidence supports this direction
  • 0.5-0.7: Moderate evidence, reasonable next step
  • 0.2-0.4: Speculative but potentially high-value
priority
int
Priority ranking (1-3):
  • 1: Highest priority - implement this first
  • 2: Secondary option
  • 3: Exploratory/backup option

HypothesisSet

@dataclass
class HypothesisSet:
    iteration: int
    analysis_summary: str
    hypotheses: list[Hypothesis]
    exploration_vs_exploitation: str  # "explore" | "exploit" | "balanced"
    reasoning: str
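For illustration, a populated HypothesisSet might look like the following. The stand-in dataclasses mirror the definitions above so the snippet is self-contained, and all field values are made up for the example:

```python
from dataclasses import dataclass
from typing import Optional

# Minimal stand-ins for the dataclasses documented above.
@dataclass
class Hypothesis:
    hypothesis_id: str
    statement: str
    rationale: str
    suggested_model: Optional[str]
    suggested_params: dict
    confidence_score: float  # 0.0-1.0
    priority: int  # 1-3, where 1 is highest

@dataclass
class HypothesisSet:
    iteration: int
    analysis_summary: str
    hypotheses: list
    exploration_vs_exploitation: str  # "explore" | "exploit" | "balanced"
    reasoning: str

hset = HypothesisSet(
    iteration=4,
    analysis_summary="RMSE improved 8% after switching to gradient boosting.",
    hypotheses=[
        Hypothesis(
            hypothesis_id="h1",
            statement="Increasing n_estimators to 500 will further reduce RMSE.",
            rationale="RMSE was still dropping as trees were added.",
            suggested_model="GradientBoostingRegressor",
            suggested_params={"n_estimators": 500, "learning_rate": 0.05},
            confidence_score=0.75,
            priority=1,
        ),
    ],
    exploration_vs_exploitation="exploit",
    reasoning="Recent trend shows consistent improvement; refine the winner.",
)

# Lowest priority number = highest-ranked hypothesis.
top = min(hset.hypotheses, key=lambda h: h.priority)
print(top.hypothesis_id)
```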

Exploration vs Exploitation Strategy

The generator automatically selects a strategy based on the analysis:

Explore

Try fundamentally different approaches when:
  • Performance has plateaued for multiple iterations
  • Current iteration is > 70% of max_iterations
  • Trend pattern indicates a local optimum

Exploit

Refine promising approaches when:
  • Trend pattern shows consistent improvement
  • Early iterations (≤5) with positive results
  • Recent experiments show clear winning direction

Balanced

Mix both strategies (default) for:
  • Normal iteration flow
  • Mixed results across iterations
  • Moderate performance improvements
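The selection rules above can be sketched as a simple heuristic. This is an illustrative reconstruction, not the generator's actual implementation; the plateau threshold and the trend-pattern string values are assumptions:

```python
def choose_strategy(
    trend_pattern: str,
    current_iteration: int,
    max_iterations: int,
    iterations_without_improvement: int,
) -> str:
    """Pick "explore", "exploit", or "balanced" per the documented rules."""
    # Explore: plateau, late in the run (> 70% of budget), or local optimum.
    if (
        iterations_without_improvement >= 3  # assumed plateau threshold
        or current_iteration > 0.7 * max_iterations
        or trend_pattern == "local_optimum"
    ):
        return "explore"
    # Exploit: consistent improvement, or early iterations with positive results.
    if trend_pattern == "improving" or current_iteration <= 5:
        return "exploit"
    # Balanced: default for mixed or moderate results.
    return "balanced"

print(choose_strategy("improving", 3, 10, 0))  # exploit
print(choose_strategy("plateau", 8, 10, 4))    # explore
print(choose_strategy("mixed", 6, 10, 1))      # balanced
```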

Confidence Scoring Guidelines

| Range   | Interpretation                           | Example                                          |
|---------|------------------------------------------|--------------------------------------------------|
| 0.8-1.0 | Strong evidence from multiple iterations | "XGBoost improved RMSE by 15% in last 2 trials"  |
| 0.5-0.7 | Moderate evidence, reasonable step       | "Gradient boosting hasn't been tried yet"        |
| 0.2-0.4 | Speculative, potentially high-value      | "Neural network might capture non-linearities"   |
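When filtering or labeling hypotheses, the ranges above map naturally to a small helper. This is a hypothetical convenience function, not part of the module's API; the label for scores below 0.2 is an assumption, since that range is not documented:

```python
def interpret_confidence(score: float) -> str:
    """Map a confidence score to its documented interpretation."""
    if score >= 0.8:
        return "strong evidence"
    if score >= 0.5:
        return "moderate evidence"
    if score >= 0.2:
        return "speculative"
    return "very low confidence"  # below the documented ranges (assumption)

print(interpret_confidence(0.85))  # strong evidence
print(interpret_confidence(0.3))   # speculative
```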

System Prompt

The generator uses a comprehensive system prompt that guides Gemini to:
  • Generate 2-3 ranked hypotheses based on analysis
  • Make each hypothesis specific and testable in a single experiment
  • Balance exploration and exploitation
  • Reference specific metric values and patterns
  • Suggest concrete model choices and hyperparameters
  • Include at least one “safe” refinement and one “exploratory” option

Usage Examples

Basic Hypothesis Generation

from src.cognitive.gemini_client import GeminiClient
from src.cognitive.hypothesis_generator import HypothesisGenerator

# Initialize
client = GeminiClient()
generator = HypothesisGenerator(gemini_client=client)

# Generate hypotheses from analysis
hypotheses = generator.generate(
    analysis=analysis_result,
    state=experiment_state
)

print(f"Strategy: {hypotheses.exploration_vs_exploitation}")
print(f"\nHypotheses ({len(hypotheses.hypotheses)}):")
for h in hypotheses.hypotheses:
    print(f"\n[{h.priority}] {h.hypothesis_id} (confidence: {h.confidence_score:.2f})")
    print(f"  Statement: {h.statement}")
    print(f"  Model: {h.suggested_model}")
    print(f"  Params: {h.suggested_params}")

Selecting Top Hypothesis

hypotheses = generator.generate(analysis, state)

# Get highest priority hypothesis
top_hypothesis = min(hypotheses.hypotheses, key=lambda h: h.priority)

print(f"Top hypothesis: {top_hypothesis.statement}")
print(f"Rationale: {top_hypothesis.rationale}")
print(f"Suggested model: {top_hypothesis.suggested_model}")
print(f"Confidence: {top_hypothesis.confidence_score:.2%}")

# Use in experiment design (gate on whether a concrete model was suggested)
if top_hypothesis.suggested_model:
    next_spec = designer.design_experiment(
        data_profile=state.data_profile,
        previous_results=state.experiments,
        task_type=state.config.task_type.value,
        iteration=state.current_iteration + 1
    )

Strategy-Based Selection

hypotheses = generator.generate(analysis, state)

if hypotheses.exploration_vs_exploitation == "explore":
    print("Exploration mode: trying diverse approaches")
    # Pick the most speculative hypothesis
    exploratory = max(hypotheses.hypotheses, key=lambda h: h.priority)
    print(f"Trying: {exploratory.statement}")
    
elif hypotheses.exploration_vs_exploitation == "exploit":
    print("Exploitation mode: refining best approach")
    # Pick the highest confidence hypothesis
    safe_bet = max(hypotheses.hypotheses, key=lambda h: h.confidence_score)
    print(f"Refining: {safe_bet.statement}")
    
else:
    print("Balanced mode: trying top-ranked hypothesis")
    top = min(hypotheses.hypotheses, key=lambda h: h.priority)
    print(f"Next: {top.statement}")

Integration with Full Pipeline

from src.cognitive.gemini_client import GeminiClient
from src.cognitive.experiment_designer import ExperimentDesigner
from src.cognitive.results_analyzer import ResultsAnalyzer
from src.cognitive.hypothesis_generator import HypothesisGenerator

# Shared client for conversation continuity
client = GeminiClient()

# Initialize cognitive components
designer = ExperimentDesigner(client)
analyzer = ResultsAnalyzer(client)
generator = HypothesisGenerator(client)

# Experiment loop
for iteration in range(1, state.config.max_iterations + 1):
    # Design experiment
    spec = designer.design_experiment(
        data_profile=state.data_profile,
        previous_results=state.experiments,
        task_type=state.config.task_type.value,
        iteration=iteration
    )
    
    # Execute and analyze
    result = execute_experiment(spec)
    analysis = analyzer.analyze(result, state)
    
    # Generate hypotheses for next iteration
    hypotheses = generator.generate(analysis, state)
    
    # Log insights
    print(f"\n=== Iteration {iteration} ===")
    print(f"Analysis trend: {analysis.trend_pattern.value}")
    print(f"Next strategy: {hypotheses.exploration_vs_exploitation}")
    print(f"Top hypothesis: {hypotheses.hypotheses[0].statement}")
    
    # Update state
    state.add_experiment(result)
    state.add_analysis(analysis)
    state.add_hypotheses(hypotheses)
    
    # Early stopping based on strategy
    if hypotheses.exploration_vs_exploitation == "explore":
        if state.iterations_without_improvement >= 5:
            print("Stopping: Plateau detected")
            break

print(f"\nGenerated {generator.get_generation_count()} hypothesis sets")

Handling Failed Experiments

# When experiment fails, generator provides safe retry hypothesis
failed_analysis = AnalysisResult(
    experiment_name="failed_xgb",
    iteration=3,
    success=False,
    # ... other fields
)

hypotheses = generator.generate(failed_analysis, state)

# Returns a simple, robust hypothesis
print(f"Hypotheses: {len(hypotheses.hypotheses)}")
print(f"Statement: {hypotheses.hypotheses[0].statement}")
print(f"Reasoning: {hypotheses.reasoning}")
# Output: "Previous experiment failed. Falling back to a robust default."

Custom Hypothesis Filtering

hypotheses = generator.generate(analysis, state)

# Filter by confidence threshold
high_confidence = [
    h for h in hypotheses.hypotheses
    if h.confidence_score >= 0.7
]

print(f"High-confidence hypotheses: {len(high_confidence)}")
for h in high_confidence:
    print(f"  - {h.statement} ({h.confidence_score:.2f})")

# Filter by model type preference
tree_based = [
    h for h in hypotheses.hypotheses
    if h.suggested_model and 'Forest' in h.suggested_model
]

print(f"\nTree-based suggestions: {len(tree_based)}")

Tracking Hypothesis History

class HypothesisTracker:
    def __init__(self):
        self.history = []
    
    def track(self, hypotheses: HypothesisSet, chosen_idx: int = 0):
        self.history.append({
            'iteration': hypotheses.iteration,
            'strategy': hypotheses.exploration_vs_exploitation,
            'chosen': hypotheses.hypotheses[chosen_idx].statement,
            'confidence': hypotheses.hypotheses[chosen_idx].confidence_score,
        })
    
    def summary(self):
        print("Hypothesis History:")
        for entry in self.history:
            print(f"  Iter {entry['iteration']}: {entry['chosen']}")
            print(f"    Strategy: {entry['strategy']}, Confidence: {entry['confidence']:.2f}")

# Usage
tracker = HypothesisTracker()

for iteration in range(1, 6):
    # ... execute, analyze
    hypotheses = generator.generate(analysis, state)
    tracker.track(hypotheses, chosen_idx=0)

tracker.summary()

Fallback Behavior

# When Gemini fails, generator provides deterministic fallbacks
try:
    hypotheses = generator.generate(analysis, state)
    
    # Check if fallback was used
    if "fallback" in hypotheses.analysis_summary.lower():
        print("Using fallback hypotheses (Gemini unavailable)")
        print(f"Reasoning: {hypotheses.reasoning}")
        
        # Fallback rotates through model types based on iteration
        for h in hypotheses.hypotheses:
            print(f"  - {h.suggested_model}: {h.statement}")
except Exception as e:
    print(f"Hypothesis generation failed: {e}")

Integration Pattern

The HypothesisGenerator fits between ResultsAnalyzer and ExperimentDesigner in the cognitive loop:
1. ExperimentDesigner → Design experiment spec
2. CodeGenerator → Generate and execute code
3. ResultsAnalyzer → Analyze experiment results
4. HypothesisGenerator → Generate next hypotheses  ← YOU ARE HERE
5. (Back to ExperimentDesigner with new insights)
The hypotheses inform the next design iteration by:
  • Suggesting specific model types and parameters
  • Indicating whether to explore or exploit
  • Providing confidence-weighted options
  • Connecting choices to observed patterns
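One way to carry these signals into the next design step is to flatten the top hypothesis into a plain dict of overrides for whatever interface your designer exposes. The function and key names here are assumptions for illustration, not part of the documented API:

```python
from types import SimpleNamespace

def hypothesis_to_overrides(h) -> dict:
    """Flatten a Hypothesis into design-time overrides for the next iteration."""
    overrides = dict(h.suggested_params)  # copy so the hypothesis stays untouched
    if h.suggested_model:
        overrides["model"] = h.suggested_model
    return overrides

# Stand-in hypothesis object; any object with these attributes works.
h = SimpleNamespace(
    suggested_model="RandomForestRegressor",
    suggested_params={"n_estimators": 300, "max_depth": 12},
)
print(hypothesis_to_overrides(h))
# {'n_estimators': 300, 'max_depth': 12, 'model': 'RandomForestRegressor'}
```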
