What Are Thought Signatures?
Thought Signatures refer to Gemini 3’s ability to maintain multi-turn conversational context across a long-running session. Instead of treating each API call as isolated, the system preserves the entire conversation history so Gemini can:
- Reference results from iteration 1 when designing iteration 10
- Build on previous observations and insights
- Avoid repeating failed experiments
- Develop increasingly sophisticated hypotheses over time
The term “Thought Signatures” emphasizes that Gemini is not just remembering facts — it’s building a coherent line of reasoning that evolves across the entire experiment session.
Implementation
Single Shared GeminiClient
Location: src/cognitive/gemini_client.py:52
The key to Thought Signatures is using a single GeminiClient instance across all four cognitive components:
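A minimal sketch of the sharing pattern, assuming simple constructor injection (the class shapes here are illustrative, not the project's actual API):

```python
# Illustrative sketch: one client instance threaded through all components.
# Class names match the doc; constructors and internals are assumptions.

class GeminiClient:
    """Minimal stand-in: records every prompt/response pair."""
    def __init__(self):
        self.history = []  # grows across the whole session

    def generate(self, prompt):
        response = f"<response to: {prompt}>"  # placeholder for a real API call
        self.history.extend([("user", prompt), ("model", response)])
        return response

class ExperimentDesigner:
    def __init__(self, client):
        self.client = client

class ResultsAnalyzer:
    def __init__(self, client):
        self.client = client

class HypothesisGenerator:
    def __init__(self, client):
        self.client = client

# All components share the SAME instance, so they share one history.
client = GeminiClient()
designer = ExperimentDesigner(client)
analyzer = ResultsAnalyzer(client)
generator = HypothesisGenerator(client)
assert designer.client is analyzer.client is generator.client
```

Because every component holds a reference to the same object, a call made by the designer is visible in the history the analyzer sees on its next call.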
Conversation History
The GeminiClient maintains a growing list of all messages (src/cognitive/gemini_client.py:85):
Every time a component calls generate() or generate_json(), the prompt and response are automatically appended to this history (src/cognitive/gemini_client.py:174):
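A sketch of the append-on-every-call behavior, assuming the history is a list of role/content dicts (the real client's internals may differ):

```python
# Hedged sketch of history bookkeeping; field names are assumptions.

class GeminiClient:
    def __init__(self):
        self.history = []

    def _build_messages(self, prompt):
        # All previous turns plus the new prompt form one message array.
        return self.history + [{"role": "user", "content": prompt}]

    def generate(self, prompt, use_history=True):
        if use_history:
            messages = self._build_messages(prompt)
        else:
            messages = [{"role": "user", "content": prompt}]
        response = f"<{len(messages)}-message call>"  # placeholder for the API call
        if use_history:
            # Prompt and response are appended automatically, so the next
            # call from ANY component sees this exchange.
            self.history.append({"role": "user", "content": prompt})
            self.history.append({"role": "model", "content": response})
        return response
```

Each history-enabled call therefore adds exactly two messages (the prompt and the response), which is what produces the growth pattern tabulated below.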
Multi-Turn API Calls
When generating with history enabled, the client builds a message array that includes all previous turns (src/cognitive/gemini_client.py:105):
Conversation Growth Over Time
Here’s how the conversation history grows during a typical 20-iteration session:

| Iteration | Component Calls | Messages Added | Total Messages |
|---|---|---|---|
| 0 (Baseline) | None | 0 | 0 |
| 1 | Designer → Analyzer → Generator | 6 | 6 |
| 2 | Designer → Analyzer → Generator | 6 | 12 |
| 5 | Designer → Analyzer → Generator | 6 | 30 |
| 10 | Designer → Analyzer → Generator | 6 | 60 |
| 20 | Designer → Analyzer → Generator | 6 | 120 |
Each iteration adds 6 messages:
- 2 messages from ExperimentDesigner (prompt + response)
- 2 messages from ResultsAnalyzer (prompt + response)
- 2 messages from HypothesisGenerator (prompt + response)
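The growth pattern in the table above can be checked with one line of arithmetic:

```python
# Each iteration adds 6 messages: a prompt + response pair from each of
# the three components (Designer, Analyzer, Generator).

def total_messages(iterations, per_iteration=6):
    return iterations * per_iteration

assert total_messages(1) == 6
assert total_messages(10) == 60
assert total_messages(20) == 120
```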
ReportGenerator does not contribute to the conversation history during iterations; it only runs once at the end with use_history=False to generate the final report.
Visible in --verbose Mode
When you run with --verbose, you’ll see the conversation context size displayed:
How Each Component Uses History
ExperimentDesigner
Location: src/cognitive/experiment_designer.py:114
The designer receives:
- Data profile (sent in every prompt)
- Last 5 experiment results (summarized in JSON)
- User constraints + top hypothesis from previous iteration
- All previous design decisions and reasoning
- All previous analysis observations
- All previous hypotheses
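A sketch of how a designer prompt might be assembled from these inputs; the field names and summarization scheme are assumptions, not the real implementation:

```python
# Illustrative prompt assembly: recent results are inlined explicitly,
# while everything older arrives implicitly via the shared history.
import json

def build_designer_prompt(data_profile, results, constraints, top_hypothesis):
    recent = results[-5:]  # only the last 5 results are inlined
    return "\n".join([
        f"Data profile: {json.dumps(data_profile)}",
        f"Last {len(recent)} results: {json.dumps(recent)}",
        f"Constraints: {constraints}",
        f"Top hypothesis to test: {top_hypothesis}",
    ])

prompt = build_designer_prompt(
    {"rows": 10_000, "target": "churn"},
    [{"iteration": i, "f1": 0.7 + i / 100} for i in range(1, 9)],
    "max 2 minutes per trial",
    "regularized XGBoost reduces overfitting",
)
assert "Last 5 results" in prompt
```

Note the design split: the prompt carries a compact, explicit summary of recent work, while the conversation history supplies the full reasoning trail.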
ResultsAnalyzer
Location: src/cognitive/results_analyzer.py:84
The analyzer receives:
- Current experiment result
- Metric comparison (local computation)
- Last 5 experiments from history
- Observations from previous analyses
- Patterns identified in earlier iterations
- Hypotheses that were tested and their outcomes
HypothesisGenerator
Location: src/cognitive/hypothesis_generator.py:79
The generator receives:
- Current analysis result
- Last 5 experiments from history
- Current iteration number and constraints
This shared context allows the generator to:
- Remember which hypotheses have already been tested
- Build on successful hypotheses from earlier iterations
- Avoid suggesting approaches that failed previously
- Develop increasingly sophisticated hypotheses over time
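In the real system, Gemini itself performs this deduplication by reading its own history; a local stand-in for the effect might look like this (names are illustrative only):

```python
# Hypothetical sketch of how shared history lets the generator avoid
# re-proposing hypotheses that were already tested.

def untested_hypotheses(candidates, history):
    """Drop any candidate hypothesis that already appears in past turns."""
    past_text = " ".join(turn["content"] for turn in history)
    return [h for h in candidates if h not in past_text]

history = [
    {"role": "model", "content": "Hypothesis: XGBoost with learning_rate=0.1"},
]
candidates = [
    "XGBoost with learning_rate=0.1",  # already tried, so filtered out
    "XGBoost with L2 regularization",  # new, so kept
]
assert untested_hypotheses(candidates, history) == ["XGBoost with L2 regularization"]
```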
Gemini 3 Configuration for Thought Signatures
Location: src/cognitive/gemini_client.py:88
Thought Signatures require specific Gemini 3 settings:
Temperature: 1.0 (Fixed)
Thinking Level: High
All three core components (Designer, Analyzer, Generator) use thinking_level="high":
The ReportGenerator instead uses thinking_level="medium" since it’s a final summary rather than iterative reasoning.
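A minimal sketch of how these per-component settings might be organized; the temperature and thinking levels follow the text above, but the dict layout and the exact SDK plumbing are assumptions:

```python
# Assumed configuration table; the real code passes these values to the
# Gemini API, which is not shown here.

COMPONENT_CONFIG = {
    "ExperimentDesigner":  {"temperature": 1.0, "thinking_level": "high"},
    "ResultsAnalyzer":     {"temperature": 1.0, "thinking_level": "high"},
    "HypothesisGenerator": {"temperature": 1.0, "thinking_level": "high"},
    # The final report is a one-shot summary, so it runs with a lower
    # thinking level (and, as noted later, without conversation history).
    "ReportGenerator":     {"temperature": 1.0, "thinking_level": "medium"},
}

def config_for(component):
    return COMPONENT_CONFIG[component]

assert config_for("ReportGenerator")["thinking_level"] == "medium"
```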
Why Thought Signatures Matter
Without Thought Signatures (Stateless API Calls)
Imagine if each component made isolated API calls:
- Iteration 5: Designer suggests trying XGBoost
- Iteration 10: Designer suggests trying XGBoost again (forgot it already failed)
- Iteration 15: Analyzer notes tree models work well (but can’t reference iteration 3’s finding)
- Iteration 20: Generator suggests linear models (ignoring all evidence they underperform)
With Thought Signatures (Shared Conversation)
- Iteration 5: Designer tries XGBoost with learning_rate=0.1 → fails with overfitting
- Iteration 10: Designer tries XGBoost with regularization (remembers iteration 5)
- Iteration 15: Analyzer says “Consistent with iteration 3’s observation about tree models”
- Iteration 20: Generator says “Based on 15 iterations, linear models underperform — focus on ensemble tuning”
Comparison to Traditional AutoML
| Traditional AutoML (H2O, Auto-sklearn) | ML Experiment Autopilot (Thought Signatures) |
|---|---|
| Each model trial is independent | Each iteration builds on all previous iterations |
| No memory of why models failed | Gemini remembers failure reasons and adjusts |
| Generic “model X trained” messages | Detailed reasoning referencing past results |
| Random or grid search over hyperparams | Hypothesis-driven parameter selection |
| No narrative report generation | Coherent report synthesizing 20+ iterations |
Thought Signatures transform AutoML from random search to guided exploration with a memory of what’s been tried and why.
Practical Example: Iteration 1 vs Iteration 10
Iteration 1 Design Prompt
Iteration 10 Design Prompt
Limitations and Considerations
Context Window
Gemini 3 has a large but finite context window. For very long sessions (50+ iterations), the conversation history could exceed the limit. Currently, the system does not implement context pruning. In practice, 20 iterations × 6 messages = 120 messages, which is well within Gemini 3’s context limits. Each message is also relatively short (data profiles and results are summarized, not sent in full).
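A back-of-envelope check of the context-window claim above. The per-message token count is an assumed illustrative figure, not a measurement:

```python
# Rough estimate of the history size sent with each call.

MESSAGES_PER_ITERATION = 6
AVG_TOKENS_PER_MESSAGE = 400  # assumption: profiles and results are summarized

def history_tokens(iteration):
    """Approximate tokens of accumulated history at a given iteration."""
    return iteration * MESSAGES_PER_ITERATION * AVG_TOKENS_PER_MESSAGE

# 20 iterations -> 120 messages, roughly 48k tokens under this assumption,
# comfortably inside a modern million-token context window.
assert history_tokens(20) == 48_000
```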
Token Cost
Every API call sends the full conversation history, which increases token usage linearly with session length. This is acceptable for “The Marathon Agent” use case but could be expensive for 100+ iteration sessions.
No History for Report Generation
The ReportGenerator intentionally uses use_history=False (src/cognitive/report_generator.py:158):
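A sketch of what that stateless final call might look like; the method signature and the stub client are assumptions for illustration:

```python
# Hedged sketch: the final report is one stateless turn. A summary is
# passed explicitly instead of dragging in 120 turns of raw history.

class StubClient:
    """Stand-in client that records whether history was requested."""
    def generate(self, prompt, use_history=True):
        self.last_used_history = use_history
        return f"<report for: {prompt[:40]}>"

def generate_report(client, session_summary):
    return client.generate(
        f"Write a final report for: {session_summary}",
        use_history=False,
    )

client = StubClient()
generate_report(client, "20 iterations, best F1 = 0.84")
assert client.last_used_history is False
```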
Key Takeaways
- Thought Signatures = Shared Conversation History: All cognitive components use one GeminiClient instance
- Multi-Turn Context: Each API call includes all previous prompts and responses
- Reasoning Continuity: Gemini can reference iteration 1 when designing iteration 10
- Temperature 1.0 + High Thinking: Required for quality long-term reasoning
- Visible in --verbose: Watch the context grow (“Context: 12 turns”)
- The Marathon Agent: This is why the system qualifies for the hackathon track