Accessing Your Track Record
You can view track records for yourself and other users.
Key Performance Metrics
Your track record includes several important statistics:
Average Peer Score
What it is: Your mean peer score across all scored questions
How to interpret:
- Positive scores: You’re beating the community prediction
- Negative scores: The community prediction is beating you
- Score near 0: You’re performing similarly to the CP
- Higher is better: More positive scores indicate better performance
Total Predictions
What it is: The total number of forecasts you’ve submitted
Why it matters:
- More predictions = more data points for evaluating your skill
- Quantity alone doesn’t indicate quality
- Shows your engagement level
Each update to a forecast counts as a new prediction. Active forecasters who update regularly will have higher counts.
Questions Predicted
What it is: The number of unique questions you’ve forecasted on
Why it matters:
- Shows breadth of forecasting activity
- Higher diversity generally improves calibration
- Indicates your coverage across topics
Questions Scored
What it is: The number of resolved questions where you received a score
Why it matters:
- Only scored questions contribute to your track record statistics
- This is the most important number for evaluating performance
- More scored questions = more reliable performance estimates
Track Record Visualizations
The track record page includes several charts to help you understand your performance:
Score Scatter Plot
What it shows: Individual scores for each question plotted over time
How to read it:
- X-axis: Time when the question was resolved
- Y-axis: Your score on that question
- Points above 0: Questions where you beat the CP
- Points below 0: Questions where the CP beat you
- Patterns: Look for trends over time
What to look for:
- Improving trend: Points moving upward over time indicates you’re getting better
- Consistent performance: Points clustered around a specific value
- Outliers: Unusually high or low scores worth investigating
- Volatility: How spread out your scores are
Calibration Curve
What it shows: How well your predicted probabilities match actual outcomes
How to read it:
- X-axis: Your predicted probability
- Y-axis: Actual frequency of occurrence
- Diagonal line: Perfect calibration
- Your curve: Your actual calibration
Common patterns:
- Well-calibrated: Your curve closely follows the diagonal line. When you say 70%, it happens about 70% of the time; when you say 30%, it happens about 30% of the time. This is the goal!
- Overconfident: Your curve is flatter than the diagonal. Events you give high probabilities happen less often than you predict, and events you give low probabilities happen more often.
- Underconfident: Your curve is steeper than the diagonal. Outcomes track your forecasts more decisively than your stated probabilities suggest, so you could afford more extreme predictions.
Calibration curves are most reliable when you have many resolved binary predictions. With fewer predictions, random variation can make the curve appear miscalibrated.
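To see how such a curve is built, here is a minimal Python sketch of the standard construction: bin predictions by stated probability, then compare each bin’s mean predicted probability with the observed frequency. The binning scheme here is an assumption; Metaculus’s exact construction may differ.

```python
from collections import defaultdict

def calibration_curve(predictions, n_bins: int = 10):
    """Build (mean predicted probability, observed frequency) points
    from (probability, outcome) pairs. Points near the diagonal
    indicate good calibration."""
    bins = defaultdict(list)
    for p, outcome in predictions:
        idx = min(int(p * n_bins), n_bins - 1)  # which probability bin
        bins[idx].append((p, outcome))
    curve = []
    for idx in sorted(bins):
        pairs = bins[idx]
        mean_p = sum(p for p, _ in pairs) / len(pairs)
        freq = sum(1 for _, o in pairs if o) / len(pairs)
        curve.append((mean_p, freq))
    return curve

data = [(0.7, True), (0.7, True), (0.7, False), (0.3, False), (0.3, True)]
print(calibration_curve(data))  # approximately [(0.3, 0.5), (0.7, 0.67)]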
Score Histogram
What it shows: The distribution of your scores across all questions
How to read it:
- X-axis: Score bins
- Y-axis: Number of questions in each bin
- Shape: Shows the pattern of your performance
- Skewed toward high scores: More high scores than low (good!)
- Skewed toward low scores: More low scores than high (room for improvement)
- Normal distribution: Mix of good and bad predictions
- Bimodal: Two distinct types of performance (might indicate domain expertise)
Score Types
Metaculus uses several scoring methods. The main ones you’ll see:
Peer Score
How it works: Compares your forecast to the community prediction (CP) at each point in time
Calculation (a code sketch follows this list):
- Both you and the CP are scored against the actual outcome using a proper scoring rule (usually Brier or log score)
- Your peer score = Your score - CP score
- Positive means you beat the CP; negative means the CP beat you
Why it’s used:
- Measures your added value over the crowd
- Accounts for question difficulty
- Fair comparison across different questions
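To make the calculation concrete, here is a minimal Python sketch of the peer-score idea using the log score as the proper scoring rule. The clamping bounds and function names are illustrative assumptions, not the exact Metaculus implementation.

```python
import math

def log_score(p: float, outcome: bool) -> float:
    """Log score of a binary forecast: ln of the probability
    assigned to whatever actually happened."""
    p = min(max(p, 0.001), 0.999)  # clamp to avoid ln(0); bounds are an assumption
    return math.log(p if outcome else 1.0 - p)

def peer_score(your_p: float, cp_p: float, outcome: bool) -> float:
    """Peer score as described above: your score minus the CP's score.
    Positive means you beat the community prediction."""
    return log_score(your_p, outcome) - log_score(cp_p, outcome)

# Example: you said 80%, the CP said 60%, and the event happened.
print(peer_score(0.80, 0.60, True))   # positive: you beat the CP
print(peer_score(0.40, 0.60, True))   # negative: the CP beat you
```

Your Average Peer Score is then simply the mean of these values across all your scored questions.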
Baseline Score
How it works: Compares your forecast to a baseline forecast, often 50% for binary, or a naïve prediction (a code sketch follows this list)
Why it’s used:
- Shows absolute performance, not relative
- Useful for understanding raw accuracy
- Less common than peer score
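A minimal sketch of the same log-score comparison against a fixed 50% baseline for a binary question; any scaling Metaculus applies on top of the raw score difference is omitted here.

```python
import math

def baseline_score(your_p: float, outcome: bool, baseline_p: float = 0.5) -> float:
    """Your log score minus a fixed baseline's log score.
    The 50% default and the lack of scaling are assumptions."""
    def log_score(p: float) -> float:
        return math.log(p if outcome else 1.0 - p)
    return log_score(your_p) - log_score(baseline_p)

print(baseline_score(0.90, True))   # positive: better than a 50/50 guess
print(baseline_score(0.90, False))  # negative: worse than a 50/50 guess
```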
Spot Scores
How it works: Evaluates your forecast at a specific point in time (the spot_scoring_time or cp_reveal_time)
Types:
- Spot Peer: Your score vs. CP at the spot time
- Spot Baseline: Your score vs. baseline at the spot time
Why it’s used:
- Reduces incentive to update constantly
- Evaluates skill at a specific meaningful moment
- Common in tournaments and challenges
If spot_scoring_time isn’t set, it defaults to cp_reveal_time, then actual_close_time, then scheduled_close_time.
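The fallback amounts to taking the first timestamp that is set. A sketch, assuming the question’s fields arrive as a simple dict keyed by the names above:

```python
from typing import Optional

def resolve_spot_time(question: dict) -> Optional[str]:
    """Return the timestamp used for spot scoring, following the
    fallback order described above. The dict shape is an assumption."""
    for field in ("spot_scoring_time", "cp_reveal_time",
                  "actual_close_time", "scheduled_close_time"):
        if question.get(field):
            return question[field]
    return None

q = {"spot_scoring_time": None, "cp_reveal_time": "2024-06-01T00:00:00Z"}
print(resolve_spot_time(q))  # falls back to cp_reveal_time
```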
Coverage
Along with scores, you’ll see coverage metrics.
What it is: The fraction of the scoring period during which you had an active forecast (a computation sketch follows this list)
Values:
- 1.0 (100%): You had a forecast active for the entire period
- 0.5 (50%): You forecasted for half the period
- 0.0 (0%): No forecast during the scoring period
Why it matters:
- Some scoring methods weight by coverage
- Higher coverage = more credit for your forecasts
- Encourages early and sustained participation
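As a sketch, coverage can be computed by clipping your active-forecast intervals to the scoring period. The interval representation is an assumption, and overlapping intervals would need to be merged first.

```python
def coverage(forecast_intervals, period_start: float, period_end: float) -> float:
    """Fraction of the scoring period covered by active forecasts.
    Intervals are (start, end) timestamps, assumed non-overlapping."""
    covered = 0.0
    for start, end in forecast_intervals:
        lo, hi = max(start, period_start), min(end, period_end)
        covered += max(0.0, hi - lo)  # clip to the scoring period
    return covered / (period_end - period_start)

# Forecast active only for the second half of a 100-unit scoring period:
print(coverage([(50.0, 100.0)], 0.0, 100.0))  # 0.5
```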
Additional Statistics
Beyond forecasting performance, track records show:
Authoring Stats
- Questions Authored: Number of questions you’ve created that were approved
- Forecasts on Authored Questions: How many predictions your questions have received
- Notebooks Authored: Articles and analyses you’ve published
- Comments Authored: Total comments you’ve written
Improving Your Track Record
Here’s how to improve your forecasting performance:
1. Improve Calibration
If overconfident:
- Express more uncertainty in your forecasts
- Use wider probability distributions
- Avoid extreme probabilities (0%, 100%)
- Consider alternative scenarios
If underconfident:
- Trust your analysis more
- Be more decisive when you have good information
- Don’t automatically regress to 50%
2. Update Forecasts
- Set calendar reminders to review questions
- Update when significant new information emerges
- Don’t be afraid to change your mind
- More updates generally improve scores (especially with recency weighting)
3. Focus on Your Strengths
- Identify topics where you consistently score well
- Develop domain expertise in specific areas
- Don’t spread yourself too thin
- Quality over quantity
4. Learn from Others
- Read comments from top forecasters
- Check the community prediction and ask why it differs from yours
- Study track records of skilled forecasters
- Ask for feedback on your reasoning
5. Use Systematic Methods
- Develop a consistent forecasting process
- Use reference classes and base rates
- Break down complex questions
- Document your reasoning
6. Avoid Common Pitfalls
- Don’t anchor on initial impressions
- Beware of groupthink
- Watch for motivated reasoning
- Don’t overweight vivid scenarios
- Remember regression to the mean
Track Record Privacy
You can control the visibility of your track record:
1. Go to Account Settings
2. Navigate to Privacy Settings
3. Choose who can see your:
   - Full track record
   - Individual predictions
   - Statistics
Tournament and Project Leaderboards
Beyond personal track records, you can appear on leaderboards:
Types of Leaderboards
- Tournament Leaderboards: Specific to tournament projects
- Global Leaderboards: Site-wide performance over time periods
- Project Leaderboards: Performance within specific projects
Leaderboard Metrics
- Score: Total score across all questions in the leaderboard’s scope
- Rank: Your position relative to other participants
- Coverage: How much of the scoring period your forecasts covered
- Contribution Count: How many contributions (e.g., scored questions) counted toward your total
- Medal: Any medal awarded for your placement
- Prize: Any prize money awarded, where applicable
Medal Exclusions
Some users are excluded from medals and prizes:
- Staff: Metaculus employees
- Project owners: Organizers of the tournament
- Disqualified: Users who violated rules
- Other: Custom exclusions
Excluded users may still appear on leaderboards with show_anyway=True, but they don’t receive ranks or prizes.
Historical Context
Your track record evolves over time:
- Early predictions: May have higher variance and less reliability
- Learning curve: Most forecasters improve significantly in their first 6-12 months
- Plateau: Performance often stabilizes after gaining experience
- Specialization: Many successful forecasters focus on specific domains
Comparing Track Records
When comparing forecasters:
Look at sample size
- More scored questions = a more reliable estimate of skill
- Compare forecasters with similar numbers of predictions
Consider recency
- Recent performance may be more indicative than all-time performance
- People improve (or decline) over time
Check specialization
- Some forecasters excel in specific domains
- Overall scores may not reflect domain expertise
Account for difficulty
- Peer scores already account for question difficulty
- But topic selection still matters
API Access
You can retrieve track record data programmatically:
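For example, with Python and requests. The /api2/users/ path follows the public Metaculus API convention, but the exact route and the response field names here are assumptions; see the Scoring API page below for the documented endpoints.

```python
import requests

BASE_URL = "https://www.metaculus.com"

def fetch_user_profile(user_id: int) -> dict:
    """Fetch a user's public profile, which includes track-record
    statistics. Endpoint path and response fields are assumptions."""
    resp = requests.get(f"{BASE_URL}/api2/users/{user_id}/", timeout=30)
    resp.raise_for_status()
    return resp.json()

profile = fetch_user_profile(12345)     # 12345 is a placeholder user ID
print(profile.get("nr_forecasts"))      # assumed field name for total predictions
```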
Next Steps
- Making Predictions: Learn how to submit better forecasts
- Question Types: Understand what you’re predicting
- Aggregation Methods: Learn how the CP is calculated
- Scoring API: Access score data programmatically
