Your track record shows how accurate your predictions have been over time. Understanding these metrics helps you improve your forecasting skills and demonstrates your expertise to the community.

Accessing Your Track Record

You can view track records for yourself and other users:
  1. Navigate to Profile: Click on any username or go to your own profile at /accounts/profile/[id]
  2. Select Track Record Tab: Click the “Track Record” tab to view forecasting performance metrics
  3. Explore the Data: Review charts, statistics, and historical performance

Key Performance Metrics

Your track record includes several important statistics:

Average Peer Score

What it is: Your mean peer score across all scored questions.
How to interpret:
  • Positive scores: You’re beating the community prediction
  • Negative scores: The community prediction is beating you
  • Score near 0: You’re performing similarly to the CP
  • Higher is better: More positive scores indicate better performance
An average peer score above 0 means you’re consistently adding value beyond the community aggregate - a sign of strong forecasting skill.
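For concreteness, here’s a tiny sketch (with made-up numbers) of how that average is formed:
```python
# Hypothetical peer scores from five resolved questions
peer_scores = [12.4, -5.1, 8.0, 3.3, -1.6]

# Positive average: you beat the CP on balance
average_peer_score = sum(peer_scores) / len(peer_scores)
print(f"{average_peer_score:+.2f}")  # +3.40
```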

Total Predictions

What it is: The total number of forecasts you’ve submitted.
Why it matters:
  • More predictions = more data points for evaluating your skill
  • Quantity alone doesn’t indicate quality
  • Shows your engagement level
Each update to a forecast counts as a new prediction. Active forecasters who update regularly will have higher counts.

Questions Predicted

What it is: The number of unique questions you’ve forecasted on.
Why it matters:
  • Shows breadth of forecasting activity
  • Higher diversity generally improves calibration
  • Indicates your coverage across topics

Questions Scored

What it is: The number of resolved questions where you received a score.
Why it matters:
  • Only scored questions contribute to your track record statistics
  • This is the most important number for evaluating performance
  • More scored questions = more reliable performance estimates
A forecast only receives a score if:
  1. The question resolves (not annulled or ambiguous)
  2. Your forecast was active during the scoring period
  3. You forecasted within the valid time window (between open_time and close_time)
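In pseudocode, that eligibility check looks roughly like this (the attribute names are illustrative, not the actual Metaculus schema):
```python
def is_scored(question, forecast):
    """Sketch of scoring eligibility; attribute names are hypothetical."""
    # 1. The question must resolve (not be annulled or ambiguous)
    if question.resolution in (None, "annulled", "ambiguous"):
        return False
    # 2. The forecast must have been active during the scoring period
    active = forecast.end_time is None or forecast.end_time > question.scoring_start
    # 3. The forecast must fall within the valid time window
    in_window = question.open_time <= forecast.start_time <= question.close_time
    return active and in_window
```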

Track Record Visualizations

The track record page includes several charts to help you understand your performance:

Score Scatter Plot

What it shows: Individual scores for each question plotted over time.
How to read it:
  • X-axis: Time when the question was resolved
  • Y-axis: Your score on that question
  • Points above 0: Questions where you beat the CP
  • Points below 0: Questions where the CP beat you
  • Patterns: Look for trends over time
  • Improving trend: Points moving upward over time indicates you’re getting better
  • Consistent performance: Points clustered around a specific value
  • Outliers: Unusually high or low scores worth investigating
  • Volatility: How spread out your scores are

Calibration Curve

What it shows: How well your predicted probabilities match actual outcomes.
How to read it:
  • X-axis: Your predicted probability
  • Y-axis: Actual frequency of occurrence
  • Diagonal line: Perfect calibration
  • Your curve: Your actual calibration
Good calibration means your curve closely follows the diagonal line:
  • When you say 70%, it happens about 70% of the time
  • When you say 30%, it happens about 30% of the time
  • This is the goal!
Calibration curves are most reliable when you have many resolved binary predictions. With fewer predictions, random variation can make the curve appear miscalibrated.
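If you want to build a curve like this yourself, here’s a minimal sketch for binary forecasts (probabilities paired with 0/1 outcomes); the binning is illustrative, not Metaculus’s exact method:
```python
import numpy as np

def calibration_curve(probs, outcomes, n_bins=10):
    """Bin predictions and compare mean predicted probability
    to the observed frequency in each bin."""
    probs, outcomes = np.asarray(probs), np.asarray(outcomes)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.clip(np.digitize(probs, edges) - 1, 0, n_bins - 1)
    points = []
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            points.append((float(probs[mask].mean()), float(outcomes[mask].mean())))
    return points  # perfect calibration puts every point on y = x

print(calibration_curve([0.2, 0.3, 0.7, 0.8], [0, 1, 1, 1]))
```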

Score Histogram

What it shows: The distribution of your scores across all questions.
How to read it:
  • X-axis: Score bins
  • Y-axis: Number of questions in each bin
  • Shape: Shows the pattern of your performance
Interpretations:
  • Right-skewed: More high scores than low (good!)
  • Left-skewed: More low scores than high (room for improvement)
  • Normal distribution: Mix of good and bad predictions
  • Bimodal: Two distinct types of performance (might indicate domain expertise)
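With your scores exported (for example via the API shown later on this page), a quick way to reproduce this chart with made-up numbers:
```python
import matplotlib.pyplot as plt

# Hypothetical peer scores from resolved questions
scores = [14.2, -3.5, 7.1, 22.0, -11.4, 5.8, 9.9, -2.2, 16.5, 1.0]

plt.hist(scores, bins=10, edgecolor="black")
plt.axvline(0, linestyle="--")  # 0 separates beating vs. trailing the CP
plt.xlabel("Peer score")
plt.ylabel("Number of questions")
plt.show()
```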

Score Types

Metaculus uses several scoring methods. The main ones you’ll see:

Peer Score

How it works: Compares your forecast to the community prediction (CP) at each point in time.
Calculation:
  • Both you and the CP are scored against the actual outcome using a proper scoring rule (usually Brier or log score)
  • Your peer score = Your score - CP score
  • Positive means you beat the CP; negative means the CP beat you
Why it’s used:
  • Measures your added value over the crowd
  • Accounts for question difficulty
  • Fair comparison across different questions
Peer score is the default and most common score type on Metaculus. Focus on this for evaluating your performance.
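As a simplified sketch of the calculation for a binary question, using a log score (the actual formula includes further details, such as averaging over the question’s open period):
```python
import math

def log_score(p, outcome):
    """Log score for a binary forecast: ln of the probability
    assigned to what actually happened (higher is better)."""
    return math.log(p if outcome == 1 else 1 - p)

def peer_score(p_user, p_cp, outcome):
    """Simplified peer score: your log score minus the CP's, x100.
    Positive means you beat the community prediction."""
    return 100 * (log_score(p_user, outcome) - log_score(p_cp, outcome))

# You said 80%, the CP said 60%, and the event happened
print(round(peer_score(0.80, 0.60, 1), 1))  # 28.8
```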

Baseline Score

How it works: Compares your forecast to a baseline forecast (often 50% for binary, or a naïve prediction).
Why it’s used:
  • Shows absolute performance, not relative
  • Useful for understanding raw accuracy
  • Less common than peer score
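A sketch of what a baseline comparison can look like for a binary question, scaled so that forecasting 50% scores 0 (illustrative; consult the scoring documentation for the exact formula):
```python
import math

def baseline_score(p, outcome):
    """Illustrative baseline score vs. a uniform 50% baseline:
    0 for forecasting 50%, 100 for a certain correct forecast."""
    p_outcome = p if outcome == 1 else 1 - p
    return 100 * math.log2(p_outcome / 0.5)

print(round(baseline_score(0.8, 1), 1))  # 67.8: beats the 50% baseline
print(round(baseline_score(0.3, 1), 1))  # -73.7: worse than the baseline
```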

Spot Scores

How it works: Evaluates your forecast at a specific point in time (the spot_scoring_time or cp_reveal_time).
Types:
  • Spot Peer: Your score vs. CP at the spot time
  • Spot Baseline: Your score vs. baseline at the spot time
Why it’s used:
  • Reduces incentive to update constantly
  • Evaluates skill at a specific meaningful moment
  • Common in tournaments and challenges
If spot_scoring_time isn’t set, it defaults to cp_reveal_time, then actual_close_time, then scheduled_close_time.
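That fallback chain, expressed as a small sketch (assuming each unset field is None):
```python
def resolve_spot_time(question):
    """Return the effective spot scoring time using the fallback
    order described above; unset fields are assumed to be None."""
    return (
        question.spot_scoring_time
        or question.cp_reveal_time
        or question.actual_close_time
        or question.scheduled_close_time
    )
```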

Coverage

Along with scores, you’ll see coverage metrics.
What it is: The fraction of the scoring period during which you had an active forecast.
Values:
  • 1.0 (100%): You had a forecast active for the entire period
  • 0.5 (50%): You forecasted for half the period
  • 0.0 (0%): No forecast during the scoring period
Why it matters:
  • Some scoring methods weight by coverage
  • Higher coverage = more credit for your forecasts
  • Encourages early and sustained participation
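Coverage can be computed as the clipped overlap between your forecast intervals and the scoring window; a minimal sketch, assuming non-overlapping intervals:
```python
def coverage(forecast_intervals, scoring_start, scoring_end):
    """Fraction of the scoring period covered by active forecasts.
    Intervals are (start, end) pairs; assumed not to overlap."""
    covered = 0.0
    for start, end in forecast_intervals:
        # Clip each forecast interval to the scoring window
        overlap = min(end, scoring_end) - max(start, scoring_start)
        covered += max(overlap, 0.0)
    return covered / (scoring_end - scoring_start)

# Forecast active only for the second half of a 10-day period
print(coverage([(5.0, 10.0)], 0.0, 10.0))  # 0.5
```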

Additional Statistics

Beyond forecasting performance, track records show:

Authoring Stats

Questions Authored

Number of questions you’ve created that were approved

Forecasts on Authored Questions

How many predictions your questions have received

Notebooks Authored

Articles and analyses you’ve published

Comments Authored

Total comments you’ve written

Improving Your Track Record

Here’s how to improve your forecasting performance:
If overconfident:
  • Express more uncertainty in your forecasts
  • Use wider probability distributions
  • Avoid extreme probabilities (0%, 100%)
  • Consider alternative scenarios
If underconfident:
  • Trust your analysis more
  • Be more decisive when you have good information
  • Don’t automatically regress to 50%
Update regularly:
  • Set calendar reminders to review questions
  • Update when significant new information emerges
  • Don’t be afraid to change your mind
  • More updates generally improve scores (especially with recency weighting)
Specialize:
  • Identify topics where you consistently score well
  • Develop domain expertise in specific areas
  • Don’t spread yourself too thin
  • Quality over quantity
Learn from others:
  • Read comments from top forecasters
  • Check the community prediction and ask why it differs from yours
  • Study track records of skilled forecasters
  • Ask for feedback on your reasoning
Build a process:
  • Develop a consistent forecasting process
  • Use reference classes and base rates
  • Break down complex questions
  • Document your reasoning
Avoid common biases:
  • Don’t anchor on initial impressions
  • Beware of groupthink
  • Watch for motivated reasoning
  • Don’t overweight vivid scenarios
  • Remember regression to the mean

Track Record Privacy

You can control the visibility of your track record:
  1. Go to Account Settings
  2. Navigate to Privacy Settings
  3. Choose who can see your:
    • Full track record
    • Individual predictions
    • Statistics
Public track records help build credibility and trust in the forecasting community. Consider keeping yours public unless you have specific privacy concerns.

Tournament and Project Leaderboards

Beyond personal track records, you can appear on leaderboards:

Types of Leaderboards

  • Tournament Leaderboards: Specific to tournament projects
  • Global Leaderboards: Site-wide performance over time periods
  • Project Leaderboards: Performance within specific projects

Leaderboard Metrics

Total Score: Your total score across all questions in the leaderboard’s scope

Medal Exclusions

Some users are excluded from medals and prizes:
  • Staff: Metaculus employees
  • Project owners: Organizers of the tournament
  • Disqualified: Users who violated rules
  • Other: Custom exclusions
Excluded users may still appear on leaderboards with show_anyway=True, but they don’t receive ranks or prizes.
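A sketch of how that display rule might look (field names are hypothetical):
```python
def visible_entries(entries):
    """Yield (rank, entry) pairs in score order. Excluded users are
    shown only with show_anyway=True and never receive a rank."""
    rank = 0
    for entry in sorted(entries, key=lambda e: e.score, reverse=True):
        if not entry.excluded:
            rank += 1
            yield rank, entry
        elif entry.show_anyway:
            yield None, entry  # displayed, but unranked and prize-ineligible
```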

Historical Context

Your track record evolves over time:
  • Early predictions: May have higher variance and less reliability
  • Learning curve: Most forecasters improve significantly in their first 6-12 months
  • Plateau: Performance often stabilizes after gaining experience
  • Specialization: Many successful forecasters focus on specific domains
Don’t be discouraged by early poor performance. Forecasting is a skill that improves with practice, feedback, and learning.

Comparing Track Records

When comparing forecasters:
  • More scored questions = a more reliable estimate of skill; compare forecasters with similar numbers of predictions.
  • Recent performance may be more indicative than all-time performance; people improve (or decline) over time.
  • Some forecasters excel in specific domains; overall scores may not reflect domain expertise.
  • Peer scores already account for question difficulty, but topic selection still matters.

API Access

You can retrieve track record data programmatically:
```
# Get user profile with track record
profile = ServerProfileApi.getProfileById(user_id)

# Access statistics
average_score = profile.average_score
forecasts_count = profile.forecasts_count
questions_predicted = profile.questions_predicted_count
score_count = profile.score_count

# Chart data
scatter_plot = profile.score_scatter_plot
calibration = profile.calibration_curve
histogram = profile.score_histogram
```
See the API reference for complete documentation.

Next Steps

Making Predictions

Learn how to submit better forecasts

Question Types

Understand what you’re predicting

Aggregation Methods

Learn how the CP is calculated

Scoring API

Access score data programmatically
