Accessing Your Track Record
You can view track records for yourself and other users.
Key Performance Metrics
Your track record includes several important statistics:
Average Peer Score
What it is: Your mean peer score across all scored questions
How to interpret:
- Positive scores: You’re beating the community prediction
- Negative scores: The community prediction is beating you
- Score near 0: You’re performing similarly to the CP
- Higher is better: More positive scores indicate better performance
Total Predictions
What it is: The total number of forecasts you’ve submitted
Why it matters:
- More predictions = more data points for evaluating your skill
- Quantity alone doesn’t indicate quality
- Shows your engagement level
Each update to a forecast counts as a new prediction. Active forecasters who update regularly will have higher counts.
Questions Predicted
What it is: The number of unique questions you’ve forecasted on
Why it matters:
- Shows breadth of forecasting activity
- Higher diversity generally improves calibration
- Indicates your coverage across topics
Questions Scored
What it is: The number of resolved questions where you received a score
Why it matters:
- Only scored questions contribute to your track record statistics
- This is the most important number for evaluating performance
- More scored questions = more reliable performance estimates
Track Record Visualizations
The track record page includes several charts to help you understand your performance:
Score Scatter Plot
What it shows: Individual scores for each question plotted over time
How to read it:
- X-axis: Time when the question was resolved
- Y-axis: Your score on that question
- Points above 0: Questions where you beat the CP
- Points below 0: Questions where the CP beat you
- Patterns: Look for trends over time
What to look for:
- Improving trend: Points moving upward over time indicates you’re getting better
- Consistent performance: Points clustered around a specific value
- Outliers: Unusually high or low scores worth investigating
- Volatility: How spread out your scores are
Calibration Curve
What it shows: How well your predicted probabilities match actual outcomes
How to read it:
- X-axis: Your predicted probability
- Y-axis: Actual frequency of occurrence
- Diagonal line: Perfect calibration
- Your curve: Your actual calibration
Common patterns:
- Well-calibrated: Your curve closely follows the diagonal line. When you say 70%, it happens about 70% of the time; when you say 30%, it happens about 30% of the time. This is the goal!
- Overconfident: Your curve is flatter than the diagonal. Events you give high probabilities happen less often than you predict, and events you give low probabilities happen more often.
- Underconfident: Your curve is steeper than the diagonal. Outcomes track your forecasts more decisively than your stated probabilities suggest, so you could afford more extreme predictions.
Calibration curves are most reliable when you have many resolved binary predictions. With fewer predictions, random variation can make the curve appear miscalibrated.
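To see how such a curve is built, here is a minimal Python sketch of the standard construction: bin predictions by stated probability, then compare each bin’s mean predicted probability with the observed frequency. The binning scheme here is an assumption; Metaculus’s exact construction may differ.

```python
from collections import defaultdict

def calibration_curve(predictions, n_bins: int = 10):
    """Build (mean predicted probability, observed frequency) points
    from (probability, outcome) pairs. Points near the diagonal
    indicate good calibration."""
    bins = defaultdict(list)
    for p, outcome in predictions:
        idx = min(int(p * n_bins), n_bins - 1)  # which probability bin
        bins[idx].append((p, outcome))
    curve = []
    for idx in sorted(bins):
        pairs = bins[idx]
        mean_p = sum(p for p, _ in pairs) / len(pairs)
        freq = sum(1 for _, o in pairs if o) / len(pairs)
        curve.append((mean_p, freq))
    return curve

data = [(0.7, True), (0.7, True), (0.7, False), (0.3, False), (0.3, True)]
print(calibration_curve(data))  # approximately [(0.3, 0.5), (0.7, 0.67)]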
Score Histogram
What it shows: The distribution of your scores across all questions
How to read it:
- X-axis: Score bins
- Y-axis: Number of questions in each bin
- Shape: Shows the pattern of your performance
- Skewed toward high scores: More high scores than low (good!)
- Skewed toward low scores: More low scores than high (room for improvement)
- Normal distribution: Mix of good and bad predictions
- Bimodal: Two distinct types of performance (might indicate domain expertise)
Score Types
Metaculus uses several scoring methods. The main ones you’ll see:
Peer Score
How it works: Compares your forecast to the community prediction (CP) at each point in time
Calculation (a code sketch follows this list):
- Both you and the CP are scored against the actual outcome using a proper scoring rule (usually Brier or log score)
- Your peer score = Your score - CP score
- Positive means you beat the CP; negative means the CP beat you
Why it’s used:
- Measures your added value over the crowd
- Accounts for question difficulty
- Fair comparison across different questions
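To make the calculation concrete, here is a minimal Python sketch of the peer-score idea using the log score as the proper scoring rule. The clamping bounds and function names are illustrative assumptions, not the exact Metaculus implementation.

```python
import math

def log_score(p: float, outcome: bool) -> float:
    """Log score of a binary forecast: ln of the probability
    assigned to whatever actually happened."""
    p = min(max(p, 0.001), 0.999)  # clamp to avoid ln(0); bounds are an assumption
    return math.log(p if outcome else 1.0 - p)

def peer_score(your_p: float, cp_p: float, outcome: bool) -> float:
    """Peer score as described above: your score minus the CP's score.
    Positive means you beat the community prediction."""
    return log_score(your_p, outcome) - log_score(cp_p, outcome)

# Example: you said 80%, the CP said 60%, and the event happened.
print(peer_score(0.80, 0.60, True))   # positive: you beat the CP
print(peer_score(0.40, 0.60, True))   # negative: the CP beat you
```

Your Average Peer Score is then simply the mean of these values across all your scored questions.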
Baseline Score
How it works: Compares your forecast to a baseline forecast, often 50% for binary, or a naïve prediction (a code sketch follows this list)
Why it’s used:
- Shows absolute performance, not relative
- Useful for understanding raw accuracy
- Less common than peer score
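A minimal sketch of the same log-score comparison against a fixed 50% baseline for a binary question; any scaling Metaculus applies on top of the raw score difference is omitted here.

```python
import math

def baseline_score(your_p: float, outcome: bool, baseline_p: float = 0.5) -> float:
    """Your log score minus a fixed baseline's log score.
    The 50% default and the lack of scaling are assumptions."""
    def log_score(p: float) -> float:
        return math.log(p if outcome else 1.0 - p)
    return log_score(your_p) - log_score(baseline_p)

print(baseline_score(0.90, True))   # positive: better than a 50/50 guess
print(baseline_score(0.90, False))  # negative: worse than a 50/50 guess
```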
Spot Scores
How it works: Evaluates your forecast at a specific point in time (the spot_scoring_time or cp_reveal_time)
Types:
- Spot Peer: Your score vs. CP at the spot time
- Spot Baseline: Your score vs. baseline at the spot time
Why it’s used:
- Reduces incentive to update constantly
- Evaluates skill at a specific meaningful moment
- Common in tournaments and challenges
If spot_scoring_time isn’t set, it defaults to cp_reveal_time, then actual_close_time, then scheduled_close_time.
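The fallback amounts to taking the first timestamp that is set. A sketch, assuming the question’s fields arrive as a simple dict keyed by the names above:

```python
from typing import Optional

def resolve_spot_time(question: dict) -> Optional[str]:
    """Return the timestamp used for spot scoring, following the
    fallback order described above. The dict shape is an assumption."""
    for field in ("spot_scoring_time", "cp_reveal_time",
                  "actual_close_time", "scheduled_close_time"):
        if question.get(field):
            return question[field]
    return None

q = {"spot_scoring_time": None, "cp_reveal_time": "2024-06-01T00:00:00Z"}
print(resolve_spot_time(q))  # falls back to cp_reveal_time
```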
Coverage
Along with scores, you’ll see coverage metrics.
What it is: The fraction of the scoring period during which you had an active forecast (a computation sketch follows this list)
Values:
- 1.0 (100%): You had a forecast active for the entire period
- 0.5 (50%): You forecasted for half the period
- 0.0 (0%): No forecast during the scoring period
Why it matters:
- Some scoring methods weight by coverage
- Higher coverage = more credit for your forecasts
- Encourages early and sustained participation
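As a sketch, coverage can be computed by clipping your active-forecast intervals to the scoring period. The interval representation is an assumption, and overlapping intervals would need to be merged first.

```python
def coverage(forecast_intervals, period_start: float, period_end: float) -> float:
    """Fraction of the scoring period covered by active forecasts.
    Intervals are (start, end) timestamps, assumed non-overlapping."""
    covered = 0.0
    for start, end in forecast_intervals:
        lo, hi = max(start, period_start), min(end, period_end)
        covered += max(0.0, hi - lo)  # clip to the scoring period
    return covered / (period_end - period_start)

# Forecast active only for the second half of a 100-unit scoring period:
print(coverage([(50.0, 100.0)], 0.0, 100.0))  # 0.5
```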
Additional Statistics
Beyond forecasting performance, track records show:
Authoring Stats
- Questions Authored: Number of questions you’ve created that were approved
- Forecasts on Authored Questions: How many predictions your questions have received
- Notebooks Authored: Articles and analyses you’ve published
- Comments Authored: Total comments you’ve written
Improving Your Track Record
Here’s how to improve your forecasting performance:
1. Improve Calibration
If overconfident:
- Express more uncertainty in your forecasts
- Use wider probability distributions
- Avoid extreme probabilities (0%, 100%)
- Consider alternative scenarios
If underconfident:
- Trust your analysis more
- Be more decisive when you have good information
- Don’t automatically regress to 50%
2. Update Forecasts
- Set calendar reminders to review questions
- Update when significant new information emerges
- Don’t be afraid to change your mind
- More updates generally improve scores (especially with recency weighting)
3. Focus on Your Strengths
- Identify topics where you consistently score well
- Develop domain expertise in specific areas
- Don’t spread yourself too thin
- Quality over quantity
4. Learn from Others
- Read comments from top forecasters
- Check the community prediction and ask why it differs from yours
- Study track records of skilled forecasters
- Ask for feedback on your reasoning
5. Use Systematic Methods
- Develop a consistent forecasting process
- Use reference classes and base rates
- Break down complex questions
- Document your reasoning
6. Avoid Common Pitfalls
- Don’t anchor on initial impressions
- Beware of groupthink
- Watch for motivated reasoning
- Don’t overweight vivid scenarios
- Remember regression to the mean
Track Record Privacy
You can control the visibility of your track record:
1. Go to Account Settings
2. Navigate to Privacy Settings
3. Choose who can see your:
   - Full track record
   - Individual predictions
   - Statistics
Tournament and Project Leaderboards
Beyond personal track records, you can appear on leaderboards:
Types of Leaderboards
- Tournament Leaderboards: Specific to tournament projects
- Global Leaderboards: Site-wide performance over time periods
- Project Leaderboards: Performance within specific projects
Leaderboard Metrics
- Score: Total score across all questions in the leaderboard’s scope
- Rank: Your position relative to other participants
- Coverage: How much of the scoring period your forecasts covered
- Contribution Count: How many contributions (e.g., scored questions) counted toward your total
- Medal: Any medal awarded for your placement
- Prize: Any prize money awarded, where applicable
Medal Exclusions
Some users are excluded from medals and prizes:
- Staff: Metaculus employees
- Project owners: Organizers of the tournament
- Disqualified: Users who violated rules
- Other: Custom exclusions
Excluded users may still appear on leaderboards with show_anyway=True, but they don’t receive ranks or prizes.
Historical Context
Your track record evolves over time:
- Early predictions: May have higher variance and less reliability
- Learning curve: Most forecasters improve significantly in their first 6-12 months
- Plateau: Performance often stabilizes after gaining experience
- Specialization: Many successful forecasters focus on specific domains
Comparing Track Records
When comparing forecasters:
Look at sample size
- More scored questions = a more reliable estimate of skill
- Compare forecasters with similar numbers of predictions
Consider recency
- Recent performance may be more indicative than all-time performance
- People improve (or decline) over time
Check specialization
- Some forecasters excel in specific domains
- Overall scores may not reflect domain expertise
Account for difficulty
- Peer scores already account for question difficulty
- But topic selection still matters
API Access
You can retrieve track record data programmatically:
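For example, with Python and requests. The /api2/users/ path follows the public Metaculus API convention, but the exact route and the response field names here are assumptions; see the Scoring API page below for the documented endpoints.

```python
import requests

BASE_URL = "https://www.metaculus.com"

def fetch_user_profile(user_id: int) -> dict:
    """Fetch a user's public profile, which includes track-record
    statistics. Endpoint path and response fields are assumptions."""
    resp = requests.get(f"{BASE_URL}/api2/users/{user_id}/", timeout=30)
    resp.raise_for_status()
    return resp.json()

profile = fetch_user_profile(12345)     # 12345 is a placeholder user ID
print(profile.get("nr_forecasts"))      # assumed field name for total predictions
```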
Next Steps
- Making Predictions: Learn how to submit better forecasts
- Question Types: Understand what you’re predicting
- Aggregation Methods: Learn how the CP is calculated
- Scoring API: Access score data programmatically
