Overview
The Scores endpoints provide detailed information about how forecasts are scored on Metaculus. Scores measure forecast accuracy using proper scoring rules and are the basis for leaderboards and performance tracking.
Scores are typically accessed through leaderboard endpoints and post data downloads. These endpoints provide raw scoring data for advanced analysis.
Score Types
Metaculus uses several scoring methods to evaluate forecasts:
- Peer Score: Measures performance relative to the community aggregate prediction. Rewards forecasters who beat the crowd. Calculated as: your log score - community log score.
- Baseline Score: Measures performance relative to a simple baseline prior (e.g., a uniform distribution for continuous questions, 50% for binary). More generous than the peer score, and useful for beginners.
- Spot Peer Score: The peer score evaluated at a specific time (CP reveal time) rather than continuously weighted. Used for tournament scoring to prevent gaming through frequent updates.
- Spot Baseline Score: The baseline score evaluated at CP reveal time.
- Legacy Relative Score: A historical scoring method from old Metaculus. Deprecated.
Scoring Mechanics
How Scores Are Calculated
- Log Score: Your forecast is scored with a logarithmic scoring rule. For a binary question with outcome O and prediction p:
  - Score = log₂(p) if O = Yes
  - Score = log₂(1 - p) if O = No
- Continuous Questions: The forecast CDF is converted to a PMF, and the log score is then calculated from the probability mass assigned to the actual outcome.
- Coverage: Your score is weighted by the fraction of time you had an active forecast on the question. Higher coverage means a more reliable score.
- Aggregation: Scores across questions are averaged, with question weights applied.
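The mechanics above can be sketched in a few lines. This is an illustrative simplification, not the exact site implementation (in particular, it ignores coverage weighting and question weights); the helper names are invented for this example:

```python
import math

def binary_log_score(p: float, resolved_yes: bool) -> float:
    """Base-2 log score for a binary forecast, where p = P(Yes)."""
    return math.log2(p) if resolved_yes else math.log2(1 - p)

def peer_score(my_log_score: float, community_log_score: float) -> float:
    """Peer score as described above: your log score minus the community's."""
    return my_log_score - community_log_score

# A 50% forecast always scores -1 bit; sharper correct forecasts score higher.
print(binary_log_score(0.5, True))              # -1.0
print(binary_log_score(0.9, True))              # ~ -0.15
print(peer_score(binary_log_score(0.9, True),
                 binary_log_score(0.7, True)))  # positive: beat the crowd
```

Note that a positive peer score requires beating the community, not merely being right: a correct 70% forecast against a correct 90% community aggregate still yields a negative peer score.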
Download Score Data
Score data is primarily accessed through the data download endpoints:
Post-Level Scores
curl -X GET "https://www.metaculus.com/api/posts/3530/download-data/?include_scores=true" \
-H "Authorization: Token YOUR_TOKEN" \
--output question_data.zip
See the Posts endpoint documentation for full details on the download-data endpoint.
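The same download can be done from Python rather than curl. A sketch, wrapped in a hypothetical helper (`download_post_scores` is not part of any client library; `YOUR_TOKEN` is a placeholder as in the curl example):

```python
import requests

def download_post_scores(post_id: int, token: str, out_path: str) -> None:
    """Download a post's data, scores included, as a zip archive."""
    response = requests.get(
        f"https://www.metaculus.com/api/posts/{post_id}/download-data/",
        headers={"Authorization": f"Token {token}"},
        params={"include_scores": True},
        timeout=60,
    )
    response.raise_for_status()  # fail loudly on auth or rate-limit errors
    with open(out_path, "wb") as f:
        f.write(response.content)

# Usage (requires a valid API token):
# download_post_scores(3530, "YOUR_TOKEN", "question_data.zip")
```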
Project-Level Scores
curl -X GET "https://www.metaculus.com/api/projects/144/download-data/?include_scores=true" \
-H "Authorization: Token YOUR_TOKEN" \
--output project_data.zip
See the Projects endpoint documentation for details.
When you download score data, you receive a CSV with the following schema:
Score Data CSV Schema
- Question ID: The question ID this score is for
- User ID: The ID of the user who earned this score
- User Username: The username of the scorer
- Score Type: The type of score: peer, baseline, spot_peer, spot_baseline, relative_legacy, or manual
- Score: The score value. Higher is better; scores can be negative.
- Coverage: The coverage value (0-1), representing the fraction of time the user had an active forecast
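A hypothetical row parsed against this schema, to make the column names and types concrete (the values are invented for illustration):

```python
import csv
import io

# Invented sample CSV matching the schema above (values are illustrative only)
sample = (
    "Question ID,User ID,User Username,Score Type,Score,Coverage\n"
    "3530,12345,example_user,peer,4.21,0.87\n"
)

row = next(csv.DictReader(io.StringIO(sample)))
print(row["Score Type"])       # peer
print(float(row["Score"]))     # 4.21
print(float(row["Coverage"]))  # 0.87
```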
Accessing Scores in Aggregations
Scores for community aggregations are included in question data when using with_cp=true:
import requests

response = requests.get(
    "https://www.metaculus.com/api/posts/3530/",
    headers={"Authorization": "Token YOUR_TOKEN"},
    params={"with_cp": True},
)
post = response.json()
question = post["question"]
aggregations = question["aggregations"]

# Get recency-weighted aggregation scores
rw_scores = aggregations["recency_weighted"]["score_data"]
print("Recency Weighted Aggregation Scores:")
print(f"  Peer Score: {rw_scores.get('peer_score', 'N/A')}")
print(f"  Baseline Score: {rw_scores.get('baseline_score', 'N/A')}")
print(f"  Coverage: {rw_scores.get('coverage', 'N/A')}")
Example: Project Score Analysis

import requests
import pandas as pd
import zipfile
import io

# Download scores for a project
response = requests.get(
    "https://www.metaculus.com/api/projects/3876/download-data/",
    headers={"Authorization": "Token YOUR_TOKEN"},
    params={"include_scores": True},
)

with zipfile.ZipFile(io.BytesIO(response.content)) as zf:
    with zf.open('score_data.csv') as f:
        scores_df = pd.read_csv(f)
    with zf.open('question_data.csv') as f:
        questions_df = pd.read_csv(f)

# Filter to peer scores only
peer_scores = scores_df[scores_df['Score Type'] == 'peer']

# Calculate statistics per user
user_stats = peer_scores.groupby('User Username').agg({
    'Score': ['mean', 'sum', 'count'],
    'Coverage': 'mean'
}).round(2)
user_stats.columns = ['Avg Score', 'Total Score', 'Questions', 'Avg Coverage']
user_stats = user_stats.sort_values('Total Score', ascending=False)

print("Top 10 Forecasters by Total Peer Score:")
print(user_stats.head(10))

# Analyze by question type
question_types = questions_df.set_index('Question ID')['Question Type']
scores_with_type = peer_scores.copy()
scores_with_type['Question Type'] = scores_with_type['Question ID'].map(question_types)
type_stats = scores_with_type.groupby('Question Type')['Score'].agg(['mean', 'count'])

print("\nAverage Score by Question Type:")
print(type_stats)
Example: Coverage Analysis
import requests
import pandas as pd
import matplotlib.pyplot as plt
import zipfile
import io

response = requests.get(
    "https://www.metaculus.com/api/posts/3530/download-data/",
    headers={"Authorization": "Token YOUR_TOKEN"},
    params={"include_scores": True},
)

with zipfile.ZipFile(io.BytesIO(response.content)) as zf:
    with zf.open('score_data.csv') as f:
        scores_df = pd.read_csv(f)

# Filter to peer scores
peer_scores = scores_df[scores_df['Score Type'] == 'peer'].copy()

# Create coverage bins
peer_scores['Coverage Bin'] = pd.cut(
    peer_scores['Coverage'],
    bins=[0, 0.25, 0.5, 0.75, 1.0],
    labels=['0-25%', '25-50%', '50-75%', '75-100%']
)

# Analyze score by coverage
coverage_stats = peer_scores.groupby('Coverage Bin')['Score'].agg(['mean', 'std', 'count'])
print("Score Statistics by Coverage:")
print(coverage_stats)

# Plot
plt.figure(figsize=(10, 6))
plt.bar(range(len(coverage_stats)), coverage_stats['mean'],
        yerr=coverage_stats['std'], capsize=5)
plt.xlabel('Coverage Bin')
plt.ylabel('Average Peer Score')
plt.title('Forecast Accuracy vs Coverage')
plt.xticks(range(len(coverage_stats)), coverage_stats.index)
plt.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.savefig('coverage_analysis.png')
Example: Historical Score Tracking
import requests
import pandas as pd
import zipfile
import io

# Download question and forecast data
response = requests.get(
    "https://www.metaculus.com/api/posts/3530/download-data/",
    headers={"Authorization": "Token YOUR_TOKEN"},
    params={"include_scores": True},
)

with zipfile.ZipFile(io.BytesIO(response.content)) as zf:
    with zf.open('forecast_data.csv') as f:
        forecasts_df = pd.read_csv(f)
    with zf.open('score_data.csv') as f:
        scores_df = pd.read_csv(f)
    with zf.open('question_data.csv') as f:
        questions_df = pd.read_csv(f)

# Focus on a specific user
my_user_id = 12345
my_forecasts = forecasts_df[
    (forecasts_df['Forecaster ID'] == my_user_id) &
    (forecasts_df['Forecaster ID'].notna())  # Exclude aggregations
].copy()

my_scores = scores_df[
    (scores_df['User ID'] == my_user_id) &
    (scores_df['Score Type'] == 'peer')
].copy()

# Parse forecast timestamps and bucket them by month
my_forecasts['Start Time'] = pd.to_datetime(my_forecasts['Start Time'])
my_forecasts['Month'] = my_forecasts['Start Time'].dt.to_period('M')

# Calculate monthly activity
monthly_questions = my_forecasts.groupby('Month')['Question ID'].nunique()
monthly_forecasts = my_forecasts.groupby('Month').size()

print("Your Forecasting Activity:")
print(f"  Questions forecasted: {my_forecasts['Question ID'].nunique()}")
print(f"  Total forecasts: {len(my_forecasts)}")
print(f"  Average peer score: {my_scores['Score'].mean():.2f}")
print(f"  Score std dev: {my_scores['Score'].std():.2f}")

print("\nMonthly Activity:")
for month, count in monthly_questions.tail(6).items():
    forecasts = monthly_forecasts[month]
    print(f"  {month}: {count} questions, {forecasts} forecasts")
Important Notes
Score Data Access
Individual user scores are only available:
- To the user themselves
- To site administrators
- In aggregate form (leaderboards)
You cannot download other users’ detailed score data for privacy reasons.
Score Timing
Scores are calculated:
- When questions resolve
- When leaderboards are updated (typically daily)
- When explicitly recalculated by admins
There may be a delay between question resolution and score appearance.
Coverage Matters
High coverage (maintaining active forecasts for a large fraction of each question's open time, across many questions) makes scores more reliable and statistically meaningful. Users with low coverage may have high variance in their scores.