Overview

Forecasting is the core activity on Metaculus. Forecasters make predictions about future events by submitting probability distributions that represent their beliefs about different outcomes.

Making Predictions

Forecasts take a different form for each question type.

Binary

Submit a single probability between 0% and 100% representing your confidence that the event will occur.
Data Structure (from questions/models.py:639-642):
probability_yes: float = models.FloatField(null=True, blank=True)
Example: 75% probability that SpaceX launches Starship to orbit in 2025

Multiple choice

Distribute probability across available options. Probabilities must sum to 100%.
Data Structure (from questions/models.py:643-648):
probability_yes_per_category: list[float | None] = ArrayField(
    models.FloatField(null=True),
    null=True,
    blank=True,
)
Options not yet available at forecast time are stored as None.
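
A minimal sketch of how such a list could be checked before submission. The helper name `validate_multiple_choice` and its tolerance are hypothetical and not part of the Metaculus codebase; probabilities are written as fractions of 1 rather than percentages.

```python
import math
from typing import Optional


def validate_multiple_choice(
    probabilities: list[Optional[float]], tolerance: float = 1e-6
) -> bool:
    """Return True if the non-None entries are valid probabilities summing to 1.

    Hypothetical helper for illustration only. None marks an option that was
    not yet available when the forecast was made, so it is skipped.
    """
    available = [p for p in probabilities if p is not None]
    if not available:
        return False
    if any(p < 0.0 or p > 1.0 for p in available):
        return False
    return math.isclose(sum(available), 1.0, abs_tol=tolerance)


# Three available options plus one that did not exist yet.
is_valid = validate_multiple_choice([0.5, 0.3, 0.2, None])
```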

Continuous (numeric, date, discrete)

Provide a cumulative distribution function (CDF) over the range of possible values.
Data Structure (from questions/models.py:630-637):
continuous_cdf: list[float] = ArrayField(
    models.FloatField(),
    null=True,
    max_length=DEFAULT_INBOUND_OUTCOME_COUNT + 1,
    blank=True,
)
The CDF is evaluated at evenly spaced points across the question's range, normalized from 0.0 to 1.0 (default: 201 points).
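
As an illustration, the sketch below builds a 201-point CDF from a normal distribution over a question's range. It assumes a linearly scaled range with closed bounds, and the function name, parameters, and `CDF_POINTS` constant are hypothetical; the actual Metaculus pipeline may apply further scaling and validation.

```python
import math

CDF_POINTS = 201  # assumed to match the default number of points


def normal_cdf_points(
    mean: float, stdev: float, range_min: float, range_max: float
) -> list[float]:
    """Evaluate a normal CDF at evenly spaced points across the range."""
    cdf = []
    for i in range(CDF_POINTS):
        location = i / (CDF_POINTS - 1)  # normalized position, 0.0 to 1.0
        value = range_min + location * (range_max - range_min)
        z = (value - mean) / (stdev * math.sqrt(2))
        cdf.append(0.5 * (1.0 + math.erf(z)))
    return cdf


cdf = normal_cdf_points(mean=50.0, stdev=10.0, range_min=0.0, range_max=100.0)
```

The middle entry, `cdf[100]`, sits at the distribution's mean and therefore equals 0.5.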

Forecast Model

Each forecast is a snapshot of a user’s prediction at a specific time.

Core Fields

From questions/models.py:609-663:
class Forecast(models.Model):
    # Time range when this forecast is active
    start_time = models.DateTimeField(db_index=True)
    end_time = models.DateTimeField(null=True, db_index=True, blank=True)
    
    # Prediction data (one of these will be set based on question type)
    probability_yes: float           # Binary questions
    probability_yes_per_category: list[float | None]  # Multiple choice
    continuous_cdf: list[float]      # Numeric/Date/Discrete
    
    # Metadata
    author = models.ForeignKey(User, models.CASCADE)
    question = models.ForeignKey(Question, models.CASCADE)
    post = models.ForeignKey("posts.Post", models.CASCADE)
    source = models.CharField(max_length=30, choices=SourceChoices.choices)
    distribution_input = models.JSONField(null=True, blank=True)

Forecast Sources

Forecasts can originate from different sources (from questions/models.py:664-669):
  • api: Made via the API
  • ui: Made through the web interface
  • automatic: Automatically assigned (e.g., when forecasts are split)

Updating Forecasts

When you update your prediction, Metaculus creates a new forecast entry:
  1. The previous forecast’s end_time is set to the current timestamp
  2. A new forecast is created with start_time = current timestamp
  3. The new forecast has end_time = None (indicating it’s currently active)
This time-series approach allows Metaculus to track how predictions evolve and score forecasters based on their entire prediction history.
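
The three steps above can be sketched with plain dataclasses. `ForecastRecord` and `update_forecast` are simplified, hypothetical stand-ins for the Django model and its update logic, shown only to make the time-series bookkeeping concrete.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ForecastRecord:
    """Simplified stand-in for the Forecast model (illustration only)."""

    probability_yes: float
    start_time: datetime
    end_time: Optional[datetime] = None


def update_forecast(
    history: list[ForecastRecord], new_probability: float, now: datetime
) -> ForecastRecord:
    """Close the currently active record, if any, and append a new one."""
    if history and history[-1].end_time is None:
        history[-1].end_time = now  # step 1: end the previous forecast
    # Steps 2 and 3: the new forecast starts now and has no end time yet.
    new_record = ForecastRecord(new_probability, start_time=now)
    history.append(new_record)
    return new_record


history: list[ForecastRecord] = []
update_forecast(history, 0.60, datetime(2025, 1, 1, tzinfo=timezone.utc))
update_forecast(history, 0.75, datetime(2025, 2, 1, tzinfo=timezone.utc))
```

After the second call, the first record ends exactly when the second begins, and only the second record is still open.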

Active Forecasts

A forecast is considered “active” when it has started, has not ended, and its question is open and not yet closed (from questions/models.py:574-601):
def active(self):
    now = timezone.now()
    
    # Forecast timing conditions
    forecast_started = Q(start_time__lte=now)
    forecast_not_ended = Q(end_time__isnull=True) | Q(end_time__gt=now)
    
    # Question status conditions
    question_not_closed = Q(question__actual_close_time__isnull=True)
    question_still_accepting_forecasts = Q(question__scheduled_close_time__gt=now)
    question_opened = Q(question__open_time__lte=now)
    
    return self.filter(
        forecast_started
        & forecast_not_ended
        & question_not_closed
        & question_still_accepting_forecasts
        & question_opened
    )
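
The same conditions can be expressed as a plain Python predicate, which may be easier to test outside the ORM. The function `is_active` and its parameters are hypothetical; missing question times are treated as failing the check, mirroring how a SQL comparison against NULL does not match.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_active(
    now: datetime,
    start_time: datetime,
    end_time: Optional[datetime],
    open_time: Optional[datetime],
    scheduled_close_time: Optional[datetime],
    actual_close_time: Optional[datetime],
) -> bool:
    """Mirror of the queryset filter for a single forecast (illustration only)."""
    forecast_started = start_time <= now
    forecast_not_ended = end_time is None or end_time > now
    question_not_closed = actual_close_time is None
    still_accepting = scheduled_close_time is not None and scheduled_close_time > now
    question_opened = open_time is not None and open_time <= now
    return (
        forecast_started
        and forecast_not_ended
        and question_not_closed
        and still_accepting
        and question_opened
    )


now = datetime(2025, 6, 1, tzinfo=timezone.utc)
day = timedelta(days=1)
currently_active = is_active(
    now,
    start_time=now - day,
    end_time=None,
    open_time=now - 10 * day,
    scheduled_close_time=now + 10 * day,
    actual_close_time=None,
)
```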

Forecast Aggregation

Metaculus combines individual forecasts into a Community Prediction (CP) that represents the consensus view.

Aggregation Methods

From questions/types.py:18-22, Metaculus supports four aggregation methods. The recency-weighted method is described here:

Recency weighted

Recent forecasts are weighted more heavily than older forecasts. This is the default method for most questions.
  • Best for: Standard questions with extended forecasting periods
  • Algorithm: Uses time-decaying weights to give recent predictions more influence
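
The idea of time-decaying weights can be sketched as follows. This is not the formula Metaculus uses: the exponential half-life scheme, the function names, and the 7-day half-life are assumptions chosen only to show how newer forecasts can receive more influence.

```python
def recency_weights(
    ages_in_days: list[float], half_life_days: float = 7.0
) -> list[float]:
    """Exponentially decaying weights, normalized to sum to 1 (assumed scheme)."""
    raw = [0.5 ** (age / half_life_days) for age in ages_in_days]
    total = sum(raw)
    return [w / total for w in raw]


def weighted_mean(values: list[float], weights: list[float]) -> float:
    """Weighted average of forecast values."""
    return sum(v * w for v, w in zip(values, weights))


# Three binary forecasts made today, one week ago, and two weeks ago.
weights = recency_weights([0.0, 7.0, 14.0])
aggregate = weighted_mean([0.80, 0.60, 0.40], weights)
```

With a 7-day half-life, each additional week of age halves a forecast's raw weight, so the newest forecast here carries four times the weight of the oldest.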

Geometric Mean Aggregation

The core aggregation algorithm uses geometric mean (from scoring/score_math.py:28-53):
def get_geometric_means(
    forecasts: Sequence[Forecast | AggregateForecast],
) -> list[AggregationEntry]:
    geometric_means = []
    timesteps: set[float] = set()
    
    # Collect all forecast start and end times
    for forecast in forecasts:
        timesteps.add(forecast.start_time.timestamp())
        if forecast.end_time:
            timesteps.add(forecast.end_time.timestamp())
    
    # Calculate geometric mean at each timestep
    for timestep in sorted(timesteps):
        prediction_values = [
            f.get_pmf()
            for f in forecasts
            if f.start_time.timestamp() <= timestep
            and (f.end_time is None or f.end_time.timestamp() > timestep)
        ]
        
        if not prediction_values:
            continue
        
        geometric_mean = gmean(prediction_values, axis=0)
        predictors = len(prediction_values)
        
        geometric_means.append(
            AggregationEntry(geometric_mean, predictors, timestep)
        )
    
    return geometric_means
The geometric mean averages forecasts in log space, so it responds to the ratio between probabilities rather than their absolute difference. As a result, forecasts near 0% or 100% affect the aggregate differently than they would under an arithmetic mean.
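
To see how the two means differ on the same inputs, the sketch below compares them for three binary forecasts. It uses a small pure-Python `geometric_mean` helper (hypothetical, standing in for the `gmean` call above) and looks only at the "yes" probability, whereas the excerpt applies the mean across each forecast's full PMF.

```python
import math


def geometric_mean(values: list[float]) -> float:
    """Geometric mean computed as exp(mean(log(v))); values must be positive."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


forecasts = [0.90, 0.90, 0.10]
arithmetic = sum(forecasts) / len(forecasts)
geometric = geometric_mean(forecasts)
```

Here the arithmetic mean is roughly 0.63 and the geometric mean roughly 0.43. Note that geometric means taken per outcome do not generally sum to 1.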

Aggregate Forecast Model

Aggregated forecasts are stored separately from individual forecasts (from questions/models.py:760-833):
class AggregateForecast(models.Model):
    question = models.ForeignKey(Question, models.CASCADE)
    method = models.CharField(max_length=200, choices=AggregationMethod.choices)
    
    # Time range for this aggregate
    start_time = models.DateTimeField(db_index=True)
    end_time = models.DateTimeField(null=True, db_index=True)
    
    # Aggregated prediction values
    forecast_values: list[float | None] = ArrayField(
        models.FloatField(null=True),
        max_length=DEFAULT_INBOUND_OUTCOME_COUNT + 1
    )
    
    # Metadata
    forecaster_count: int | None = models.IntegerField(null=True)
    
    # Uncertainty quantification (for continuous distributions)
    interval_lower_bounds = ArrayField(models.FloatField(), null=True)
    centers = ArrayField(models.FloatField(), null=True)
    interval_upper_bounds = ArrayField(models.FloatField(), null=True)
    means = ArrayField(models.FloatField(), null=True)
    histogram = ArrayField(models.FloatField(), null=True, size=100)

Forecaster Count

The forecaster_count field tracks how many unique forecasters contributed to the aggregate at each timestep. This is important for:
  • Determining minimum participation thresholds
  • Calculating peer scores (which require at least 2 forecasters)
  • Displaying confidence in the community prediction

Forecast Constraints

Metaculus enforces several constraints on forecasts:

Time Constraints

From questions/models.py:691-697:
constraints = [
    # end_time must be after start_time
    models.CheckConstraint(
        check=Q(end_time__isnull=True) | Q(end_time__gt=F("start_time")),
        name="end_time_after_start_time",
    ),
]

Question Period Filter

Forecasts are filtered to only count those made during the question’s active period (from questions/models.py:539-562):
def filter_within_question_period(self):
    return self.filter(
        # Has no end time or an end time after question open time
        (Q(end_time__isnull=True) | Q(end_time__gt=F("question__open_time")))
        # AND has a start time earlier than the question's close time
        & (
            (Q(question__actual_close_time__isnull=False)
             & Q(start_time__lt=F("question__actual_close_time")))
            | (Q(question__actual_close_time__isnull=True)
               & Q(start_time__lt=F("question__scheduled_close_time")))
        ),
    )
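
Read as a per-forecast check, the filter keeps a forecast if it was still in effect after the question opened and started before the question closed, using the actual close time when one exists and the scheduled close time otherwise. The sketch below restates that logic in plain Python; `within_question_period` and its parameters are hypothetical names for illustration.

```python
from datetime import datetime, timezone
from typing import Optional


def within_question_period(
    start_time: datetime,
    end_time: Optional[datetime],
    open_time: Optional[datetime],
    scheduled_close_time: Optional[datetime],
    actual_close_time: Optional[datetime],
) -> bool:
    """Plain-Python restatement of the queryset filter (illustration only)."""
    # No end time, or an end time after the question opened.
    ends_after_open = end_time is None or (
        open_time is not None and end_time > open_time
    )
    # Prefer the actual close time; fall back to the scheduled one.
    close_time = (
        actual_close_time if actual_close_time is not None else scheduled_close_time
    )
    starts_before_close = close_time is not None and start_time < close_time
    return ends_after_open and starts_before_close


def utc(year: int, month: int, day: int) -> datetime:
    """Small convenience constructor for the example below."""
    return datetime(year, month, day, tzinfo=timezone.utc)


# The question closed early on March 1, so a forecast from March 15 is excluded.
late_forecast_counts = within_question_period(
    start_time=utc(2025, 3, 15),
    end_time=None,
    open_time=utc(2025, 1, 1),
    scheduled_close_time=utc(2025, 6, 1),
    actual_close_time=utc(2025, 3, 1),
)
```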

Bot Forecasts

Metaculus includes bot forecasters that provide baseline predictions.

Bot Types

From the user model:
  • is_bot: Marks automated forecasters
  • is_primary_bot: Distinguishes official Metaculus bots from third-party bots

Including Bots in Aggregates

Questions can control whether bots are included (from questions/models.py:89):
include_bots_in_aggregates = models.BooleanField(default=False)
Forecasts from non-primary bots can be excluded from queries using (from questions/models.py:569-572):
def exclude_non_primary_bots(self):
    return self.filter(
        Q(author__is_bot=False) | Q(author__is_primary_bot=True),
    )
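
The effect of this filter can be illustrated on an in-memory list. The `Author` dataclass and the list-based `exclude_non_primary_bots` function below are hypothetical stand-ins for the user model and queryset method; only the two boolean flags come from the source.

```python
from dataclasses import dataclass


@dataclass
class Author:
    """Minimal stand-in for the user model (illustration only)."""

    name: str
    is_bot: bool = False
    is_primary_bot: bool = False


def exclude_non_primary_bots(authors: list[Author]) -> list[Author]:
    """Keep humans and primary bots; drop every other bot."""
    return [a for a in authors if not a.is_bot or a.is_primary_bot]


authors = [
    Author("alice"),
    Author("official-bot", is_bot=True, is_primary_bot=True),
    Author("third-party-bot", is_bot=True),
]
kept = [a.name for a in exclude_non_primary_bots(authors)]
```

Only the third-party bot is dropped; human forecasters and the primary bot remain.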

Distribution Input

The distribution_input field stores the raw input format provided by the user (from questions/models.py:681-684):
distribution_input = models.JSONField(null=True, blank=True)
This preserves the original format (e.g., parameters of a normal distribution) even though the forecast is stored as a CDF.

Best Practices

Update Regularly

Update your forecast as new information arrives. Scoring covers your entire prediction history, so timely updates tend to improve scores.

Calibrate Carefully

Avoid overconfidence: extreme predictions (0%, 100%) are rarely justified.

Consider Base Rates

Start with historical frequencies before adjusting for specifics

Document Reasoning

Comment on your forecasts to track your thinking and help others

API Reference

Forecasts API

Explore the full Forecasts API documentation

Questions

Understand question types and structure

Scoring

Learn how forecasts are evaluated

Leaderboards

Track your performance against others
