Generating a proper Cumulative Distribution Function (CDF) for continuous questions (numeric or date types) can be challenging. This guide provides complete, tested code to help you create valid CDFs.

Understanding CDFs on Metaculus

Metaculus requires continuous forecasts to be submitted as a 201-point CDF: a list of 201 probability values giving the cumulative probability at evenly spaced points across the question's range.

Key Concepts

1. What is a CDF?

A CDF at point x represents the probability that the outcome is less than or equal to x.
  • First value (index 0): Probability that outcome is below the lower bound
  • Middle values (indices 1-200): Probabilities at evenly-spaced points within the range
  • Last value (index 200): Should always be 1.0 (or close to it)
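For intuition, the simplest possible forecast is a uniform distribution over the range, whose CDF rises linearly from 0 to 1:

```python
# A uniform forecast: cumulative probability rises linearly across the range
uniform_cdf = [i / 200 for i in range(201)]

print(uniform_cdf[0])    # 0.0 -> no mass below the lower bound
print(uniform_cdf[100])  # 0.5 -> 50% chance the outcome is in the lower half
print(uniform_cdf[200])  # 1.0 -> all mass at or below the upper bound
```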
2. Question Scaling

Questions can have:
  • Linear scaling: Points are evenly spaced in the actual scale (e.g., 0, 10, 20, 30…)
  • Logarithmic scaling: Points are evenly spaced on a log scale (useful for wide ranges like 1 to 1,000,000)
  • Open vs. closed bounds: Open bounds require probability mass outside the range
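To see what logarithmic scaling does, you can invert the log-scaling formula used later in this guide and map internal locations back to nominal values (a sketch for a hypothetical question with range 1 to 1,000,000 and zero_point 0):

```python
# Hypothetical log-scaled question
range_min, range_max, zero_point = 1, 1_000_000, 0

# Ratio used by Metaculus's log-scaling transform
deriv_ratio = (range_max - zero_point) / (range_min - zero_point)

def internal_to_nominal(u: float) -> float:
    # Inverse of the log-scaling transform: maps internal [0, 1] back to question units
    return range_min + (range_max - range_min) * (deriv_ratio**u - 1) / (deriv_ratio - 1)

print(internal_to_nominal(0.0))  # ≈ 1: the lower bound
print(internal_to_nominal(0.5))  # ≈ 1000: the geometric midpoint, not 500,000
print(internal_to_nominal(1.0))  # ≈ 1,000,000: the upper bound
```

Note that the halfway point of the internal scale lands on the geometric mean of the bounds, which is what makes log scaling useful for ranges spanning several orders of magnitude.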
3. CDF Requirements

Your CDF must:
  1. Have exactly 201 values (or inbound_outcome_count + 1 for discrete questions)
  2. Be strictly increasing by at least 0.00005 per step (1% / 200)
  3. Not increase by more than 0.2 at any single step
  4. Respect boundary conditions (open vs. closed)
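The four rules above can be consolidated into one quick check (a sketch assuming the default 201-point format; the "CDF Validation Rules" section later in this guide walks through each rule individually):

```python
def check_cdf(cdf, open_lower=False, open_upper=False, inbound_count=200):
    """Return a list of violated rules (an empty list means the CDF looks valid)."""
    errors = []
    if len(cdf) != inbound_count + 1:
        errors.append(f"expected {inbound_count + 1} points, got {len(cdf)}")
    min_step = 0.01 / inbound_count           # 0.00005 for the default 200
    max_step = 0.2 * (200 / inbound_count)    # 0.2 for the default 200
    for i in range(1, len(cdf)):
        step = cdf[i] - cdf[i - 1]
        if step < min_step:
            errors.append(f"step at index {i} too small ({step:.6f})")
        if step > max_step:
            errors.append(f"step at index {i} too large ({step:.6f})")
    if not open_lower and cdf[0] != 0.0:
        errors.append("closed lower bound requires cdf[0] == 0.0")
    if open_lower and cdf[0] < 0.001:
        errors.append("open lower bound requires cdf[0] >= 0.001")
    if not open_upper and cdf[-1] != 1.0:
        errors.append("closed upper bound requires cdf[-1] == 1.0")
    if open_upper and cdf[-1] > 0.999:
        errors.append("open upper bound requires cdf[-1] <= 0.999")
    return errors

print(check_cdf([i / 200 for i in range(201)]))  # []: a uniform CDF passes
```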

Getting Question Scaling Information

First, retrieve the question’s scaling parameters:
import requests

API_TOKEN = "your_api_token_here"
headers = {"Authorization": f"Token {API_TOKEN}"}

# Get question details
post_id = 3530
response = requests.get(
    f"https://www.metaculus.com/api/posts/{post_id}/",
    headers=headers
)

question = response.json()["question"]

# Extract scaling information
scaling = question["scaling"]
range_min = scaling["range_min"]
range_max = scaling["range_max"]
zero_point = scaling.get("zero_point")  # None for linear, value for log
open_lower = question["open_lower_bound"]
open_upper = question["open_upper_bound"]
inbound_count = question["inbound_outcome_count"]  # Usually 200

print(f"Question: {question['title']}")
print(f"Range: {range_min} to {range_max}")
print(f"Scaling: {'logarithmic' if zero_point is not None else 'linear'}")
print(f"Lower bound: {'open' if open_lower else 'closed'}")
print(f"Upper bound: {'open' if open_upper else 'closed'}")
print(f"Points needed: {inbound_count + 1}")

Complete CDF Generation Functions

Here’s production-ready code from the Metaculus OpenAPI specification:

Converting Nominal Values to CDF Locations

This function converts a real-world value (e.g., “500 deaths” or “2025-06-15”) to the internal [0, 1] scale:
import datetime
import numpy as np

def nominal_location_to_cdf_location(
    nominal_location,  # float or ISO datetime string
    question_data: dict,
) -> float:
    """
    Takes a location in nominal format (e.g. 123, "123",
    or datetime in iso format) and scales it to metaculus's
    "internal representation" range [0,1] incorporating question scaling
    """
    if question_data["type"] == "date":
        scaled_location = datetime.datetime.fromisoformat(nominal_location).timestamp()
    else:
        scaled_location = float(nominal_location)
    
    # Unscale the value to put it into the range [0,1]
    scaling = question_data["scaling"]
    range_min = scaling.get("range_min")
    range_max = scaling.get("range_max")
    zero_point = scaling.get("zero_point")
    
    if zero_point is not None:
        # logarithmically scaled question
        deriv_ratio = (range_max - zero_point) / (range_min - zero_point)
        unscaled_location = (
            np.log(
                (scaled_location - range_min) * (deriv_ratio - 1)
                + (range_max - range_min)
            )
            - np.log(range_max - range_min)
        ) / np.log(deriv_ratio)
    else:
        # linearly scaled question
        unscaled_location = (scaled_location - range_min) / (range_max - range_min)
    
    return unscaled_location
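For example, on a linearly scaled question the conversion is a simple rescale. The snippet below inlines that linear path (with hypothetical ranges) so it runs standalone, covering both a numeric and a date question:

```python
import datetime

# Hypothetical linear numeric question: range 0 .. 1000
range_min, range_max = 0.0, 1000.0
numeric_location = (250.0 - range_min) / (range_max - range_min)
print(numeric_location)  # 0.25

# Hypothetical date question: range 2025-01-01 .. 2026-01-01 (Unix timestamps)
utc = datetime.timezone.utc
t_min = datetime.datetime(2025, 1, 1, tzinfo=utc).timestamp()
t_max = datetime.datetime(2026, 1, 1, tzinfo=utc).timestamp()
t_mid = datetime.datetime(2025, 7, 2, tzinfo=utc).timestamp()
date_location = (t_mid - t_min) / (t_max - t_min)
print(date_location)  # ≈ 0.4986: just before mid-year
```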

Generating CDF from Percentiles

This is the recommended approach: specify a few key percentiles and generate the full CDF by interpolation:
def generate_continuous_cdf(
    percentiles: dict,
    question_data: dict,
    below_lower_bound: float | None = None,
    above_upper_bound: float | None = None,
) -> list[float]:
    """
    Takes a set of percentiles and returns a corresponding cdf with 201 values

    Parameters
    ----------
    percentiles : dict[str, float | str]
        Keys must terminate in a number interpretable as a float in range (0, 100)
        optionally preceded by an underscore "_"
        Values must be a nominal value in the scale of the question, either
        interpretable as a float (for "numeric" type questions) or a datetime in
        ISO format (for "date" type questions)
        
        Example:
        percentiles = {
            "percentile_01": 25,
            "percentile_25": 500,
            "50": 650,
            "percentile_75": "700",
            "percentile_99": 990,
        }
    
    question_data : dict
        Question object from the API

    below_lower_bound : float, optional
        Amount of probability mass assigned below the lower bound

    above_upper_bound : float, optional
        Amount of probability mass assigned above the upper bound

    Returns
    -------
    list[float]
        201-point CDF ready for submission
    """
    # This will be the set of (x, y) points that are the set points of the cdf
    percentile_locations = []

    # Take the given boundary values
    if below_lower_bound is not None:
        percentile_locations.append((0.0, below_lower_bound))
    if above_upper_bound is not None:
        percentile_locations.append((1.0, 1 - above_upper_bound))

    # Generate the remaining set of points
    for percentile, nominal_location in percentiles.items():
        height = float(str(percentile).split("_")[-1]) / 100
        location = nominal_location_to_cdf_location(nominal_location, question_data)
        percentile_locations.append((location, height))

    # Sort to ensure lookup works
    percentile_locations.sort()

    # Check validity
    first_point, last_point = percentile_locations[0], percentile_locations[-1]
    if (first_point[0] > 0.0) or (last_point[0] < 1.0):
        raise ValueError("Percentiles must encompass bounds of the question")

    def get_cdf_at(location):
        # Helper function that takes a location and returns
        # the height of the cdf at that location, linearly
        # interpolating between values
        previous = percentile_locations[0]
        for i in range(1, len(percentile_locations)):
            current = percentile_locations[i]
            if previous[0] <= location <= current[0]:
                return previous[1] + (current[1] - previous[1]) * (
                    location - previous[0]
                ) / (current[0] - previous[0])
            previous = current

    # Generate that cdf
    continuous_cdf = [get_cdf_at(i / 200) for i in range(201)]
    return continuous_cdf
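As a quick sanity check of the interpolation logic: for a linear-scale question with closed bounds and symmetric percentiles, the result should be exactly the uniform CDF. The snippet below inlines the same linear interpolation via np.interp (rather than calling the function above) so it runs standalone:

```python
import numpy as np

# (internal location, cumulative probability) set points, as built by
# generate_continuous_cdf for a hypothetical linear question with closed
# bounds and evenly spread percentiles
points = [(0.0, 0.0), (0.25, 0.25), (0.5, 0.5), (0.75, 0.75), (1.0, 1.0)]
xs, ys = zip(*points)

# Linearly interpolate at the 201 evenly spaced internal locations
cdf = np.interp(np.linspace(0, 1, 201), xs, ys).tolist()

print(len(cdf))  # 201
print(cdf[100])  # 0.5: the median sits at the midpoint of the range
```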

Standardizing the CDF

This function ensures your CDF meets all Metaculus requirements:
def standardize_cdf(cdf, question_data: dict):
    """
    Takes a cdf and returns a standardized version of it

    - Assigns no mass outside of closed bounds (scales accordingly)
    - Assigns at least a minimum amount of mass outside of open bounds
    - Increasing by at least the minimum amount (0.01 / 200 = 0.00005)
    - Caps the maximum growth to 0.2

    Note: thresholds change with different `inbound_outcome_count`s
    """
    lower_open = question_data["open_lower_bound"]
    upper_open = question_data["open_upper_bound"]
    inbound_outcome_count = question_data["inbound_outcome_count"]
    default_inbound_outcome_count = 200

    cdf = np.asarray(cdf, dtype=float)
    if not cdf.size:
        return []

    # Apply lower bound & enforce boundary values
    scale_lower_to = 0 if lower_open else cdf[0]
    scale_upper_to = 1.0 if upper_open else cdf[-1]
    rescaled_inbound_mass = scale_upper_to - scale_lower_to

    def standardize(F: float, location: float) -> float:
        # `F` is the height of the cdf at `location` (in range [0, 1])
        # Rescale
        rescaled_F = (F - scale_lower_to) / rescaled_inbound_mass
        # Offset
        if lower_open and upper_open:
            return 0.988 * rescaled_F + 0.01 * location + 0.001
        elif lower_open:
            return 0.989 * rescaled_F + 0.01 * location + 0.001
        elif upper_open:
            return 0.989 * rescaled_F + 0.01 * location
        return 0.99 * rescaled_F + 0.01 * location

    for i, value in enumerate(cdf):
        cdf[i] = standardize(value, i / (len(cdf) - 1))

    # Apply upper bound - operate in PMF space
    pmf = np.diff(cdf, prepend=0, append=1)
    # Cap depends on inboundOutcomeCount (0.2 if it is the default 200)
    cap = 0.2 * (default_inbound_outcome_count / inbound_outcome_count)

    def cap_pmf(scale: float) -> np.ndarray:
        return np.concatenate([pmf[:1], np.minimum(cap, scale * pmf[1:-1]), pmf[-1:]])

    def capped_sum(scale: float) -> float:
        return float(cap_pmf(scale).sum())

    # Find the appropriate scale search space
    lo = hi = scale = 1.0
    while capped_sum(hi) < 1.0:
        hi *= 1.2
    
    # Hone in on scale value that makes capped sum 1
    for _ in range(100):
        scale = 0.5 * (lo + hi)
        s = capped_sum(scale)
        if s < 1.0:
            lo = scale
        else:
            hi = scale
        if s == 1.0 or (hi - lo) < 2e-5:
            break
    
    # Apply scale and renormalize
    pmf = cap_pmf(scale)
    pmf[1:-1] *= (cdf[-1] - cdf[0]) / pmf[1:-1].sum()
    
    # Back to CDF space
    cdf = np.cumsum(pmf)[:-1]

    # Round to minimize floating point errors
    cdf = np.round(cdf, 10)
    return cdf.tolist()
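To see where the 0.988 / 0.01 / 0.001 coefficients come from, consider the both-bounds-open case in isolation: mixing in a 1% uniform component plus a 0.1% constant offset pins the endpoints just inside the allowed range.

```python
def mix_with_uniform(F: float, location: float) -> float:
    # Both bounds open: 98.8% forecast + 1% uniform CDF + 0.1% constant offset
    return 0.988 * F + 0.01 * location + 0.001

lo = mix_with_uniform(0.0, 0.0)  # 0.988*0 + 0.01*0 + 0.001 = 0.001
hi = mix_with_uniform(1.0, 1.0)  # 0.988*1 + 0.01*1 + 0.001 = 0.999
```

So even a forecast that put all its mass inside the range ends up with at least 0.1% below the lower bound and 0.1% above the upper bound, satisfying the open-bound requirements.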

Complete Example: Linear Scale, Closed Bounds

Here’s a complete workflow for a simple case:
import requests
import numpy as np

API_TOKEN = "your_api_token_here"
headers = {"Authorization": f"Token {API_TOKEN}"}

# 1. Get the question
post_id = 3530
response = requests.get(
    f"https://www.metaculus.com/api/posts/{post_id}/",
    headers=headers
)
question = response.json()["question"]

print(f"Question: {question['title']}")
print(f"Range: {question['scaling']['range_min']} to {question['scaling']['range_max']}")
print(f"Unit: {question['unit']}")

# 2. Define your belief as percentiles
# For example: "I think there's a 50% chance the answer is below 1000,
# 25% chance below 500, etc."
my_percentiles = {
    "5": 300,      # 5th percentile at 300
    "25": 500,     # 25th percentile at 500
    "50": 1000,    # Median at 1000
    "75": 5000,    # 75th percentile at 5000
    "95": 50000,   # 95th percentile at 50000
}

# 3. Generate CDF
my_cdf = generate_continuous_cdf(
    percentiles=my_percentiles,
    question_data=question,
    below_lower_bound=0.0,  # 0% probability below range_min (closed bound)
    above_upper_bound=0.0   # 0% probability above range_max (closed bound)
)

print(f"\nGenerated CDF with {len(my_cdf)} points")
print(f"First value: {my_cdf[0]:.5f}")
print(f"Median (index 100): {my_cdf[100]:.5f}")
print(f"Last value: {my_cdf[-1]:.5f}")

# 4. Standardize to meet all requirements
standardized_cdf = standardize_cdf(my_cdf, question)

print(f"\nStandardized CDF:")
print(f"First value: {standardized_cdf[0]:.5f}")
print(f"Last value: {standardized_cdf[-1]:.5f}")

# 5. Submit the forecast
forecast_data = [
    {
        "question": question["id"],
        "continuous_cdf": standardized_cdf
    }
]

response = requests.post(
    "https://www.metaculus.com/api/questions/forecast/",
    headers=headers,
    json=forecast_data
)

if response.status_code == 201:
    print("\nForecast submitted successfully!")
else:
    print(f"\nError: {response.status_code}")
    print(response.json())

Complete Example: Open Bounds

For questions with open bounds, you must assign probability mass outside the range:
import requests

API_TOKEN = "your_api_token_here"
headers = {"Authorization": f"Token {API_TOKEN}"}

# Get question with open bounds
response = requests.get(
    "https://www.metaculus.com/api/posts/12345/",
    headers=headers
)
question = response.json()["question"]

print(f"Open lower: {question['open_lower_bound']}")
print(f"Open upper: {question['open_upper_bound']}")

# Define percentiles
my_percentiles = {
    "10": 100,
    "25": 250,
    "50": 500,
    "75": 1000,
    "90": 5000,
}

# For open bounds, must specify probability outside range
my_cdf = generate_continuous_cdf(
    percentiles=my_percentiles,
    question_data=question,
    below_lower_bound=0.05,  # 5% probability below lower bound
    above_upper_bound=0.02   # 2% probability above upper bound
)

# Standardize (this ensures at least 0.1% outside open bounds)
standardized_cdf = standardize_cdf(my_cdf, question)

# Submit
forecast_data = [{
    "question": question["id"],
    "continuous_cdf": standardized_cdf
}]

response = requests.post(
    "https://www.metaculus.com/api/questions/forecast/",
    headers=headers,
    json=forecast_data
)

print("Submitted!" if response.status_code == 201 else f"Error: {response.status_code}")

Date Questions

Date questions work the same way, but percentile values are datetimes in ISO format:
import requests
from datetime import datetime

API_TOKEN = "your_api_token_here"
headers = {"Authorization": f"Token {API_TOKEN}"}

# Get a date question
response = requests.get(
    "https://www.metaculus.com/api/posts/5678/",
    headers=headers
)
question = response.json()["question"]

print(f"Type: {question['type']}")  # Should be 'date'
print(f"Range: {question['scaling']['range_min']} to {question['scaling']['range_max']}")
# Note: For date questions, range_min/max are Unix timestamps

# Define percentiles using ISO format dates
my_percentiles = {
    "10": "2025-01-15T00:00:00",
    "25": "2025-03-01T00:00:00",
    "50": "2025-06-15T00:00:00",
    "75": "2025-09-30T00:00:00",
    "90": "2025-12-31T00:00:00",
}

# Generate and submit (same as numeric)
my_cdf = generate_continuous_cdf(
    percentiles=my_percentiles,
    question_data=question,
    below_lower_bound=0.0,
    above_upper_bound=0.0
)

standardized_cdf = standardize_cdf(my_cdf, question)

forecast_data = [{
    "question": question["id"],
    "continuous_cdf": standardized_cdf
}]

response = requests.post(
    "https://www.metaculus.com/api/questions/forecast/",
    headers=headers,
    json=forecast_data
)

print("Submitted!" if response.status_code == 201 else f"Error: {response.status_code}")

CDF Validation Rules

Your CDF will be rejected if it violates these rules:
The CDF must increase by at least 0.00005 (0.005%) at each step.
# Check this rule
for i in range(1, len(cdf)):
    increase = cdf[i] - cdf[i-1]
    if increase < 0.00005:
        print(f"Error at index {i}: increase too small ({increase})")
No step can increase by more than 0.2 (20%).
# Check this rule
for i in range(1, len(cdf)):
    increase = cdf[i] - cdf[i-1]
    if increase > 0.2:
        print(f"Error at index {i}: increase too large ({increase})")
  • Closed lower bound: First value must be 0.0
  • Open lower bound: First value must be at least 0.001 (0.1%)
  • Closed upper bound: Last value must be 1.0
  • Open upper bound: Last value must be at most 0.999 (99.9%)
# Check boundaries
if not question["open_lower_bound"] and cdf[0] != 0.0:
    print(f"Error: Closed lower bound requires cdf[0] = 0.0, got {cdf[0]}")

if question["open_lower_bound"] and cdf[0] < 0.001:
    print(f"Error: Open lower bound requires cdf[0] >= 0.001, got {cdf[0]}")

if not question["open_upper_bound"] and cdf[-1] != 1.0:
    print(f"Error: Closed upper bound requires cdf[-1] = 1.0, got {cdf[-1]}")

if question["open_upper_bound"] and cdf[-1] > 0.999:
    print(f"Error: Open upper bound requires cdf[-1] <= 0.999, got {cdf[-1]}")
Must have exactly inbound_outcome_count + 1 points (usually 201).
expected_length = question["inbound_outcome_count"] + 1
if len(cdf) != expected_length:
    print(f"Error: Expected {expected_length} points, got {len(cdf)}")
The standardize_cdf() function automatically fixes most validation issues. Always use it before submitting!

Common Errors and Solutions

Problem: Your percentiles don't cover the full internal range from 0 to 1.

Solution: Either:
  • Add extreme percentiles (e.g., 1st and 99th)
  • Specify the below_lower_bound and above_upper_bound parameters
# Option 1: Add extreme percentiles
percentiles = {
    "1": 10,
    "25": 100,
    "50": 500,
    "75": 2000,
    "99": 10000,
}

# Option 2: Specify boundary probabilities
my_cdf = generate_continuous_cdf(
    percentiles={"25": 100, "50": 500, "75": 2000},
    question_data=question,
    below_lower_bound=0.01,
    above_upper_bound=0.01
)
Problem: Your distribution is too concentrated (too much probability in one place).

Solution: Use standardize_cdf(), which adds a uniform component to ensure minimum increase rates.
# Always standardize before submitting
standardized_cdf = standardize_cdf(my_cdf, question)
Problem: Your distribution has too sharp a spike.

Solution: Spread out your percentiles more evenly, or use standardize_cdf(), which caps the maximum step size.

Tips for Better CDFs

  1. Start with percentiles: It’s much easier to think in terms of “I believe there’s a 50% chance the answer is below X” than to manually construct 201 probability values.
  2. Use more percentiles for complex beliefs: If you have a bimodal or unusual distribution, specify more percentiles (10th, 20th, 30th, etc.).
  3. Always standardize: The standardize_cdf() function ensures your CDF will be accepted and adds a small uniform component that actually improves forecasting performance.
  4. Check your work: Print out key percentiles from your generated CDF to verify it matches your beliefs:
# Verify your CDF: find where cumulative probability crosses each target
for percentile in [10, 25, 50, 75, 90]:
    target = percentile / 100
    index = next(i for i, v in enumerate(cdf) if v >= target)
    print(f"{percentile}th percentile falls near index {index} "
          f"({index / 200:.1%} of the way through the range)")
  5. Test with closed bounds first: Start by practicing with questions that have closed bounds - they're simpler to work with.
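If eliciting percentiles feels awkward, another option (a sketch using only the standard library, not a method from the Metaculus docs) is to evaluate the CDF of a parametric distribution at the 201 points directly, then run it through standardize_cdf():

```python
import math

def normal_cdf_points(mean: float, sd: float, range_min: float, range_max: float,
                      n_points: int = 201) -> list[float]:
    """Evaluate a Normal(mean, sd) CDF at n_points evenly spaced locations."""
    def phi(x: float) -> float:
        # Standard normal CDF via the error function
        return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))
    step = (range_max - range_min) / (n_points - 1)
    return [phi(range_min + i * step) for i in range(n_points)]

# Hypothetical question: range 0..1000, belief centered at 500 with sd 150
cdf = normal_cdf_points(500, 150, 0, 1000)
print(len(cdf))  # 201
print(cdf[100])  # 0.5: the mean sits exactly mid-range here
```

Because the normal's tails extend past the bounds, this raw CDF starts slightly above 0 and ends slightly below 1; standardize_cdf() adjusts the endpoints to match the question's boundary conditions either way.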
