
The model includes a standalone evaluation tool (main_fit.js) that measures how accurately the current ranking model predicts match outcomes. Valve uses this to validate the model before each Major cycle.

Running the evaluation

From the model/ directory, run:
node main_fit.js
The script uses the same default data file as the main model (../data/matchdata.json). No additional arguments are required.

What the evaluation script does

The script iterates backward through historical weeks, generates rankings at each checkpoint, and measures how well those rankings predicted the matches that followed.
1. Load the full match dataset

The script reads all matches from the data file and identifies the most recent match timestamp. It then sets an analysis window covering approximately the past four months.
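
A minimal sketch of this step, assuming the data file is a JSON array whose entries carry a unix-seconds timestamp field (called time here; the real field names and the exact window length are defined in the repository code):

const fs = require("fs");

const matches = JSON.parse(fs.readFileSync("../data/matchdata.json", "utf8"));

const WEEK = 7 * 24 * 60 * 60;      // one week, in seconds
const ANALYSIS_WINDOW = 17 * WEEK;  // roughly four months (assumed value)

const latest = Math.max(...matches.map((m) => m.time));
const earliest = latest - ANALYSIS_WINDOW;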
2. Step backward through weeks

Starting from the most recent match, the script walks backward in one-week increments. At each step it re-generates the standings using only the match data available up to that point — simulating what the model would have produced at that moment in time.
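
A minimal sketch of the backtesting loop, where generateRanking and evaluateWeek stand in for the repository's actual ranking and scoring routines (those names are illustrative, not from the source):

for (let cutoff = latest; cutoff > earliest; cutoff -= WEEK) {
  // standings as of one week before the cutoff — the "prior week"
  const visible = matches.filter((m) => m.time <= cutoff - WEEK);
  const standings = generateRanking(visible);
  // the matches that followed, which those standings should predict
  const week = matches.filter((m) => m.time > cutoff - WEEK && m.time <= cutoff);
  evaluateWeek(standings, week);
}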
3. Assign expected win rates

For each match in the current week, the script looks up the two teams’ rank values from the prior week’s standings and converts the point difference into an expected win probability using the Glicko formula. Matches where either team has no prior rank value are excluded from the evaluation.
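
One common form of the Glicko win expectancy, shown here with the fixed rd = 75 from the parameter table below; treat this as a sketch rather than the repository's exact code:

const Q = Math.log(10) / 400;

// Glicko's g() damps sensitivity to the point difference as the
// rating deviation grows
function g(rd) {
  return 1 / Math.sqrt(1 + (3 * Q * Q * rd * rd) / (Math.PI * Math.PI));
}

// expected probability that team 1 beats team 2, given their point values
function expectedWinRate(points1, points2, rd = 75) {
  return 1 / (1 + Math.pow(10, (-g(rd) * (points1 - points2)) / 400));
}

Under these assumptions a 100-point favourite comes out around expectedWinRate(1900, 1800) ≈ 0.64.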
4. Bin matches by expected win rate

All evaluated matches are placed into 5% expected-win-rate bins (0–5%, 5–10%, …, 95–100%). Bins with fewer than 20 observations are excluded from the output to avoid noise.
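
The bin assignment itself reduces to a one-liner; a sketch:

// map an expected win rate onto one of twenty 5% buckets,
// clamping expected = 1.0 into the top bucket
function binIndex(expected, binWidth = 0.05) {
  return Math.min(Math.floor(expected / binWidth), Math.round(1 / binWidth) - 1);
}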
5. Measure actual win rates

Within each bin, the script measures the fraction of matches that team 1 actually won. This observed win rate is compared to the bin’s expected win rate.
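
Putting the last two steps together — tally wins and totals per bin with binIndex from the previous sketch, then print in the script's output format (field names like expected and team1Won are illustrative):

function printCalibration(evaluated, binWidth = 0.05, minCount = 20) {
  const bins = new Map();
  for (const { expected, team1Won } of evaluated) {
    const key = binIndex(expected, binWidth);
    const bin = bins.get(key) ?? { wins: 0, total: 0 };
    bin.wins += team1Won ? 1 : 0;
    bin.total += 1;
    bins.set(key, bin);
  }
  for (const [key, { wins, total }] of [...bins].sort((a, b) => a[0] - b[0])) {
    if (total < minCount) continue;  // skip under-populated, noisy bins
    const expectedRate = ((key + 1) * binWidth).toFixed(2);
    const observedRate = (wins / total).toFixed(2);
    console.log(`${expectedRate},${observedRate}, (${wins} / ${total})`);
  }
}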

Output

The script prints one line per bin to stdout:
<expected_win_rate>,<observed_win_rate>, (<wins> / <total>)
Example output:
0.05,0.06, (12 / 198)
0.10,0.11, (47 / 421)
0.15,0.14, (89 / 623)
...
0.90,0.88, (352 / 400)
0.95,0.93, (74 / 80)
A well-calibrated model produces a near-diagonal line when these values are plotted — expected win rates on the x-axis, observed win rates on the y-axis.

Current model performance

The modelfit.png image in the repository root shows a scatter plot of expected vs. observed win rates for the current model. Each point represents one 5% bin; the closer the points lie to the diagonal, the more accurate the model is. Key statistics for the current model:
Metric            Value
Spearman’s rho    0.98
Slope             Slightly less than 1
The Spearman’s rho of 0.98 indicates a very strong monotonic relationship between expected and observed outcomes — the model reliably distinguishes strong teams from weak ones.
A slope less than 1 means the model is conservative at the extremes: it underestimates how likely a very dominant team is to win (high end of the expected-win-rate scale) and overestimates how likely a very weak team is to win (low end). In practice, this means the model slightly compresses confidence at the tails rather than predicting blowout win rates. The Valve team considers this a reasonable trade-off for the current Major cycle.

Evaluation parameters

The fit script uses these default parameters, defined in fit.js:
Parameter        Value       Description
rd               75          Fixed Glicko rating deviation used for win-probability calculations
sampleWindow     7 days      Width of each evaluation step (one week of matches)
analysisWindow   ~4 months   Total historical period examined
The model parameters are intentionally open for community experimentation. Try adjusting the scoring factors, age-weight decay, or event-weight multipliers in the ranking code and re-run the fit script to see whether your changes improve or hurt the rho score. Valve has stated they are open to considering alternative models that perform well and meet the transparency and resistance-to-gaming goals. See the repository README for details.
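
If you want a quick numeric check on your own runs, one way to compute Spearman's rho over the per-bin (expected, observed) pairs is sketched below; this helper is not part of the repository, and it assumes no tied values so the simple rank-difference formula applies:

function spearmanRho(xs, ys) {
  const rank = (vs) => {
    const order = vs.map((v, i) => [v, i]).sort((a, b) => a[0] - b[0]);
    const ranks = new Array(vs.length);
    order.forEach(([, originalIndex], pos) => {
      ranks[originalIndex] = pos + 1;
    });
    return ranks;
  };
  const rx = rank(xs);
  const ry = rank(ys);
  const d2 = rx.reduce((sum, r, i) => sum + (r - ry[i]) ** 2, 0);
  const n = xs.length;
  return 1 - (6 * d2) / (n * (n * n - 1));
}

Feed it the two output columns as parallel arrays; with bins ordered as cleanly as the example output above, the result lands near 1.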
