The generate_features_teamwins function reads the games table populated by fetch_games_teamwins, computes per-team rolling averages over a configurable window, and writes the results to a team_game_stats table.

Function signature

generate_features_teamwins(rolling_window: int = 5)

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| rolling_window | int | 5 | Number of most recent games to average over when computing rolling statistics. |

Usage

from Data.generate_features import generate_features_teamwins

# Use the default 5-game rolling window
generate_features_teamwins()

# Use a shorter window for more sensitivity to recent form
generate_features_teamwins(rolling_window=3)

# Use a longer window to smooth out variance
generate_features_teamwins(rolling_window=10)
A larger rolling window smooths out game-to-game variance and captures longer-term team quality. A smaller window makes features more sensitive to recent form. Start with the default of 5 and adjust based on model validation performance.
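
The window-size trade-off described above can be seen on a small made-up series. This is a minimal sketch, not part of the library: the point totals are invented, and the variable names short_roll and long_roll are illustrative only.

```python
import pandas as pd

# Hypothetical per-game point totals for one team, in chronological order.
points = pd.Series([100, 120, 95, 125, 105, 115, 90, 130, 110, 100, 120, 98])

# Rolling means with a short and a long window; min_periods=1 keeps early rows.
short_roll = points.rolling(3, min_periods=1).mean()
long_roll = points.rolling(10, min_periods=1).mean()

# The longer window should fluctuate less from game to game.
print("std with window=3: ", short_roll.std())
print("std with window=10:", long_roll.std())
```

On this toy series the 10-game average varies less than the 3-game average, which is the smoothing effect a larger rolling_window is meant to provide.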

Rolling features

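For each of the following raw stats, a rolling mean is computed per team over that team's most recent rolling_window rows, with the current game included. Setting min_periods=1 means a team's first games are averaged over however many rows exist so far rather than producing NaN, so early-season rows are still included: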
for stat in ['team_points', 'opponent_points', 'team_reb', 'opponent_reb', 'team_ast', 'opponent_ast']:
    df[f'{stat}_roll'] = (
        df.groupby('team')[stat]
        .rolling(rolling_window, min_periods=1)
        .mean()
        .reset_index(0, drop=True)
    )
| Feature column | Description |
| --- | --- |
| team_points_roll | Rolling average of points scored by the team. |
| opponent_points_roll | Rolling average of points scored by the opponent. |
| team_reb_roll | Rolling average of team rebounds. |
| opponent_reb_roll | Rolling average of opponent rebounds. |
| team_ast_roll | Rolling average of team assists. |
| opponent_ast_roll | Rolling average of opponent assists. |
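
To make the grouped rolling mean concrete, here is a small self-contained sketch on invented data. It applies the same groupby/rolling pattern to a single stat. The team abbreviations and point totals are made up, and the rows are assumed to be in chronological order already.

```python
import pandas as pd

# Toy data: two teams, rows assumed to be sorted chronologically.
df = pd.DataFrame({
    "team": ["BOS", "BOS", "BOS", "BOS", "LAL", "LAL"],
    "team_points": [100, 110, 120, 130, 90, 100],
})

rolling_window = 3

# Rolling mean per team; reset_index(0, drop=True) drops the group level
# so the result aligns with the original row index on assignment.
df["team_points_roll"] = (
    df.groupby("team")["team_points"]
    .rolling(rolling_window, min_periods=1)
    .mean()
    .reset_index(0, drop=True)
)

print(df)
```

The first BOS row averages only itself (100.0), the second averages two games (105.0), and the fourth averages the last three BOS games (120.0). The LAL rows start a fresh window, so values never leak across teams.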

Derived feature: points_diff

After computing rolling averages, points_diff is derived as:
df['points_diff'] = df['team_points_roll'] - df['opponent_points_roll']
A positive value means the team has outscored its opponents on average over the rolling window; a negative value means it has been outscored. The model uses it as a single summary feature for scoring margin.
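
As a quick worked example of the sign convention, the sketch below applies the same subtraction to two invented rows. The rolling averages are made-up numbers chosen for illustration.

```python
import pandas as pd

# Hypothetical rolling averages for two team-game rows.
df = pd.DataFrame({
    "team_points_roll": [112.5, 101.0],
    "opponent_points_roll": [108.0, 105.5],
})

df["points_diff"] = df["team_points_roll"] - df["opponent_points_roll"]

print(df)
```

The first row gets a points_diff of 4.5 (the team has been outscoring opponents) and the second gets -4.5 (the team has been outscored).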

team_game_stats table schema

The output table is fully replaced on each run (if_exists='replace'):
CREATE TABLE IF NOT EXISTS team_game_stats (
    game_id               TEXT PRIMARY KEY,
    team                  TEXT,
    opponent              TEXT,
    points_diff           REAL,
    elo_diff              REAL,
    home                  INTEGER,
    win                   INTEGER,
    team_points_roll      REAL,
    opponent_points_roll  REAL,
    team_reb_roll         REAL,
    opponent_reb_roll     REAL,
    team_ast_roll         REAL,
    opponent_ast_roll     REAL
)
| Column | Type | Description |
| --- | --- | --- |
| game_id | TEXT | NBA game identifier (primary key). |
| team | TEXT | Team abbreviation. |
| opponent | TEXT | Opponent abbreviation. |
| points_diff | REAL | team_points_roll minus opponent_points_roll. |
| elo_diff | REAL | Elo rating difference (currently 0, reserved for future use). |
| home | INTEGER | 1 if the team played at home, 0 if away. |
| win | INTEGER | 1 if the team won, 0 if they lost. |
| team_points_roll | REAL | Rolling average points scored by the team. |
| opponent_points_roll | REAL | Rolling average points scored by the opponent. |
| team_reb_roll | REAL | Rolling average team rebounds. |
| opponent_reb_roll | REAL | Rolling average opponent rebounds. |
| team_ast_roll | REAL | Rolling average team assists. |
| opponent_ast_roll | REAL | Rolling average opponent assists. |
The team_game_stats table is fully replaced every time generate_features_teamwins runs. You do not need to manually clear it before re-running.
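
The replace behaviour can be illustrated with pandas and an in-memory SQLite database. This is a sketch under assumptions: the page only names the if_exists='replace' option, so the use of DataFrame.to_sql against SQLite is inferred, and the game IDs and values below are invented.

```python
import sqlite3

import pandas as pd

# In-memory database so the example leaves nothing on disk.
conn = sqlite3.connect(":memory:")

# Two hypothetical runs producing different rows.
first_run = pd.DataFrame(
    {"game_id": ["g1", "g2"], "team": ["BOS", "LAL"], "points_diff": [4.5, -4.5]}
)
second_run = pd.DataFrame(
    {"game_id": ["g3"], "team": ["NYK"], "points_diff": [1.0]}
)

# if_exists="replace" drops the existing table and writes the new rows.
first_run.to_sql("team_game_stats", conn, if_exists="replace", index=False)
second_run.to_sql("team_game_stats", conn, if_exists="replace", index=False)

rows = conn.execute(
    "SELECT game_id, team, points_diff FROM team_game_stats"
).fetchall()
print(rows)

conn.close()
```

Only the row from the second write remains, which is why no manual cleanup is needed between runs.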

Pipeline order

generate_features_teamwins reads directly from the games table. Run fetch_games_teamwins first to ensure the source data is up to date:
from Data.fetch_games import fetch_games_teamwins
from Data.generate_features import generate_features_teamwins

fetch_games_teamwins(season="2025-26")
generate_features_teamwins(rolling_window=5)
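
If you want to guard against running the steps out of order, a pre-check along these lines is one option. This is a hedged sketch, not part of the library: table_has_rows is a hypothetical helper, the storage backend is assumed to be SQLite, and the games columns used here are placeholders rather than the real schema.

```python
import sqlite3


def table_has_rows(conn: sqlite3.Connection, table: str) -> bool:
    """Return True if `table` exists and holds at least one row."""
    exists = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table,),
    ).fetchone()
    if exists is None:
        return False
    # The name was matched against sqlite_master above before being interpolated.
    count = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    return count > 0


conn = sqlite3.connect(":memory:")

# Before any fetch step has run, the games table does not exist.
before_fetch = table_has_rows(conn, "games")

# Simulate the fetch step with a placeholder table and one row.
conn.execute("CREATE TABLE games (game_id TEXT, team TEXT)")
conn.execute("INSERT INTO games VALUES ('g1', 'BOS')")
after_fetch = table_has_rows(conn, "games")

print(before_fetch, after_fetch)
```

A caller could run such a check before generate_features_teamwins and raise a clear error when the games table is missing or empty.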
