
Database file

Data/nba_stats.db
The database is created automatically when fetch_games_teamwins() runs for the first time. It contains two tables: games (raw ingested data) and team_game_stats (engineered features for ML).
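A minimal sketch of opening this database from Python. It assumes the project uses the standard-library sqlite3 module; the page does not say which driver fetch_games_teamwins() uses. open_db and list_tables are hypothetical helper names. sqlite3.connect creates the database file on first use, which fits the "created automatically" behaviour described above. SQLite cannot create missing directories, so the sketch creates the Data directory explicitly.

```python
import sqlite3
from pathlib import Path

# Location documented on this page, assumed relative to the project root.
DB_PATH = Path("Data") / "nba_stats.db"


def open_db(path: Path = DB_PATH) -> sqlite3.Connection:
    """Open the database, creating the parent directory if it is missing.

    sqlite3.connect() creates the database file itself on first use,
    but it cannot create missing parent directories.
    """
    path.parent.mkdir(parents=True, exist_ok=True)
    return sqlite3.connect(path)


def list_tables(conn: sqlite3.Connection) -> list[str]:
    """Return the names of the tables currently defined in the database."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]
```

Once both fetch_games_teamwins() and generate_features_teamwins() have run, list_tables(open_db()) should include games and team_game_stats.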

games table

Stores one row per team per game, so each NBA game produces two rows, one for each team. Populated by fetch_games_teamwins().
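One way to get this two-rows-per-game shape with the opponent columns filled in is to pair the two team rows that share a game_id. Each row then copies the other side's totals into its own opponent_* fields. This is an illustrative sketch only. The input key names (points, reb, ast) and the helper add_opponent_stats are hypothetical, and the page does not describe how fetch_games_teamwins() actually builds its rows.

```python
from collections import defaultdict


def add_opponent_stats(rows: list[dict]) -> list[dict]:
    """Pair the two team rows of each game and copy each side's stats to the other.

    Each input dict is assumed to hold game_id, team, points, reb and ast
    (hypothetical key names chosen for this sketch).
    """
    by_game: dict[str, list[dict]] = defaultdict(list)
    for row in rows:
        by_game[row["game_id"]].append(row)

    out = []
    for game_id, pair in by_game.items():
        if len(pair) != 2:
            continue  # skip games where only one side was fetched
        # Emit one row from each team's perspective.
        for me, other in (pair, pair[::-1]):
            out.append({
                "game_id": game_id,
                "team": me["team"],
                "opponent": other["team"],
                "team_points": me["points"],
                "opponent_points": other["points"],
                "team_reb": me["reb"],
                "opponent_reb": other["reb"],
                "team_ast": me["ast"],
                "opponent_ast": other["ast"],
            })
    return out
```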

CREATE TABLE statement

CREATE TABLE IF NOT EXISTS games (
    game_id          TEXT,
    game_date        TEXT,
    team_id          INTEGER,
    team             TEXT,
    opponent         TEXT,
    team_points      INTEGER,
    opponent_points  INTEGER,
    team_reb         INTEGER,
    opponent_reb     INTEGER,
    team_ast         INTEGER,
    opponent_ast     INTEGER,
    home             INTEGER,
    win              INTEGER,
    PRIMARY KEY (game_id, team_id)
);
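The composite primary key means a given team can appear only once per game. Below is a minimal sketch of writing rows against this schema with Python's sqlite3 module, assuming re-runs should silently skip rows that are already stored. INSERT OR IGNORE is this sketch's choice and not necessarily what fetch_games_teamwins() does. save_games is a hypothetical name.

```python
import sqlite3

# Schema copied from the CREATE TABLE statement above.
GAMES_DDL = """
CREATE TABLE IF NOT EXISTS games (
    game_id TEXT, game_date TEXT, team_id INTEGER, team TEXT, opponent TEXT,
    team_points INTEGER, opponent_points INTEGER,
    team_reb INTEGER, opponent_reb INTEGER,
    team_ast INTEGER, opponent_ast INTEGER,
    home INTEGER, win INTEGER,
    PRIMARY KEY (game_id, team_id)
)
"""

COLUMNS = (
    "game_id", "game_date", "team_id", "team", "opponent",
    "team_points", "opponent_points", "team_reb", "opponent_reb",
    "team_ast", "opponent_ast", "home", "win",
)

_COLUMN_LIST = ", ".join(COLUMNS)
_PLACEHOLDERS = ", ".join(["?"] * len(COLUMNS))
# OR IGNORE (an assumption of this sketch): rows whose (game_id, team_id)
# already exists are skipped instead of raising sqlite3.IntegrityError.
INSERT_SQL = f"INSERT OR IGNORE INTO games ({_COLUMN_LIST}) VALUES ({_PLACEHOLDERS})"


def save_games(conn: sqlite3.Connection, rows: list[tuple]) -> int:
    """Create the table if needed, insert rows, and return how many were new."""
    conn.execute(GAMES_DDL)
    before = conn.total_changes
    with conn:  # commit on success, roll back on error
        conn.executemany(INSERT_SQL, rows)
    # Ignored duplicates are not counted in total_changes.
    return conn.total_changes - before
```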

Columns

game_id
TEXT
NBA game identifier, shared by the two team rows of the same game. Combined with team_id to form the primary key.
game_date
TEXT
Date of the game as returned by the NBA API.
team_id
INTEGER
Numeric NBA team identifier. Combined with game_id to form the primary key.
team
TEXT
Team abbreviation (e.g., "LAL", "BOS").
opponent
TEXT
Opponent team abbreviation.
team_points
INTEGER
Points scored by the team in this game.
opponent_points
INTEGER
Points scored by the opponent in this game.
team_reb
INTEGER
Total rebounds for the team.
opponent_reb
INTEGER
Total rebounds for the opponent.
team_ast
INTEGER
Total assists for the team.
opponent_ast
INTEGER
Total assists for the opponent.
home
INTEGER
1 if the team played at home; 0 if away. Derived from the MATCHUP field ("vs." = home, "@" = away).
win
INTEGER
1 if the team won; 0 if the team lost.
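A sketch of deriving the home and win flags, assuming MATCHUP strings of the form "LAL vs. BOS" (home) and "LAL @ BOS" (away). parse_home and win_flag are hypothetical helpers. win_flag computes the result from the stored point totals, which works because NBA games cannot end in a tie. The real ingestion code may instead read a result field from the API.

```python
def parse_home(matchup: str) -> int:
    """Return 1 for a home game ("LAL vs. BOS") and 0 for an away game ("LAL @ BOS")."""
    if " vs. " in matchup:
        return 1
    if " @ " in matchup:
        return 0
    raise ValueError(f"unrecognised MATCHUP value: {matchup!r}")


def win_flag(team_points: int, opponent_points: int) -> int:
    """Return 1 if the team outscored its opponent, else 0 (NBA games cannot tie)."""
    return int(team_points > opponent_points)
```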

team_game_stats table

Stores engineered features computed from the games table. This table is the direct input to the ML training step. Populated by generate_features_teamwins(). The table is fully replaced each time that function runs.

CREATE TABLE statement

CREATE TABLE IF NOT EXISTS team_game_stats (
    game_id               TEXT PRIMARY KEY,
    team                  TEXT,
    opponent              TEXT,
    points_diff           REAL,
    elo_diff              REAL,
    home                  INTEGER,
    win                   INTEGER,
    team_points_roll      REAL,
    opponent_points_roll  REAL,
    team_reb_roll         REAL,
    opponent_reb_roll     REAL,
    team_ast_roll         REAL,
    opponent_ast_roll     REAL
);

Columns

game_id
TEXT
NBA game identifier. Primary key.
team
TEXT
Team abbreviation.
opponent
TEXT
Opponent team abbreviation.
points_diff
REAL
Computed as team_points_roll minus opponent_points_roll. Positive values indicate the team has been outscoring its opponents over the rolling window.
elo_diff
REAL
Elo rating difference between the two teams. Currently always stored as 0; reserved for future use.
home
INTEGER
1 if the team played at home; 0 if away.
win
INTEGER
1 if the team won; 0 if the team lost. Target variable for the classifier.
team_points_roll
REAL
Rolling average of points scored by the team over the last N games (default N=5).
opponent_points_roll
REAL
Rolling average of points scored by the opponent over the last N games (default N=5).
team_reb_roll
REAL
Rolling average of rebounds for the team over the last N games.
opponent_reb_roll
REAL
Rolling average of rebounds for the opponent over the last N games.
team_ast_roll
REAL
Rolling average of assists for the team over the last N games.
opponent_ast_roll
REAL
Rolling average of assists for the opponent over the last N games.
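The rolling columns can be illustrated with a small pure-Python sketch for one team's games in date order. Two choices here are assumptions and are not taken from generate_features_teamwins(). First, each average covers up to the previous N games and excludes the current one, so a game's features do not leak its own result. Second, the first game, which has no history, yields None. rolling_means and compute_points_diff are hypothetical names.

```python
from collections import deque
from typing import Optional


def rolling_means(values: list[float], n: int = 5) -> list[Optional[float]]:
    """For each game, average up to the previous n values (current game excluded).

    Returns None for the first game, which has no history. Excluding the
    current game and allowing partial windows are assumptions of this sketch.
    """
    window: deque = deque(maxlen=n)
    out: list[Optional[float]] = []
    for value in values:
        out.append(sum(window) / len(window) if window else None)
        window.append(value)  # the oldest value drops out once n are held
    return out


def compute_points_diff(
    team_roll: list[Optional[float]], opp_roll: list[Optional[float]]
) -> list[Optional[float]]:
    """points_diff = team_points_roll - opponent_points_roll; None where either is missing."""
    return [
        None if t is None or o is None else t - o
        for t, o in zip(team_roll, opp_roll)
    ]
```

For a team that scored 100, 110, 120, 130, 140 and 150 points in successive games, rolling_means(..., n=5) gives None, 100.0, 105.0, 110.0, 115.0, 120.0. In pandas, the same per-team calculation is usually written as a groupby on team followed by shift(1) and rolling(n, min_periods=1).mean().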
elo_diff is present in the schema but is always stored as 0. It is excluded from the feature set used by the current ML model. See train_model_teamwins() for the exact feature columns used during training.
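A sketch of loading a training matrix from team_game_stats while leaving elo_diff out. FEATURE_COLUMNS is a hypothetical list built from the columns documented above; train_model_teamwins() remains the authority on the exact set. Skipping rows whose rolling values are NULL is also an assumption of this sketch.

```python
import sqlite3

# Hypothetical feature list built from the documented columns. elo_diff is
# left out because it is always 0; train_model_teamwins() defines the real set.
FEATURE_COLUMNS = [
    "points_diff",
    "home",
    "team_points_roll",
    "opponent_points_roll",
    "team_reb_roll",
    "opponent_reb_roll",
    "team_ast_roll",
    "opponent_ast_roll",
]


def load_training_data(conn: sqlite3.Connection) -> tuple[list[list[float]], list[int]]:
    """Return (X, y) from team_game_stats, with win as the target."""
    # Column names are fixed constants, so building the SQL string is safe here.
    columns = ", ".join(FEATURE_COLUMNS)
    # Skip rows whose features are still NULL (assumption of this sketch).
    complete = " AND ".join(f"{c} IS NOT NULL" for c in FEATURE_COLUMNS)
    rows = conn.execute(
        f"SELECT {columns}, win FROM team_game_stats WHERE {complete}"
    ).fetchall()
    X = [list(row[:-1]) for row in rows]
    y = [row[-1] for row in rows]
    return X, y
```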
