train_model_teamwins() reads the team_game_stats table, fits a RandomForestClassifier, prints test accuracy, and saves the model to disk. You must run this before making any predictions.
The team_game_stats table must exist before training. Run generate_features_teamwins() first to populate it.

How to train
Ensure the feature table is ready
Run the data pipeline to fetch games and generate rolling-window features; generate_features_teamwins() populates team_game_stats.
Model architecture
The classifier is a RandomForestClassifier from scikit-learn:

| Parameter | Value | Description |
|---|---|---|
| n_estimators | 200 | Number of decision trees in the ensemble |
| random_state | 0 | Seed for reproducibility |
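
As a minimal sketch, assuming scikit-learn is installed, a classifier with the parameters in the table could be constructed as follows. The variable name model is illustrative and not taken from the project source; all other hyperparameters are left at scikit-learn's defaults, which the documentation above does not confirm.

```python
from sklearn.ensemble import RandomForestClassifier

# Hyperparameters as listed in the table; everything else is left at
# scikit-learn's defaults (an assumption, not confirmed by the source).
model = RandomForestClassifier(n_estimators=200, random_state=0)

# get_params() returns the constructor arguments as a dict.
params = model.get_params()
print(params["n_estimators"], params["random_state"])
```

Fixing random_state makes the bootstrap samples and feature subsets repeatable, so retraining on identical data should yield the same forest.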
Train/test split
The dataset is split before fitting:

| Parameter | Value | Description |
|---|---|---|
| test_size | 0.3 | 30% of rows held out for evaluation |
| random_state | 42 | Seed for reproducibility |
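
The split parameters above can be illustrated with scikit-learn's train_test_split. This is a sketch on synthetic data: the arrays X and y are random stand-ins, and whether the project passes additional arguments (such as stratify) is not stated, so none are used here.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 100 rows, 6 feature columns, binary target.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
y = rng.integers(0, 2, size=100)

# 30% of rows go to the test set; random_state fixes the shuffle.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

print(X_train.shape, X_test.shape)
```

Because the seed is fixed, the same rows land in the test set on every run, which keeps reported accuracy comparable between training runs on unchanged data.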
Input features
The model uses six numeric columns from team_game_stats:

| Feature | Type | Description |
|---|---|---|
| points_diff | float | Rolling average points differential (team minus opponent) |
| team_reb_roll | float | Team’s rolling average rebounds |
| opponent_reb_roll | float | Opponent’s rolling average rebounds |
| team_ast_roll | float | Team’s rolling average assists |
| opponent_ast_roll | float | Opponent’s rolling average assists |
| home | int | 1 if the team is at home, 0 if away |
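
To make the idea of a rolling average concrete, here is one way such a column might be computed with pandas. Everything in this sketch is an assumption for illustration: the games frame, the team and reb column names, the window size of 3, and the use of shift(1) to exclude the current game are hypothetical and are not taken from generate_features_teamwins().

```python
import pandas as pd

# Hypothetical per-game rebounds for two teams, in chronological order.
games = pd.DataFrame({
    "team": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "reb":  [40, 44, 48, 42, 50, 46, 42, 44],
})

WINDOW = 3  # hypothetical window size; the real value is not documented here

# shift(1) drops the current game from its own average, so each row only
# sees earlier games; min_periods=1 allows averages over short histories.
games["team_reb_roll"] = games.groupby("team")["reb"].transform(
    lambda s: s.shift(1).rolling(WINDOW, min_periods=1).mean()
)

print(games)
```

In this sketch the first game of each team has no history, so its rolling value is NaN; a real pipeline would need to drop or fill such rows before training.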
Target variable
The target column is win, a binary integer (1 = team won, 0 = team lost).
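
Putting the pieces together, the following sketch selects the six feature columns and the win target from a DataFrame, splits, fits, and reports test accuracy. It is not the body of train_model_teamwins(): the DataFrame is filled with random stand-in values rather than rows read from team_game_stats, and the FEATURES and TARGET constants are illustrative names.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

FEATURES = [
    "points_diff", "team_reb_roll", "opponent_reb_roll",
    "team_ast_roll", "opponent_ast_roll", "home",
]
TARGET = "win"

# Synthetic stand-in for team_game_stats (random values, not real game data).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "points_diff": rng.normal(0, 8, n),
    "team_reb_roll": rng.normal(44, 3, n),
    "opponent_reb_roll": rng.normal(44, 3, n),
    "team_ast_roll": rng.normal(25, 3, n),
    "opponent_ast_roll": rng.normal(25, 3, n),
    "home": rng.integers(0, 2, n),
})
# Toy target: a positive rolling differential counts as a win.
df[TARGET] = (df["points_diff"] > 0).astype(int)

X, y = df[FEATURES], df[TARGET]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy: {accuracy:.3f}")
```

The accuracy printed here reflects only the toy target rule and says nothing about performance on real games.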
Model persistence
After fitting, the model is serialized with pickle to models/nba_model.pkl, relative to prediction_ai.py. The models/ directory is created automatically if it does not exist.
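
The save step can be sketched with the standard library alone. In this illustration a temporary directory stands in for the folder that contains prediction_ai.py, and a plain dict stands in for the fitted classifier; both are assumptions made so the example runs anywhere.

```python
import os
import pickle
import tempfile

# Placeholder for the fitted classifier; any picklable object is
# serialized the same way.
model = {"n_estimators": 200, "random_state": 0}

# A temporary directory stands in for the folder containing prediction_ai.py.
base_dir = tempfile.mkdtemp()
model_path = os.path.join(base_dir, "models", "nba_model.pkl")

# exist_ok=True makes this a no-op when models/ already exists.
os.makedirs(os.path.dirname(model_path), exist_ok=True)

with open(model_path, "wb") as f:
    pickle.dump(model, f)

# Loading it back, as a prediction step would do later.
with open(model_path, "rb") as f:
    restored = pickle.load(f)

print(os.path.basename(model_path), restored == model)
```

Inside a script, a path relative to the script file is commonly built from os.path.dirname(os.path.abspath(__file__)). Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.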