Overview

The Make Index scripts create and maintain the master player index, collecting basic scoring statistics and calculating True Shooting Percentage (TS%). Two versions exist:
  • make_index.py - Legacy version with manual scraping
  • make_index2.py - Modern refactored version with improved error handling

Data Sources

  • Basketball Reference: Player statistics (totals and per-possession)
  • NBA API: Player IDs and current roster data
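  • URLs: basketball-reference.com/leagues/NBA_{year}_totals.html and basketball-reference.com/leagues/NBA_{year}_per_poss.html (the leagues path segment becomes playoffs when PLAYOFFS_MODE is enabled)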

Core Functions

pull_bref_data()

Pulls player statistics from Basketball Reference.
Parameters:
  • totals (boolean, default: False): If True, scrapes totals data. If False, scrapes per-possession data.
Returns: pd.DataFrame with columns:
  • player, url, team, year, G, MP, FGA, FG, 3PA, 3P, FTA, FT, PTS
# From make_index2.py:105
def pull_bref_data(totals=False):
    leagues = "playoffs" if config.PLAYOFFS_MODE else "leagues"
    if totals:
        url_pattern = f"https://www.basketball-reference.com/{leagues}/NBA_{{year}}_totals.html"
    else:
        url_pattern = f"https://www.basketball-reference.com/{leagues}/NBA_{{year}}_per_poss.html"
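
The URL selection in the excerpt above can be sketched as a standalone helper. The function name build_bref_url and its playoffs parameter are hypothetical (the script reads config.PLAYOFFS_MODE instead), so treat this as an illustration of the pattern rather than the script's own code:

```python
def build_bref_url(year, totals=False, playoffs=False):
    """Return the Basketball Reference page URL for one season.

    Hypothetical helper: `playoffs` stands in for config.PLAYOFFS_MODE.
    """
    section = "playoffs" if playoffs else "leagues"
    table = "totals" if totals else "per_poss"
    return f"https://www.basketball-reference.com/{section}/NBA_{year}_{table}.html"


# Regular-season totals page for the 2024-25 season
print(build_bref_url(2025, totals=True))
# Playoff per-possession page for the same season
print(build_bref_url(2025, playoffs=True))
```

Passing the mode as an argument instead of reading a global makes the helper easy to check in isolation.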

process_player_ids()

Matches Basketball Reference IDs to NBA API IDs.
Parameters:
  • df (pd.DataFrame, required): DataFrame containing player data with URLs
  • master_df (pd.DataFrame, required): Master index DataFrame with existing ID mappings
Returns: DataFrame with added bref_id, nba_id, and team_id columns
# From make_index2.py:204
def process_player_ids(df, master_df):
    # Extract Basketball Reference IDs
    df['bref_id'] = df['url'].str.split('/', expand=True)[5].str.split('.', expand=True)[0]
    
    # Map IDs to dataframe
    match_dict = dict(zip(master_df['bref_id'], master_df['nba_id']))
    df['nba_id'] = df['bref_id'].map(match_dict)
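
The same matching logic can be followed without pandas. The sketch below is plain Python and is not taken from make_index2.py: extract_bref_id is a hypothetical helper, the second URL is a made-up player, and the mapping is illustrative. It takes the last path segment of the URL rather than a fixed index, which is an assumption that also tolerates relative URLs:

```python
def extract_bref_id(url):
    """Return the Basketball Reference ID: last path segment minus its extension."""
    return url.rstrip("/").split("/")[-1].split(".")[0]


# Illustrative master mapping (bref_id -> nba_id)
known_ids = {"jamesle01": 2544}

urls = [
    "https://www.basketball-reference.com/players/j/jamesle01.html",
    "https://www.basketball-reference.com/players/e/examppl01.html",  # made-up player
]

matched = {}
missing = []
for url in urls:
    bref_id = extract_bref_id(url)
    if bref_id in known_ids:
        matched[bref_id] = known_ids[bref_id]
    else:
        # Candidates for an NBA API lookup or a manual fallback entry
        missing.append(bref_id)

print(matched)
print(missing)
```

Players that end up in missing correspond to the rows whose nba_id is left empty after the map step.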

calculate_true_shooting()

Calculates True Shooting Percentage using the formula: TS% = PTS / (2 * (FGA + 0.44 * FTA)) * 100
Parameters:
  • df (pd.DataFrame, required): DataFrame with PTS, FGA, and FTA columns
Returns: DataFrame with added TS% column
# From make_index2.py:269
df['TS%'] = (df['PTS'] / (2 * (df['FGA'] + 0.44 * df['FTA']))) * 100
df.replace([np.inf, -np.inf], 0, inplace=True)
df.loc[df['TS%'] > 150, 'TS%'] = 0  # Clean extreme values
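
As a worked example of the formula, the scalar sketch below applies the same arithmetic to one player line. The function name true_shooting_pct is hypothetical, and returning 0 when there are no shot attempts is an assumption made here to mirror the zeroing of infinite values in the excerpt:

```python
def true_shooting_pct(pts, fga, fta, cap=150.0):
    """TS% = PTS / (2 * (FGA + 0.44 * FTA)) * 100, with the same cleanup idea as the excerpt."""
    attempts = 2 * (fga + 0.44 * fta)
    if attempts <= 0:
        # Assumption: no attempts means no meaningful TS%, so report 0
        return 0.0
    ts = pts / attempts * 100
    # Values above the cap are treated as noise and reset to 0
    return 0.0 if ts > cap else ts


# 25 points on 18 field goal attempts and 6 free throw attempts:
# 2 * (18 + 0.44 * 6) = 41.28, and 25 / 41.28 * 100 is roughly 60.56
print(round(true_shooting_pct(25, 18, 6), 2))
```

The cap matters mostly for tiny samples, where one made shot on very few attempts can produce a value far outside the normal range.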

Configuration

Set at the top of make_index2.py:
  • PLAYOFFS_MODE (boolean, default: True): Toggle between playoffs and regular season data
  • CURRENT_YEAR (integer, default: 2025): Year to scrape (2025 represents the 2024-25 season)
  • CURRENT_SEASON (string, default: "2024-25"): Season format for NBA API
# From make_index2.py:15-20
class Config:
    PLAYOFFS_MODE = True
    CURRENT_YEAR = 2025
    CURRENT_SEASON = "2024-25"
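
CURRENT_YEAR and CURRENT_SEASON describe the same season in two formats, so they need to be updated together. One way to keep them in sync is to derive the season string from the year. The helper below is a hypothetical sketch and is not part of make_index2.py:

```python
def season_label(end_year):
    """Turn a season's ending year into the 'YYYY-YY' form, e.g. 2025 -> '2024-25'."""
    return f"{end_year - 1}-{end_year % 100:02d}"


print(season_label(2025))  # 2024-25
print(season_label(2000))  # 1999-00
```

The zero-padded two-digit suffix keeps turn-of-the-century seasons such as 1999-00 well formed.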

Output Files

  • index_master.csv / index_master_ps.csv (CSV): Master player index with ID mappings. Columns: player, url, year, team, bref_id, nba_id, team_id
  • scoring.csv / scoring_ps.csv (CSV): Per-possession scoring statistics. Columns: Player, TS%, PTS, MP, Tm, G, year, nba_id
  • totals.csv / totals_ps.csv (CSV): Total scoring statistics with shooting attempts. Columns: Player, TS%, PTS, MP, Tm, G, FTA, FGA, year, nba_id
  • games.csv / ps_games.csv (CSV): Games played data exported to other modules. Columns: nba_id, Player, year, G
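
Each output has a regular-season and a playoff file name, and the games file uses a ps_ prefix where the others use a _ps suffix. A lookup table avoids rebuilding those names by hand. The names OUTPUT_FILES and output_path below are hypothetical and only illustrate the naming listed above:

```python
# (regular season file, playoff file) for each output listed above
OUTPUT_FILES = {
    "index_master": ("index_master.csv", "index_master_ps.csv"),
    "scoring": ("scoring.csv", "scoring_ps.csv"),
    "totals": ("totals.csv", "totals_ps.csv"),
    "games": ("games.csv", "ps_games.csv"),  # prefix, not suffix
}


def output_path(name, playoffs):
    """Pick the file name for one output depending on the season mode."""
    regular, postseason = OUTPUT_FILES[name]
    return postseason if playoffs else regular


print(output_path("games", playoffs=True))
print(output_path("scoring", playoffs=False))
```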

Usage Example

# Set configuration
class Config:
    PLAYOFFS_MODE = False  # Regular season
    CURRENT_YEAR = 2025
    CURRENT_SEASON = "2024-25"

config = Config()

# Run the main pipeline
if __name__ == "__main__":
    main()
Output:
Running in REGULAR SEASON mode
Fetching data from: https://www.basketball-reference.com/leagues/NBA_2025_totals.html
Successfully processed 612 players for 2025 (totals)
Found 15 players without NBA IDs
Fetching player data from NBA API...
Found 12 additional IDs from the NBA API

Key Features

  • Dynamic header mapping: Automatically detects column positions from Basketball Reference HTML
  • ID reconciliation: Matches players across Basketball Reference and NBA API
  • Playoff/regular season toggle: A single PLAYOFFS_MODE flag controls the data source
  • Hardcoded ID fallbacks: Manual dictionary for players missing from APIs
  • TS% calculation: Industry-standard true shooting percentage formula
  • Data validation: Sets extreme TS% values (>150%) to 0 and handles inf/NaN
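
The idea behind dynamic header mapping can be sketched as follows: read the header row once, build a name-to-position lookup, and pull values by name so that a reordered table does not break the scraper. The header names and the data row are illustrative, and map_headers and pick are hypothetical helpers rather than functions from the scripts:

```python
def map_headers(header_cells):
    """Map each column name to its position in the header row."""
    return {name.strip(): i for i, name in enumerate(header_cells)}


def pick(row, header_index, wanted):
    """Extract the wanted columns from one table row by name."""
    return {name: row[header_index[name]] for name in wanted}


# Illustrative header and row, not an exact copy of any Basketball Reference table
headers = ["Rk", "Player", "Team", "G", "MP", "FGA", "FTA", "PTS"]
row = ["1", "Example Player", "XYZ", "70", "2400", "1200", "400", "1700"]

index = map_headers(headers)
print(pick(row, index, ["Player", "FGA", "FTA", "PTS"]))
```

Looking columns up by name means a new column inserted upstream shifts positions without changing which values are extracted.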
