The modeling module provides functions for training and evaluating predictive models, including MRA, XGBoost, LightGBM, CatBoost, GWR, and more.
Core Classes
DataSplit
Encapsulates the splitting of data into training, test, and other subsets.
Attributes
df_sales - Sales data after processing
df_universe - Universe (parcel) data after processing
df_train - Training subset of sales data
df_test - Test subset of sales data
X_train - Feature matrix for the training data
X_test - Feature matrix for the test data
X_univ - Feature matrix for the universe data
y_train - Target array for training
y_test - Target array for testing
SingleModelResults
Container for results from a single prediction model.
Attributes
ds - The DataSplit object used
df_universe - Universe DataFrame with predictions
df_test - Test DataFrame with predictions
df_sales - Sales DataFrame with predictions
model_name - Model name (unique identifier)
model_engine - Model engine (“xgboost”, “mra”, etc.)
model - The fitted model object
pred_test - PredictionResults for the test set
pred_train - PredictionResults for the training set
pred_sales - PredictionResults for the sales set
chd - Calculated CHD (coefficient of horizontal disparity) value
utility_test - Composite utility score for the test set
utility_train - Composite utility score for the training set
PredictionResults
Container for prediction results and associated performance metrics.
Attributes
dep_var - The dependent variable being predicted
ind_vars - List of independent variables
y - Ground truth values
y_pred - Predicted values
mse - Mean squared error
rmse - Root mean squared error
mape - Mean absolute percent error
r2 - R-squared
adj_r2 - Adjusted R-squared
ratio_study - RatioStudy object
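The error metrics listed above follow standard definitions. As a minimal sketch (not the library's implementation), they could be computed from `y` and `y_pred` as follows. The function name `regression_metrics` and the `n_features` argument are hypothetical and stand in for the number of independent variables.

```python
import math

def regression_metrics(y, y_pred, n_features):
    """Compute standard error metrics for paired ground-truth and predicted values."""
    n = len(y)
    residuals = [actual - pred for actual, pred in zip(y, y_pred)]
    ss_res = sum(r * r for r in residuals)
    mse = ss_res / n
    rmse = math.sqrt(mse)
    # MAPE is undefined when a ground-truth value is zero; this sketch assumes none are.
    mape = sum(abs(r / actual) for r, actual in zip(residuals, y)) / n
    mean_y = sum(y) / n
    ss_tot = sum((actual - mean_y) ** 2 for actual in y)
    r2 = 1.0 - ss_res / ss_tot
    # Adjusted R-squared penalizes each additional independent variable.
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1)
    return {"mse": mse, "rmse": rmse, "mape": mape, "r2": r2, "adj_r2": adj_r2}

metrics = regression_metrics(
    y=[100.0, 200.0, 300.0, 400.0],
    y_pred=[110.0, 190.0, 310.0, 390.0],
    n_features=1,
)
print(metrics)
```

Note that `mape` is returned as a fraction here; whether the library reports it as a fraction or a percentage is not stated in this reference.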
Multiple Regression Analysis (MRA)
run_mra()
Train an MRA model and return its prediction results.
DataSplit object
Whether to include an intercept in the model
Whether to print verbose output
Optional pre-trained MRAModel
Prediction results from the MRA model
run_multi_mra()
Train a hierarchical Multi-MRA model and return its prediction results.
DataSplit object (sales/universe/splits should already be set up)
Path to write parameters out to
Ordered list of location field names, most specific to least specific
Whether to automatically trim the variable selection to the optimal subset
Whether to include an intercept column in the regression
If True, print verbose output
Minimum number of observations required to fit a local OLS model
Prediction results from the Multi-MRA model
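The parameters above suggest a fallback hierarchy: try the most specific location field first, and move to a broader one when too few observations are available to fit a local OLS model. The sketch below illustrates that idea only. The function `pick_location_level`, the field names `block` and `neighborhood`, and the rule of falling back to all sales are assumptions, not the library's documented behavior.

```python
def pick_location_level(parcel, sales, location_fields, min_obs):
    """Return the first location field (most specific first) with enough matching sales."""
    for field in location_fields:
        matches = [s for s in sales if s[field] == parcel[field]]
        if len(matches) >= min_obs:
            return field, matches
    # No level has enough observations: fall back to every sale (a global model).
    return None, list(sales)

# Hypothetical sales records carrying two location fields.
sales = [
    {"block": "B1", "neighborhood": "N1"},
    {"block": "B1", "neighborhood": "N1"},
    {"block": "B2", "neighborhood": "N1"},
    {"block": "B3", "neighborhood": "N2"},
]
location_fields = ["block", "neighborhood"]  # most specific to least specific

level_a, subset_a = pick_location_level(
    {"block": "B1", "neighborhood": "N1"}, sales, location_fields, min_obs=2
)
level_b, subset_b = pick_location_level(
    {"block": "B2", "neighborhood": "N1"}, sales, location_fields, min_obs=2
)
level_c, subset_c = pick_location_level(
    {"block": "B3", "neighborhood": "N2"}, sales, location_fields, min_obs=2
)
print(level_a, level_b, level_c)
```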
Tree-Based Models
run_xgboost()
Train an XGBoost model and return its prediction results.
DataSplit object
Path to save/load parameters
Whether to use saved parameters if available
Whether to save parameters after training
Whether to perform hyperparameter tuning
Whether to print verbose output
Optional dictionary of hyperparameters
Prediction results from the XGBoost model
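The tree-based functions share the same load, tune, and save options. One plausible way these options interact is sketched below, with a JSON file standing in for the parameter store. The helper `resolve_params`, the stand-in tuner, the file name, and the precedence of saved parameters over tuning are all assumptions made for illustration.

```python
import json
import os
import tempfile

def resolve_params(path, use_saved_params, save_params, tune_fn):
    """Return (params, source), where source is 'loaded' or 'tuned'."""
    # Reuse saved parameters when allowed and present.
    if use_saved_params and os.path.exists(path):
        with open(path) as f:
            return json.load(f), "loaded"
    # Otherwise run the (possibly expensive) tuning step.
    params = tune_fn()
    if save_params:
        with open(path, "w") as f:
            json.dump(params, f)
    return params, "tuned"

def fake_tuner():
    # Stand-in for a hyperparameter search; the values are illustrative only.
    return {"max_depth": 6, "learning_rate": 0.1}

with tempfile.TemporaryDirectory() as tmp:
    params_path = os.path.join(tmp, "params.json")
    first_params, first_source = resolve_params(params_path, True, True, fake_tuner)
    second_params, second_source = resolve_params(params_path, True, True, fake_tuner)

print(first_source, second_source)
```

The first call tunes and writes the file; the second call finds the file and skips tuning.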
run_lightgbm()
Train a LightGBM model and return its prediction results.
DataSplit object
Path to save/load parameters
Whether to use saved parameters if available
Whether to save parameters after training
Whether to perform hyperparameter tuning
Whether to print verbose output
Optional dictionary of hyperparameters
Prediction results from the LightGBM model
run_catboost()
Train a CatBoost model and return its prediction results.
DataSplit object
Path to save/load parameters
Whether to use saved parameters if available
Whether to save parameters after training
Whether to perform hyperparameter tuning
Whether to print verbose output
Optional dictionary of hyperparameters
Prediction results from the CatBoost model
Geographically Weighted Regression (GWR)
run_gwr()
Train a GWR model and return its prediction results.
DataSplit object
Path to save/load parameters
Whether to use saved bandwidth if available
Whether to save bandwidth after training
Whether to perform bandwidth selection
Whether to print verbose output
Prediction results from the GWR model
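To give a sense of what the bandwidth controls, the sketch below computes distance-decay weights with a Gaussian kernel, a common choice in GWR. Which kernel and bandwidth type this module actually uses is not stated here, so the fixed-bandwidth Gaussian kernel and the function name `gaussian_weights` are assumptions.

```python
import math

def gaussian_weights(target, points, bandwidth):
    """Weight each observation by its distance to the target location."""
    weights = []
    for x, y in points:
        distance = math.hypot(x - target[0], y - target[1])
        # Gaussian kernel: weight 1 at distance 0, decaying as distance grows.
        weights.append(math.exp(-0.5 * (distance / bandwidth) ** 2))
    return weights

# Three observations at distances 0, 5, and 10 from the target.
points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
narrow = gaussian_weights((0.0, 0.0), points, bandwidth=5.0)
wide = gaussian_weights((0.0, 0.0), points, bandwidth=20.0)
print(narrow)
print(wide)
```

A larger bandwidth gives distant observations more influence on each local regression, which is why bandwidth selection matters for the fitted model.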
Spatial Models
run_spatial_lag()
Train a spatial lag model and return its prediction results.
DataSplit object
Whether to print verbose output
Prediction results from the spatial lag model
Baseline Models
run_average()
Train an average model (baseline) and return its prediction results.
DataSplit object
Whether to print verbose output
Prediction results from the average model
run_naive_area()
Train a naive area model (baseline using simple $/sqft) and return its prediction results.
DataSplit object
Whether to print verbose output
Prediction results from the naive area model
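A simple $/sqft baseline can be sketched as a single rate applied to every parcel. The example below uses the median rate from training sales; whether the library uses the median or the mean is not stated here, and the field names `sale_price` and `area_sqft` are hypothetical.

```python
from statistics import median

# Hypothetical training sales.
train_sales = [
    {"sale_price": 200000.0, "area_sqft": 1000.0},
    {"sale_price": 330000.0, "area_sqft": 1500.0},
    {"sale_price": 500000.0, "area_sqft": 2000.0},
]

# One price-per-square-foot rate for the whole jurisdiction.
rate_per_sqft = median(s["sale_price"] / s["area_sqft"] for s in train_sales)

def predict_naive_area(area_sqft):
    """Predict value as the jurisdiction-wide rate times the parcel's area."""
    return rate_per_sqft * area_sqft

prediction = predict_naive_area(1200.0)
print(rate_per_sqft, prediction)
```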
run_local_area()
Train a local area model (location-based $/sqft) and return its prediction results.
DataSplit object
Field name to use for location grouping
Whether to print verbose output
Prediction results from the local area model
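The location-based variant can be pictured as one $/sqft rate per value of the location field. In the sketch below, parcels in a location with no sales fall back to the overall rate. That fallback, the use of the median, and the field names are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import median

def fit_local_rates(sales, location_field):
    """Return (rate per location, overall rate) in price per square foot."""
    rates_by_location = defaultdict(list)
    all_rates = []
    for sale in sales:
        rate = sale["sale_price"] / sale["area_sqft"]
        rates_by_location[sale[location_field]].append(rate)
        all_rates.append(rate)
    local_rates = {loc: median(r) for loc, r in rates_by_location.items()}
    return local_rates, median(all_rates)

def predict_local_area(parcel, location_field, local_rates, overall_rate):
    # Use the local rate when the location was seen in training, else the overall rate.
    rate = local_rates.get(parcel[location_field], overall_rate)
    return rate * parcel["area_sqft"]

# Hypothetical sales in two neighborhoods.
sales = [
    {"neighborhood": "A", "sale_price": 200000.0, "area_sqft": 1000.0},
    {"neighborhood": "A", "sale_price": 220000.0, "area_sqft": 1000.0},
    {"neighborhood": "B", "sale_price": 100000.0, "area_sqft": 1000.0},
]
local_rates, overall_rate = fit_local_rates(sales, "neighborhood")

known = predict_local_area(
    {"neighborhood": "A", "area_sqft": 1000.0}, "neighborhood", local_rates, overall_rate
)
unseen = predict_local_area(
    {"neighborhood": "C", "area_sqft": 1000.0}, "neighborhood", local_rates, overall_rate
)
print(known, unseen)
```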
run_pass_through()
Generate predictions using a pass-through model (e.g., assessor values).
DataSplit object
Model engine identifier (e.g., “assessor”)
Whether to print verbose output
Prediction results from the pass-through model
Utility Functions
model_utility_score()
Compute a utility score for a model based on error, median ratio, COD, and CHD.
SingleModelResults object
If True, compute the score using the test set results
Computed utility score (lower is better)
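Two of the ingredients named above, the median ratio and the COD (coefficient of dispersion), have standard ratio-study definitions and can be sketched directly. How the library weights and combines them with error and CHD into a single score is not described here, so this example stops at the two ingredients. The function name `ratio_stats` is hypothetical.

```python
from statistics import median

def ratio_stats(predictions, sale_prices):
    """Return (median ratio, COD) for predicted values against sale prices."""
    ratios = [pred / price for pred, price in zip(predictions, sale_prices)]
    median_ratio = median(ratios)
    # COD: average absolute deviation from the median ratio, as a percent of it.
    mean_abs_dev = sum(abs(r - median_ratio) for r in ratios) / len(ratios)
    cod = 100.0 * mean_abs_dev / median_ratio
    return median_ratio, cod

median_ratio, cod = ratio_stats(
    predictions=[90.0, 100.0, 110.0],
    sale_prices=[100.0, 100.0, 100.0],
)
print(median_ratio, cod)
```

A median ratio near 1.0 and a low COD indicate predictions that are both well centered and uniform, which is consistent with a score where lower is better.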
simple_ols()
Perform simple OLS regression with one independent variable.
DataFrame containing the data
Independent variable name
Dependent variable name
Whether to include an intercept
Dictionary containing regression results including slope, r2, and other statistics
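For reference, one-variable OLS has a closed-form solution. The sketch below is not the library's code; it returns `slope` and `r2` as mentioned above, and the `intercept` key and the function name `ols_one_variable` are hypothetical. Without an intercept it uses the uncentered total sum of squares for R-squared, a common convention.

```python
def ols_one_variable(x, y, intercept=True):
    """Fit y = b0 + slope * x (or y = slope * x) by ordinary least squares."""
    n = len(x)
    if intercept:
        mean_x = sum(x) / n
        mean_y = sum(y) / n
        sxx = sum((xi - mean_x) ** 2 for xi in x)
        sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
        slope = sxy / sxx
        b0 = mean_y - slope * mean_x
        ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    else:
        # Regression through the origin.
        slope = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
        b0 = 0.0
        ss_tot = sum(yi * yi for yi in y)
    ss_res = sum((yi - (b0 + slope * xi)) ** 2 for xi, yi in zip(x, y))
    r2 = 1.0 - ss_res / ss_tot
    return {"slope": slope, "intercept": b0, "r2": r2}

result = ols_one_variable(x=[1.0, 2.0, 3.0, 4.0], y=[3.0, 5.0, 7.0, 9.0])
print(result)
```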
simple_mra()
Perform multiple regression analysis with multiple independent variables.
DataFrame containing the data
List of independent variable names
Dependent variable name
Dictionary containing regression results