Classes
BenchmarkResults
Container for benchmark results across multiple models.
Attributes:
df_time (pd.DataFrame): Timing information for model execution
df_stats_test (pd.DataFrame): Statistics for the holdout test set
df_stats_test_post_val (pd.DataFrame): Statistics for the post-valuation-date test set only
df_stats_full (pd.DataFrame): Statistics for the full universe
test_empty (bool): Whether the test set contains no records
full_empty (bool): Whether the full set contains no records
test_post_val_empty (bool): Whether the post-valuation test set contains no records
__init__
DataFrame containing timing data for model execution
DataFrame with test set statistics
DataFrame with test set statistics (post-valuation-date only)
DataFrame with full universe statistics
print()
Return a formatted string summarizing the benchmark results.
A formatted string including timings, test set stats, and universe set stats
MultiModelResults
Container for results from multiple models along with their benchmark.
Attributes:
model_results (dict[str, SingleModelResults]): Dictionary mapping model names to their results
benchmark (BenchmarkResults): Benchmark results computed from the model results
df_univ_orig (pd.DataFrame): Original universe DataFrame
df_sales_orig (pd.DataFrame): Original sales DataFrame
__init__
Dictionary of individual model results
Benchmark results
Universe DataFrame
Sales DataFrame
add_model()
Add a new model’s results and update the benchmark.
The model name
The results for the given model
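The add-then-update behavior of add_model() can be illustrated with a minimal sketch. The classes below are hypothetical stand-ins, not the library's MultiModelResults or SingleModelResults: each model is reduced to a single made-up score (`test_r2`), the benchmark is a plain dictionary rather than a BenchmarkResults object, and the model names are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class SketchModelResults:
    """Hypothetical stand-in for SingleModelResults: one score per model."""
    test_r2: float


@dataclass
class SketchMultiModelResults:
    """Hypothetical stand-in for MultiModelResults."""
    model_results: dict = field(default_factory=dict)
    benchmark: dict = field(default_factory=dict)

    def add_model(self, model: str, results: SketchModelResults) -> None:
        # Store the new results, then rebuild the benchmark from every model
        # so the benchmark never lags behind the stored results.
        self.model_results[model] = results
        self.benchmark = {
            name: res.test_r2 for name, res in self.model_results.items()
        }


multi = SketchMultiModelResults()
multi.add_model("model_a", SketchModelResults(test_r2=0.81))
multi.add_model("model_b", SketchModelResults(test_r2=0.88))

# Pick the model with the highest benchmark score
best = max(multi.benchmark, key=multi.benchmark.get)
print(best)
```

Rebuilding the benchmark inside add_model(), rather than leaving it to the caller, keeps the two attributes consistent after every addition.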
Functions
try_variables
Experiment with variables to determine which are most useful for modeling.
The SalesUniversePair containing sales and universe data
Settings dictionary
Whether to print verbose output
Whether to generate plots
Whether to generate a PDF report
get_variable_recommendations
Determine which variables are most likely to be meaningful in a model. This function examines sales and universe data, applies feature selection via correlations, elastic net regularization, R², p-values, t-values, and VIF, and produces a set of recommended variables along with a written report.
The sales data
The parcel universe data
Whether to consider only vacant sales
The settings dictionary
The model group to consider
A list of variables to use for feature selection. If None, variables are pulled from the modeling section
A list of tests to run. If None, all tests are run. Legal values are “corr”, “r2”, “p_value”, “t_value”, “enr”, and “vif”
Whether to perform cross-validation
If True, generates a report of the variable selection process
If True, prints correlation plots
If True, prints additional debugging information
TimingData object
A dictionary with keys:
"variables": the best variables list
"report": the generated report
"df_results": DataFrame with detailed results
run_models
Run predictive models on the given SalesUniversePair. This function takes detailed instructions from the provided settings dictionary and handles all the internal details, such as splitting the data, training the models, and saving the results. It performs basic statistical analysis on each model and optionally combines the results into an ensemble model.
Sales and universe data
The settings dictionary
Whether to save model parameters
Whether to use saved model parameters
Whether to save model results
If True, prints additional information
Whether to run main (non-vacant) models
Whether to run vacant models
Whether to run hedonic models
Whether to run ensemble models
Whether to compute SHAP values
Whether to plot scatterplots
The MultiModelResults containing all model results and benchmarks
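This reference does not say how the optional ensemble combines individual models. As one plausible illustration only, the sketch below takes the per-parcel median of several models' predictions. The model names, parcel keys, values, and the choice of the median are all assumptions.

```python
from statistics import median

# Hypothetical predictions keyed by model name, then by parcel key
predictions = {
    "model_a": {"p1": 100.0, "p2": 210.0, "p3": 305.0},
    "model_b": {"p1": 110.0, "p2": 190.0, "p3": 295.0},
    "model_c": {"p1": 130.0, "p2": 200.0, "p3": 300.0},
}

# Only combine parcels that every model produced a prediction for
shared_keys = sorted(set.intersection(*(set(p) for p in predictions.values())))

# Ensemble prediction: the median across models for each parcel
ensemble = {
    key: median(model[key] for model in predictions.values())
    for key in shared_keys
}
print(ensemble)
```

A median is less sensitive than a mean to a single model producing an outlying prediction, which is why it is used in this sketch.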
run_one_model
Run a single model based on provided parameters and return its results.
Sales DataFrame
Universe DataFrame
Whether to use only vacant sales
Model group identifier
Model’s unique identifier
Dictionary of model configuration entries
Settings dictionary
Dependent variable for training
Dependent variable for testing
List of best variables selected
List of categorical fields
Output path for saving results
Whether to save parameters
Whether to use saved parameters
Whether to save results
If True, prints additional information
Whether to use hedonic pricing
Optional list of test keys (will be read from disk if not provided)
Optional list of training keys (will be read from disk if not provided)
SingleModelResults if successful, else None
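The test and training keys listed above determine which sales rows a model is trained on and which it is evaluated on. A minimal sketch of such a key-based split follows, assuming each sale row carries a unique key. The `key` field name, the `split_by_keys` helper, and the sample rows are hypothetical and not part of the library.

```python
def split_by_keys(rows, train_keys, test_keys):
    """Split rows into (train, test) lists by membership in each key list."""
    train_set, test_set = set(train_keys), set(test_keys)
    overlap = train_set & test_set
    if overlap:
        # A row in both sets would leak test data into training
        raise ValueError(f"keys in both train and test: {sorted(overlap)}")
    train = [row for row in rows if row["key"] in train_set]
    test = [row for row in rows if row["key"] in test_set]
    return train, test


# Made-up sales rows for illustration
sales = [
    {"key": "A1", "sale_price": 200000},
    {"key": "A2", "sale_price": 250000},
    {"key": "A3", "sale_price": 310000},
    {"key": "A4", "sale_price": 400000},
]

train_rows, test_rows = split_by_keys(
    sales, train_keys=["A1", "A3"], test_keys=["A2", "A4"]
)
print(len(train_rows), len(test_rows))
```

Passing the same key lists to every model keeps the holdout set identical across models, so their test statistics remain comparable.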