The pipeline module contains every public function called from the notebooks in the OpenAVM Kit project. It is the primary interface for building automated valuation models.
Initialization
init_notebook()
Initialize the notebook environment for a specific locality.
Parameters:
- The locality slug (e.g., "us-nc-guilford")
load_settings()
Load and return the settings dictionary for the locality.
Parameters:
- Path to the settings file
- Optional settings object to use instead of loading from a file
- If True, raises an error if the settings file cannot be loaded
- If True, issues a warning if the settings file cannot be loaded
Returns:
- The fully resolved settings dictionary
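The fallback behavior described above can be sketched in a few lines. This is a hypothetical stand-in (`load_settings_sketch` and its parameter names are not the library's actual API), showing only the precedence: an in-memory settings object wins, then the file, then the error/warning flags decide what happens on a miss.

```python
import json
import warnings
from pathlib import Path

def load_settings_sketch(path, settings=None, error=True, warning=False):
    """Illustrative loader; names and behavior are assumptions, not the real API."""
    if settings is not None:          # an explicit settings object bypasses the file
        return settings
    p = Path(path)
    if not p.exists():
        if error:
            raise FileNotFoundError(f"settings file not found: {path}")
        if warning:
            warnings.warn(f"settings file not found: {path}")
        return {}                     # fall back to an empty settings dict
    return json.loads(p.read_text())

# An in-memory settings object is returned as-is, even if the path is missing:
resolved = load_settings_sketch("missing.json", settings={"locality": "us-nc-guilford"})
```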
Data Loading & Processing
load_dataframes()
Load dataframes based on the provided settings and return them in a dictionary.
Parameters:
- Settings dictionary
- If True, prints detailed logs during data loading
Returns:
- Dictionary mapping keys to loaded DataFrames
process_dataframes()
Process dataframes according to settings and return a SalesUniversePair.
Parameters:
- Dictionary of DataFrames
- Settings dictionary for data processing
- If True, prints detailed logs during processing
Returns:
- A SalesUniversePair object containing the processed sales and universe data
tag_model_groups_sup()
Tag model groups for a SalesUniversePair based on user-specified filters.
Parameters:
- Sales and universe data
- Configuration settings
- If True, enables verbose output
Returns:
- Updated SalesUniversePair with tagged model groups
process_sales()
Process sales data within a SalesUniversePair by cleaning invalid sales and applying time adjustments.
Parameters:
- Sales and universe data
- Configuration settings
- Whether to write out data during processing
- If True, prints verbose output during processing
Returns:
- Updated SalesUniversePair with processed sales data
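The two operations process_sales() performs — dropping invalid sales and time-adjusting the rest — can be illustrated generically. Everything here is hypothetical (the field names, the made-up monthly index, and the adjustment formula are assumptions for illustration); the real pipeline derives its adjustments from the settings dictionary.

```python
# Scale each valid sale price to a common valuation period using a
# (made-up) monthly price index. Not the library's actual algorithm.
index = {"2023-01": 1.00, "2023-06": 1.05, "2023-12": 1.10}  # hypothetical index
valuation_period = "2023-12"

sales = [
    {"key": "A", "sale_price": 200_000, "period": "2023-01", "valid": True},
    {"key": "B", "sale_price": 150_000, "period": "2023-06", "valid": False},  # dropped
]

cleaned = [s for s in sales if s["valid"]]  # step 1: remove invalid sales
for s in cleaned:                           # step 2: adjust to the valuation period
    factor = index[valuation_period] / index[s["period"]]
    s["adj_price"] = round(s["sale_price"] * factor)
```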
Enrichment Functions
enrich_sup_spatial_lag()
Enrich the sales and universe DataFrames with spatial lag features.
Parameters:
- SalesUniversePair containing sales and universe DataFrames
- Settings dictionary
- If True, prints progress information
Returns:
- Enriched SalesUniversePair with spatial lag features
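A spatial lag feature is, roughly, a neighborhood average of some value observed at nearby parcels. The sketch below computes the mean sale price of a parcel's k nearest sold neighbors; the library's actual construction (weights, neighbor definition, which values are lagged) may differ.

```python
import math

# (coordinates, sale_price) pairs for sold parcels — toy data
sold = [((0.0, 0.0), 100.0), ((1.0, 0.0), 120.0), ((0.0, 1.0), 140.0), ((5.0, 5.0), 400.0)]

def spatial_lag(point, sold, k=2):
    """Mean sale price of the k nearest sold parcels (illustrative only)."""
    nearest = sorted(sold, key=lambda s: math.dist(point, s[0]))[:k]
    return sum(price for _, price in nearest) / len(nearest)

# The distant (5, 5) parcel is excluded from this parcel's neighborhood:
lag = spatial_lag((0.5, 0.5), sold, k=3)
```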
enrich_sup_streets()
Enrich a SalesUniversePair with street network data.
Parameters:
- The data you want to enrich
- Settings dictionary
- If True, prints verbose output during processing
Returns:
- Enriched SalesUniversePair with street-related metrics
fill_unknown_values_sup()
Fill unknown values with default values as specified in settings.
Parameters:
- The SalesUniversePair containing sales and universe data
- The settings dictionary containing configuration for filling unknown values
Returns:
- The updated SalesUniversePair with filled unknown values
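The fill operation is straightforward to sketch: for each field with a settings-specified default, replace missing values with that default. The settings layout (`"fill_unknowns"`) and field names below are hypothetical.

```python
# Hypothetical settings fragment mapping field names to default values
settings = {"fill_unknowns": {"bldg_quality": "average", "land_class": "residential"}}

parcels = [
    {"key": "A", "bldg_quality": None, "land_class": "commercial"},
    {"key": "B", "bldg_quality": "good", "land_class": None},
]

defaults = settings["fill_unknowns"]
for p in parcels:
    for field, default in defaults.items():
        if p.get(field) is None:      # only unknowns are overwritten
            p[field] = default
```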
Sales Scrutiny & Clustering
run_sales_scrutiny()
Run sales scrutiny analysis for each model group within a SalesUniversePair.
Parameters:
- Sales and universe data
- Configuration settings
- If True, drops invalid sales identified through cluster analysis
- If True, drops invalid sales identified through heuristics
- If True, enables verbose logging
Returns:
- Updated SalesUniversePair after sales scrutiny analysis
mark_ss_ids_per_model_group_sup()
Cluster parcels for a sales scrutiny study by assigning sales scrutiny IDs.
Parameters:
- Sales and universe data
- Configuration settings
- If True, prints verbose output during processing
Returns:
- Updated SalesUniversePair with marked sales scrutiny IDs
mark_horizontal_equity_clusters_per_model_group_sup()
Cluster parcels for a horizontal equity study by assigning horizontal equity cluster IDs.
Parameters:
- Sales and universe data
- Configuration settings
- If True, prints verbose output
- If True, enables land clustering
- If True, enables improvement clustering
Returns:
- Updated SalesUniversePair with horizontal equity clusters marked
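The idea behind a horizontal equity cluster is to group comparable parcels so their assessments can be compared against each other. A deliberately naive sketch, assuming made-up attributes (land class plus a square-footage bucket) as the cluster key; the library's clustering is more sophisticated than this:

```python
parcels = [
    {"key": "A", "land_class": "res", "sqft": 1400},
    {"key": "B", "land_class": "res", "sqft": 1450},
    {"key": "C", "land_class": "com", "sqft": 9000},
]

def he_cluster_id(p, bucket=500):
    """Coarse cluster key: land class + size bucket (illustrative only)."""
    return f"{p['land_class']}-{p['sqft'] // bucket}"

for p in parcels:
    p["he_cluster"] = he_cluster_id(p)
```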
Modeling Functions
try_variables()
Run tests on variables to figure out which might be the most predictive.
Parameters:
- Your data
- Settings dictionary
- If True, prints detailed logs
- If True, prints visual plots
- If True, generates PDF reports
try_models()
Try out predictive models on the given SalesUniversePair. Optimized for speed and iteration.
Parameters:
- Sales and universe data
- Configuration settings
- Whether to save model parameters
- Whether to use saved model parameters
- If True, enables verbose output
- Flag to run main models
- Flag to run vacant models
- Flag to run hedonic models
- Flag to run ensemble models
- Flag to run SHAP analysis
- Flag to plot scatterplots
Checkpoint & Cloud Functions
from_checkpoint()
Read cached data from a checkpoint file or generate it via a function.
Parameters:
- Path to the checkpoint file
- Function to run if the checkpoint is not available; should return a DataFrame
- Parameters to pass to func when generating the data
Returns:
- The resulting DataFrame, loaded from the checkpoint or generated
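The cache-or-generate pattern described above can be sketched with pickle in place of the library's actual storage format (the helper name and signature here are assumptions, not the real API):

```python
import pickle
import tempfile
from pathlib import Path

def from_checkpoint_sketch(path, func, params):
    """Return cached data if present; otherwise generate, persist, and return it."""
    p = Path(path)
    if p.exists():                      # cache hit: read, don't recompute
        return pickle.loads(p.read_bytes())
    data = func(**params)               # cache miss: generate the data ...
    p.write_bytes(pickle.dumps(data))   # ... and persist it for next time
    return data

calls = []
def expensive(n):
    calls.append(n)                     # track how many times we actually compute
    return list(range(n))

ckpt = Path(tempfile.mkdtemp()) / "demo.pkl"
first = from_checkpoint_sketch(ckpt, expensive, {"n": 3})   # computes and writes
second = from_checkpoint_sketch(ckpt, expensive, {"n": 3})  # reads the checkpoint
```

The second call never invokes the generating function, which is the whole point of checkpointing slow pipeline stages.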
write_checkpoint()
Write data to a checkpoint file.
Parameters:
- Data to be checkpointed
- File path for saving the checkpoint
delete_checkpoints()
Delete all checkpoints that match the given prefix.
Parameters:
- The prefix used to identify checkpoints to delete
cloud_sync()
Synchronize local files to cloud storage.
Parameters:
- The locality identifier used to form remote paths
- If True, prints detailed log messages
- If True, simulates the sync without performing any changes
- List of file paths or patterns to ignore during sync
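The dry-run and ignore-pattern semantics can be sketched without any cloud machinery. This is a generic illustration using glob-style matching; cloud_sync's real remote layout and transfer mechanics are not shown.

```python
import fnmatch

def plan_sync(local_files, ignore=(), dry_run=True):
    """Return the files that would be uploaded after applying ignore patterns."""
    to_upload = [
        f for f in local_files
        if not any(fnmatch.fnmatch(f, pat) for pat in ignore)
    ]
    if dry_run:
        return to_upload                # report only; perform no changes
    raise NotImplementedError("actual transfer omitted in this sketch")

planned = plan_sync(
    ["out/model.parquet", "out/scratch.tmp", "notes.txt"],
    ignore=["*.tmp"],
)
```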
Data Examination
examine_sup()
Print examination details of the sales and universe data from a SalesUniversePair.
Parameters:
- Object containing 'sales' and 'universe' DataFrames
- Settings dictionary
examine_df()
Print examination details of a DataFrame.
Parameters:
- The data you wish to examine
- Settings dictionary
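A minimal stand-in for this kind of examination is a per-column summary. The function below is a generic sketch (not examine_df itself) over toy rows:

```python
from statistics import mean

rows = [
    {"sqft": 1400, "price": 200_000},
    {"sqft": 1800, "price": 260_000},
]

def examine(rows):
    """Compute simple per-column summary statistics (illustrative only)."""
    summary = {}
    for col in rows[0]:
        vals = [r[col] for r in rows]
        summary[col] = {"n": len(vals), "min": min(vals), "max": max(vals), "mean": mean(vals)}
    return summary

report = examine(rows)
```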
Output Functions
write_notebook_output_sup()
Write notebook output to disk.
Parameters:
- Sales and universe data
- File prefix for naming output files
- Whether to write to parquet format
- Whether to write to gpkg format
- Whether to write to ESRI shapefile format
- Whether to write to CSV format
write_parquet()
Write data to a parquet file.
Parameters:
- Data to be written
- File path for saving the parquet
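The format-flag dispatch described for write_notebook_output_sup can be sketched with CSV alone, since the parquet/gpkg/shapefile writers require third-party dependencies. The function name and signature are hypothetical; only the pattern (one boolean flag per format, prefix-based file naming) mirrors the description above.

```python
import csv
import io

def write_outputs(rows, prefix, write_csv=True):
    """Build named outputs per enabled format flag (CSV only in this sketch)."""
    outputs = {}
    if write_csv:
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
        outputs[f"{prefix}.csv"] = buf.getvalue()   # filename derived from prefix
    return outputs

files = write_outputs([{"key": "A", "value": 1}], "universe")
```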