The data module provides core data structures and functions for loading, processing, and enriching assessment and sales data.
Core Data Structures
SalesUniversePair
A container for the sales and universe DataFrames. Many functions operate on this data structure.
DataFrame containing sales data
DataFrame containing universe (parcel) data
Methods
copy()
Create a copy of the SalesUniversePair object.
A new SalesUniversePair object with copied DataFrames
Either “sales” or “universe”
The new DataFrame to set for the specified key
New sales DataFrame with updates
If True, allows the update to remove rows from sales. If False, preserves all original rows
new_sale_keys.
List of sale keys to filter to
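To make the container's behavior concrete, here is a minimal stand-in written from the descriptions above. The class name `PairSketch`, the method names `set` and `limit_sales_to_keys`, and the `key_sale` column are assumptions made for this sketch and are not confirmed as the library's API.

```python
from dataclasses import dataclass
from typing import List

import pandas as pd


@dataclass
class PairSketch:
    """Hypothetical stand-in for a sales/universe container."""

    sales: pd.DataFrame
    universe: pd.DataFrame

    def copy(self) -> "PairSketch":
        # Copy both DataFrames so edits to the copy do not touch the original.
        return PairSketch(self.sales.copy(), self.universe.copy())

    def set(self, key: str, value: pd.DataFrame) -> None:
        # Only the two documented keys are accepted.
        if key not in ("sales", "universe"):
            raise ValueError(f"key must be 'sales' or 'universe', got {key!r}")
        setattr(self, key, value)

    def limit_sales_to_keys(self, keys: List[str]) -> None:
        # Keep only the sales rows whose (assumed) key column is in `keys`.
        self.sales = self.sales[self.sales["key_sale"].isin(keys)].copy()


pair = PairSketch(
    sales=pd.DataFrame({"key_sale": ["s1", "s2"], "sale_price": [100000, 250000]}),
    universe=pd.DataFrame({"key": ["p1", "p2", "p3"]}),
)
snapshot = pair.copy()
pair.limit_sales_to_keys(["s2"])
```

Because `copy()` duplicates both DataFrames, `snapshot` still holds both sales after the original pair has been filtered.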
Data Loading Functions
load_dataframe()
Load a single DataFrame based on configuration settings.
Configuration entry for loading the dataframe
Settings dictionary
If True, prints detailed logs during data loading
List of categorical field names
List of boolean field names
List of numeric field names
The loaded DataFrame
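One plausible reading of the three field lists is that they drive dtype coercion once the raw file has been read. The sketch below shows that idea with plain pandas; the helper name `coerce_fields` and the sample columns are hypothetical, and the real loader may behave differently.

```python
import pandas as pd


def coerce_fields(df, fields_cat, fields_bool, fields_num):
    """Cast listed columns to string, boolean, and numeric dtypes (illustrative only)."""
    out = df.copy()
    for col in fields_cat:
        if col in out.columns:
            out[col] = out[col].astype("string")
    for col in fields_bool:
        if col in out.columns:
            # Treat a few common truthy spellings as True; everything else is False.
            out[col] = out[col].astype(str).str.lower().isin(["true", "1", "y", "yes"])
    for col in fields_num:
        if col in out.columns:
            # Unparseable values become NaN rather than raising.
            out[col] = pd.to_numeric(out[col], errors="coerce")
    return out


raw = pd.DataFrame(
    {"zoning": ["R1", "C2"], "is_vacant": ["Y", "N"], "land_area_sqft": ["5000", "n/a"]}
)
typed = coerce_fields(raw, ["zoning"], ["is_vacant"], ["land_area_sqft"])
```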
Data Processing Functions
process_data()
Process raw dataframes according to settings and return a SalesUniversePair.
Dictionary mapping keys to DataFrames
Settings dictionary
If True, prints progress information
A SalesUniversePair containing processed sales and universe data
get_hydrated_sales_from_sup()
Merge the sales and universe DataFrames to “hydrate” the sales data.
SalesUniversePair containing sales and universe DataFrames
The merged (hydrated) sales DataFrame
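Conceptually, hydration attaches each sale to the characteristics of its parcel. A minimal sketch of that idea with a pandas left join follows; the shared `key` column and the sample fields are assumptions, and the library's own merge logic may differ.

```python
import pandas as pd

# Hypothetical sample data; the shared "key" column is an assumption.
sales = pd.DataFrame({"key": ["p1", "p3"], "sale_price": [210000, 95000]})
universe = pd.DataFrame(
    {"key": ["p1", "p2", "p3"], "bldg_area_sqft": [1800, 2400, 0]}
)

# Left-join so every sale is kept and picks up its parcel's characteristics.
hydrated = sales.merge(universe, on="key", how="left")
```

The result has one row per sale, with the universe columns appended to the sales columns.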
get_sup_model_group()
Get a subset of a SalesUniversePair for a specific model group.
The SalesUniversePair to filter
The model group identifier to filter by
A new SalesUniversePair containing only the specified model group
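The filtering step can be pictured as selecting the parcels in one model group and then keeping only the sales that refer to those parcels. The sketch below assumes columns named `key` and `model_group`; neither name is confirmed here.

```python
import pandas as pd

# Hypothetical data; "key" and "model_group" column names are assumptions.
universe = pd.DataFrame(
    {
        "key": ["p1", "p2", "p3"],
        "model_group": ["residential", "commercial", "residential"],
    }
)
sales = pd.DataFrame({"key": ["p1", "p2"], "sale_price": [210000, 480000]})

model_group = "residential"
universe_mg = universe[universe["model_group"] == model_group]
# Keep only sales whose parcel belongs to the selected model group.
sales_mg = sales[sales["key"].isin(universe_mg["key"])]
```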
Enrichment Functions
enrich_time()
Enrich the DataFrame by converting specified time fields to datetime and deriving additional fields.
Input DataFrame
Dictionary mapping field names to datetime formats
Settings dictionary
DataFrame with enriched time fields
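As a rough illustration of this kind of enrichment, the following sketch parses a date column with an explicit format and derives a few calendar fields. The derived column names are illustrative only; which fields the library actually derives is not stated here.

```python
import pandas as pd

df = pd.DataFrame({"sale_date": ["2023-01-15", "2023-11-02"]})
time_formats = {"sale_date": "%Y-%m-%d"}  # field name -> datetime format

for field, fmt in time_formats.items():
    df[field] = pd.to_datetime(df[field], format=fmt)
    # Derived field names below are illustrative, not the library's.
    df[f"{field}_year"] = df[field].dt.year
    df[f"{field}_month"] = df[field].dt.month
    df[f"{field}_quarter"] = df[field].dt.quarter
```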
enrich_sup_spatial_lag()
Enrich the sales and universe DataFrames with spatial lag features.
SalesUniversePair containing sales and universe DataFrames
Settings dictionary
If True, prints progress information
Enriched SalesUniversePair with spatial lag features
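A spatial lag is commonly computed as an average of a variable over each observation's neighbors. The sketch below uses an unweighted k-nearest-neighbor mean with NumPy as one simple instance of that idea; the weighting scheme and the fields the library actually lags are not specified here, so treat every choice in it as an assumption.

```python
import numpy as np

# Hypothetical parcel centroids (x, y) and a value observed at each one.
coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [10.0, 10.0]])
values = np.array([100.0, 200.0, 300.0, 900.0])
k = 2  # number of neighbors to average; an arbitrary choice for this sketch

# Pairwise Euclidean distances; a point is never its own neighbor.
diff = coords[:, None, :] - coords[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=2))
np.fill_diagonal(dist, np.inf)

# Indices of the k nearest neighbors of each point, then their mean value.
nearest = np.argsort(dist, axis=1)[:, :k]
spatial_lag = values[nearest].mean(axis=1)
```

Each entry of `spatial_lag` summarizes the neighborhood of a parcel without using that parcel's own value.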
enrich_df_streets()
Enrich a GeoDataFrame with street network data.
Input GeoDataFrame containing parcels
Settings dictionary containing configuration for the enrichment
Spacing in meters for ray casting to calculate distances to streets
Maximum length of rays to shoot for distance calculations, in meters
Buffer around the street network to consider for distance calculations, in meters
If True, prints progress information
Enriched GeoDataFrame with additional columns for street-related metrics
Utility Functions
get_sale_field()
Determine the appropriate sale price field based on time adjustment settings.
Settings dictionary
Optional DataFrame to check field existence
Field name to be used for sale price (either “sale_price” or “sale_price_time_adj”)
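The selection logic described above might look roughly like this. The helper name `pick_sale_field` and the `time_adjustment` settings key are hypothetical, as is the fallback to `sale_price` when the adjusted column is missing.

```python
from typing import Optional

import pandas as pd


def pick_sale_field(settings: dict, df: Optional[pd.DataFrame] = None) -> str:
    """Return the sale price column to use (hypothetical helper)."""
    # The "time_adjustment" settings key is an assumption for this sketch.
    use_adjusted = bool(settings.get("time_adjustment", False))
    field = "sale_price_time_adj" if use_adjusted else "sale_price"
    # If a DataFrame is supplied and lacks the adjusted column, fall back.
    if df is not None and field not in df.columns:
        field = "sale_price"
    return field


sales = pd.DataFrame({"sale_price": [150000]})
field_without_df = pick_sale_field({"time_adjustment": True})
field_with_df = pick_sale_field({"time_adjustment": True}, sales)
```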
get_vacant_sales()
Filter the sales DataFrame to return only vacant (unimproved) sales.
Input DataFrame
Settings dictionary
If True, return non-vacant (improved) sales
DataFrame with an added is_vacant column
get_vacant()
Filter the DataFrame based on the ‘is_vacant’ column.
Input DataFrame
Settings dictionary
If True, return non-vacant rows
DataFrame filtered by the is_vacant flag
get_train_test_keys()
Get the training and testing keys for the sales DataFrame.
Input DataFrame containing sales data
Settings dictionary
Keys for training set
Keys for testing set
get_train_test_masks()
Get the training and testing masks for the sales DataFrame.
Input DataFrame containing sales data
Settings dictionary
Boolean mask for training set
Boolean mask for testing set
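The relationship between key lists and boolean masks can be shown with `Series.isin`. In this sketch the `key_sale` column and the hard-coded key lists are assumptions; how the library derives the split from settings is not shown here.

```python
import pandas as pd

# Hypothetical sales data and key lists; "key_sale" is an assumed column name.
sales = pd.DataFrame({"key_sale": ["s1", "s2", "s3", "s4"]})
keys_train = ["s1", "s2", "s4"]
keys_test = ["s3"]

# A mask is True where the row's key appears in the corresponding list.
mask_train = sales["key_sale"].isin(keys_train)
mask_test = sales["key_sale"].isin(keys_test)

train_df = sales[mask_train]
test_df = sales[mask_test]
```

Masks align with the DataFrame's index, so they can be applied directly to select the training and testing rows.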
Field Classification Functions
get_field_classifications()
Retrieve a mapping of field names to their classifications (land, improvement, or other) and types.
Settings dictionary
Dictionary mapping field names to type and class
get_important_field()
Retrieve the important field name for a given field alias from settings.
Settings dictionary
Identifier for the field
Optional DataFrame to check field existence
The mapped field name if found, else None
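An alias lookup of this kind could be sketched as below. The helper name `lookup_important_field`, the `important_fields` settings key, and the sample alias are all hypothetical; the real settings layout may be nested differently.

```python
from typing import Optional

import pandas as pd


def lookup_important_field(
    settings: dict, field_id: str, df: Optional[pd.DataFrame] = None
) -> Optional[str]:
    """Resolve an alias to a column name, or None (hypothetical helper)."""
    # The "important_fields" key is an assumed settings layout.
    name = settings.get("important_fields", {}).get(field_id)
    if name is None:
        return None
    # When a DataFrame is given, only return names that exist as columns.
    if df is not None and name not in df.columns:
        return None
    return name


settings = {"important_fields": {"impr_category": "bldg_type"}}
parcels = pd.DataFrame({"bldg_type": ["single_family"]})
resolved = lookup_important_field(settings, "impr_category", parcels)
```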
get_report_locations()
Retrieve report location fields from settings.
Settings dictionary
Optional DataFrame to filter available locations
List of report location field names
Output Functions
write_parquet()
Write data to a parquet file.
Data to be written
File path for saving the parquet
write_gpkg()
Write data to a GeoPackage file.
Data to be written
File path for saving the GeoPackage
write_zipped_shapefile()
Write data to a zipped shapefile.
Data to be written
File path for saving the zipped shapefile
write_csv()
Write data to a CSV file.
Data to be written
File path for saving the CSV