clean_valid_sales
The SalesUniversePair containing sales and universe data.
The settings dictionary containing configuration for the cleaning process. Should include:
- modeling.metadata.use_sales_from: Integer or dict specifying how far back to use sales (e.g., {"improved": 2020, "vacant": 2018})
The updated SalesUniversePair with cleaned and validated sales data. Sales are marked with:
- valid_sale: Boolean indicating if the sale is valid
- valid_for_ratio_study: Boolean indicating if valid for ratio studies
- valid_for_land_ratio_study: Boolean indicating if valid for land ratio studies
Validation Logic
The function applies several validation rules:
- Time-based filtering: Sales older than the configured threshold are marked invalid
- Price validation: Sales with null, zero, or negative prices are marked invalid
- Vacancy consistency: Sales are validated for ratio studies based on whether their vacancy status at time of sale matches current status
- Metadata scrubbing: Invalid sales have their sale-related fields (price, date, etc.) removed
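The four rules above can be sketched on plain dictionaries. This is a minimal illustration, not the library's implementation: the field names (`sale_price`, `sale_date`, `vacant_sale`, `is_vacant`), the helper names, and the reading of the threshold as a calendar year are all assumptions, as is the rule used for `valid_for_land_ratio_study`.

```python
from datetime import date

# Hypothetical sale-related fields; the real column names are not given here.
SALE_FIELDS = ("sale_price", "sale_date")


def cutoff_year(use_sales_from, vacant_sale):
    """Resolve the threshold, which may be an int or a dict keyed by sale type."""
    if isinstance(use_sales_from, dict):
        return use_sales_from["vacant" if vacant_sale else "improved"]
    return use_sales_from


def clean_sales_sketch(sales, use_sales_from):
    cleaned = []
    for sale in sales:
        row = dict(sale)  # copy so the caller's records are not mutated
        price = row.get("sale_price")
        sale_date = row.get("sale_date")
        vacant_sale = bool(row.get("vacant_sale"))

        # Price validation: null, zero, or negative prices are invalid.
        price_ok = price is not None and price > 0
        # Time-based filtering (assumption: the threshold is a year).
        recent = (
            sale_date is not None
            and sale_date.year >= cutoff_year(use_sales_from, vacant_sale)
        )
        row["valid_sale"] = price_ok and recent

        # Vacancy consistency: status at sale must match current status.
        consistent = vacant_sale == bool(row.get("is_vacant"))
        row["valid_for_ratio_study"] = row["valid_sale"] and consistent
        # Assumption: land ratio studies use parcels vacant at sale and still vacant.
        row["valid_for_land_ratio_study"] = (
            row["valid_for_ratio_study"] and vacant_sale
        )

        # Metadata scrubbing: blank out sale fields on invalid sales.
        if not row["valid_sale"]:
            for field in SALE_FIELDS:
                row[field] = None
        cleaned.append(row)
    return cleaned


sales = [
    {"key": "A", "sale_price": 250000, "sale_date": date(2021, 5, 1),
     "vacant_sale": False, "is_vacant": False},
    {"key": "B", "sale_price": 0, "sale_date": date(2022, 1, 15),
     "vacant_sale": False, "is_vacant": False},
    {"key": "C", "sale_price": 40000, "sale_date": date(2019, 3, 9),
     "vacant_sale": True, "is_vacant": False},
    {"key": "D", "sale_price": 300000, "sale_date": date(2019, 7, 4),
     "vacant_sale": False, "is_vacant": False},
]
result = clean_sales_sketch(sales, {"improved": 2020, "vacant": 2018})
```

In this sample, B fails price validation and D is older than the improved threshold, so both are scrubbed. C stays a valid sale but is excluded from ratio studies because its vacancy status has changed since the sale.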
filter_invalid_sales
The SalesUniversePair containing sales and universe data.
The settings dictionary containing configuration for arms-length validation. Should include:
- data.process.invalid_sales.enabled: Boolean to enable/disable filtering
- data.process.invalid_sales.filter: List of filter conditions to apply
If True, prints detailed information about the validation process.
The updated SalesUniversePair with arms-length validation applied. Invalid sales are removed from the sales data.
Example
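As a rough sketch of how the two settings keys might drive filtering, the snippet below reads `data.process.invalid_sales` from a nested dictionary and drops matching sales. The condition format `(field, operator, value)`, the `verbose` parameter name, the sample field `deed_type`, and the function name are hypothetical; the library's actual filter syntax is not documented in this section.

```python
import operator

# Hypothetical condition format: (field, operator name, value).
OPS = {
    "==": operator.eq, "!=": operator.ne,
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
}

settings = {
    "data": {"process": {"invalid_sales": {
        "enabled": True,
        "filter": [("sale_price", "<", 1000), ("deed_type", "==", "QUITCLAIM")],
    }}}
}


def filter_invalid_sales_sketch(sales, settings, verbose=False):
    cfg = settings.get("data", {}).get("process", {}).get("invalid_sales", {})
    if not cfg.get("enabled", False):
        return list(sales)  # filtering disabled: keep everything
    kept = []
    for sale in sales:
        # Assumption: a sale is invalid when it matches ANY condition.
        hits = [
            cond for cond in cfg.get("filter", [])
            if sale.get(cond[0]) is not None
            and OPS[cond[1]](sale[cond[0]], cond[2])
        ]
        if hits:
            if verbose:
                print(f"dropping {sale.get('key')}: matched {hits}")
        else:
            kept.append(sale)
    return kept


sales = [
    {"key": "A", "sale_price": 250000, "deed_type": "WARRANTY"},
    {"key": "B", "sale_price": 500, "deed_type": "WARRANTY"},
    {"key": "C", "sale_price": 180000, "deed_type": "QUITCLAIM"},
]
kept = filter_invalid_sales_sketch(sales, settings)
```

Setting `enabled` to `False` returns the sales unchanged, which mirrors the enable/disable switch described above.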
fill_unknown_values_sup
The SalesUniversePair containing sales and universe data.
The settings dictionary containing configuration for filling unknown values. Should include:
- data.process.fill: Dictionary mapping fill strategies to field lists
  - Supported strategies: mode, median, mean, zero, unknown, false, custom
The updated SalesUniversePair with filled unknown values.
Fill Strategies
- mode: Fill with the most common value
- median: Fill with the median value (numeric fields)
- mean: Fill with the mean value (numeric fields)
- zero: Fill with zero
- unknown: Fill with “UNKNOWN” string
- false: Fill with False (boolean fields)
- custom: Fill with a custom value specified in the field entry
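The strategies listed above could be applied along these lines. This is an illustrative sketch on plain dictionaries rather than the library's code: the function names, the sample fields, and the `[field, value]` shape assumed for `custom` entries are hypothetical, and `None` stands in for an unknown value.

```python
import statistics


def fill_value(strategy, values, custom=None):
    """Pick the replacement value for one field under the given strategy."""
    known = [v for v in values if v is not None]
    if strategy == "mode":
        return statistics.mode(known)    # most common known value
    if strategy == "median":
        return statistics.median(known)  # numeric fields
    if strategy == "mean":
        return statistics.mean(known)    # numeric fields
    if strategy == "zero":
        return 0
    if strategy == "unknown":
        return "UNKNOWN"
    if strategy == "false":
        return False
    if strategy == "custom":
        return custom
    raise ValueError(f"unsupported fill strategy: {strategy}")


def fill_unknown_sketch(rows, fill_settings):
    filled = [dict(r) for r in rows]  # work on copies
    for strategy, entries in fill_settings.items():
        for entry in entries:
            # Assumption: custom entries are [field, value] pairs.
            if strategy == "custom":
                field, custom = entry[0], entry[1]
            else:
                field, custom = entry, None
            value = fill_value(strategy, [r.get(field) for r in filled], custom)
            for r in filled:
                if r.get(field) is None:
                    r[field] = value
    return filled


rows = [
    {"bldg_area": 1200, "zoning": "R1", "has_pool": None, "quality": None},
    {"bldg_area": None, "zoning": "R1", "has_pool": True, "quality": "good"},
    {"bldg_area": 1800, "zoning": None, "has_pool": None, "quality": None},
    {"bldg_area": 1500, "zoning": "C2", "has_pool": False, "quality": "fair"},
]
# Stand-in for the dictionary stored under data.process.fill.
fill_settings = {
    "median": ["bldg_area"],
    "mode": ["zoning"],
    "false": ["has_pool"],
    "custom": [["quality", "average"]],
}
filled = fill_unknown_sketch(rows, fill_settings)
```

Note that `mode`, `median`, and `mean` need at least one known value in the field; in this sketch an entirely empty field would raise `statistics.StatisticsError`.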