OpenAVM Kit provides pre-built Jupyter notebooks that guide you through the entire property valuation workflow. These notebooks offer an interactive, step-by-step approach to data processing, cleaning, and modeling.

Notebook Overview

The notebooks/pipeline/ directory contains the core workflow notebooks:

  • 01-assemble: Load and merge data sources into a unified structure
  • 02-clean: Validate sales and prepare data for modeling
  • 03-model: Train models and generate valuations

Additional notebooks in notebooks/examples/ demonstrate specific use cases:
  • synthetic_city.ipynb: Work with simulated data
  • land_value.ipynb: Focus on land valuation techniques

Getting Started

Installing Jupyter

If you haven’t already installed Jupyter:
```shell
pip install jupyter
```

Launching Jupyter

Start the Jupyter notebook server:
```shell
jupyter notebook
```
This opens a browser tab showing your project directory.
Jupyter runs a local web server. Keep the terminal window open while working with notebooks.

Opening a Notebook

1. Navigate to the pipeline directory: click through notebooks/ → pipeline/

2. Select a notebook: double-click 01-assemble.ipynb to open it

3. Configure your locality by editing the first code cell:

```python
# The slug of the locality you are currently working on
locality = "us-nc-guilford"

# Whether to print out a lot of stuff (can help with debugging)
verbose = True
```

4. Run cells sequentially: press Shift + Enter to execute each cell in order
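Locality slugs like "us-nc-guilford" follow a country-state-locality pattern. If you want to catch typos before running the whole pipeline, a small sanity check can help. Both the helper and the exact slug grammar below are assumptions for illustration, not part of OpenAVM Kit:

```python
import re

# Assumed slug pattern: two-letter lowercase country code followed by one
# or more lowercase alphanumeric segments, separated by hyphens.
SLUG_RE = re.compile(r"^[a-z]{2}(-[a-z0-9_]+)+$")

def is_valid_slug(slug: str) -> bool:
    """Return True if the slug looks like 'us-nc-guilford'."""
    return bool(SLUG_RE.match(slug))
```

A check like this is cheap to run in the first cell, right after setting `locality`.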

Notebook 1: Assemble

The assembly notebook loads and processes raw data files.

What It Does

1. Initializes the environment

```python
init_notebook(locality)
```

Sets up the working directory and loads configuration.

2. Syncs with cloud (optional)

```python
cloud_sync(locality, verbose=True)
```

Downloads/uploads data from the configured cloud storage.

3. Loads all dataframes

```python
dataframes = from_checkpoint("1-assemble-01-load_dataframes", load_dataframes,
    {
        "settings": load_settings(),
        "verbose": verbose
    }
)
```

Reads CSV, Parquet, and geospatial files defined in settings.

4. Processes into a SalesUniversePair

```python
sales_univ_pair = from_checkpoint("1-assemble-02-process_data", process_dataframes,
    {
        "dataframes": dataframes,
        "settings": load_settings(),
        "verbose": verbose
    }
)
```

Merges dataframes and enriches them with calculated fields.

5. Enriches with street data

```python
sales_univ_pair = from_checkpoint("1-assemble-03-enrich_streets", enrich_sup_streets,
    {
        "sup": sales_univ_pair,
        "settings": load_settings(),
        "verbose": verbose
    }
)
```

Calculates frontage, depth, and distance to roads using OpenStreetMap. Note: street enrichment can be computationally intensive for large datasets; it may take significant time and memory.

6. Tags model groups

```python
sales_univ_pair = from_checkpoint("1-assemble-04-tag_modeling_groups", tag_model_groups_sup,
    {
        "sup": sales_univ_pair,
        "settings": load_settings(),
        "verbose": verbose
    }
)
```

Assigns each parcel to a model group (e.g., single-family, commercial).

7. Writes output files

```python
write_notebook_output_sup(
    sales_univ_pair,
    "1-assemble",
    parquet=True,
    gpkg=False,
    shp=False
)
```

Saves results to the out/ directory in various formats.
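Model-group tagging is driven by your settings; conceptually, it maps parcel attributes such as land-use codes to a group label. A minimal illustration with made-up codes and group names follows — the kit's real logic lives in tag_model_groups_sup and your settings file:

```python
def tag_model_group(land_use_code: str) -> str:
    """Map a land-use code to a model group (illustrative mapping only)."""
    mapping = {
        "R1": "single_family",
        "R2": "multi_family",
        "C1": "commercial",
        "V0": "vacant_land",
    }
    # Fall back to a catch-all group for unrecognized codes
    return mapping.get(land_use_code, "other")
```

Keeping a catch-all group ensures every parcel lands somewhere, so later steps never encounter untagged rows.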

Key Outputs

  • out/1-assemble-sup.pickle - Complete SalesUniversePair object
  • out/look/1-assemble-universe.parquet - Universe dataframe
  • out/look/1-assemble-sales-hydrated.parquet - Sales with characteristics
Tip: load the parquet files in QGIS or ArcGIS to visually inspect your data on a map.

Notebook 2: Clean

The cleaning notebook validates sales and fills data gaps.

What It Does

1. Loads assembled data

```python
sales_univ_pair = read_pickle("out/1-assemble-sup")
```

2. Fills unknown values

```python
sales_univ_pair = fill_unknown_values_sup(sales_univ_pair, settings)
```

Replaces missing data with configured defaults.

3. Marks clustering IDs

```python
sales_univ_pair = mark_horizontal_equity_clusters_per_model_group_sup(
    sup=sales_univ_pair,
    settings=settings,
    verbose=verbose
)
```

Groups similar properties for equity analysis.
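Horizontal-equity clustering groups comparable parcels so their valuations can later be checked against one another. One simple way to picture it is bucketing parcels by model group and a coarse size bin, sketched here with hypothetical fields — mark_horizontal_equity_clusters_per_model_group_sup is considerably more sophisticated:

```python
def equity_cluster_id(model_group: str, sqft: float, bin_size: int = 500) -> str:
    """Assign a coarse cluster ID: model group plus a building-size bucket."""
    return f"{model_group}-{int(sqft // bin_size)}"
```

Parcels in the same cluster should, all else equal, receive similar valuations; large within-cluster spread is a signal worth investigating.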
4. Processes and validates sales

```python
sales_univ_pair = process_sales(
    sup=sales_univ_pair,
    settings=settings,
    verbose=verbose
)
```

Filters by date range, validates prices, and applies time adjustments.

5. Runs sales scrutiny

```python
sales_univ_pair = run_sales_scrutiny(
    sup=sales_univ_pair,
    settings=settings,
    drop_cluster_outliers=False,
    drop_heuristic_outliers=True,
    verbose=verbose
)
```

Identifies and optionally removes outlier sales.

6. Writes cleaned data

```python
write_notebook_output_sup(
    sales_univ_pair,
    "2-clean",
    parquet=True
)
```
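Sales scrutiny flags suspicious transactions before they can distort a model. A common heuristic in this family is a median-absolute-deviation (MAD) test within a cluster of comparable sales; the sketch below illustrates the idea, not run_sales_scrutiny's actual criteria:

```python
from statistics import median

def flag_outlier_sales(prices, k=3.0):
    """Flag prices more than k MADs from the cluster median.

    Returns a list of booleans, one per input price.
    """
    med = median(prices)
    # Median absolute deviation; fall back to 1.0 to avoid division by zero
    mad = median(abs(p - med) for p in prices) or 1.0
    return [abs(p - med) / mad > k for p in prices]
```

MAD-based tests are robust to the very outliers they are hunting, unlike mean/standard-deviation rules.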

Key Outputs

  • out/2-clean-sup.pickle - Cleaned SalesUniversePair
  • out/sales_scrutiny/ - Sales scrutiny reports
  • out/look/2-clean-sales-hydrated.parquet - Cleaned sales

Notebook 3: Model

The modeling notebook trains algorithms and generates valuations.

What It Does

1. Loads cleaned data

```python
sales_univ_pair = load_cleaned_data_for_modeling(settings)
```

2. Splits train/test sets

```python
write_canonical_splits(
    sales_univ_pair,
    load_settings(),
    verbose=verbose
)
```

Creates a durable 80/20 train/test split.
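A "durable" split means a parcel's train/test assignment never changes between runs. One standard way to achieve that is to hash a stable key, as sketched below; this is an illustration of the concept, not necessarily how write_canonical_splits (which persists its splits to out/canonical_splits/) works internally:

```python
import hashlib

def in_test_set(key: str, test_fraction: float = 0.2) -> bool:
    """Deterministically assign a key to the test set.

    The same key always hashes to the same bucket, so the split
    survives re-runs and new data.
    """
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % 100 < test_fraction * 100

# Hypothetical parcel keys, just to show the split in action
keys = [f"parcel-{i}" for i in range(1000)]
test_keys = [k for k in keys if in_test_set(k)]
```

Because assignment depends only on the key, adding new parcels never reshuffles existing ones between train and test.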
3. Enriches with spatial lag

```python
sales_univ_pair = from_checkpoint("3-model-00-enrich-spatial-lag", enrich_sup_spatial_lag,
    {
        "sup": sales_univ_pair,
        "settings": load_settings(),
        "verbose": verbose
    }
)
```

Calculates spatially weighted averages of nearby sales.
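Spatial lag features capture the intuition that nearby sale prices help predict a parcel's value. A minimal inverse-distance-weighted version over the k nearest sales looks like this (enrich_sup_spatial_lag's actual weighting scheme may differ):

```python
import math

def spatial_lag(target, sales, k=5):
    """Inverse-distance-weighted mean price of the k nearest sales.

    target: (x, y) coordinates; sales: list of (x, y, price) tuples.
    """
    # Distance to every sale, keeping only the k nearest
    nearest = sorted(
        (math.dist(target, (x, y)), price) for x, y, price in sales
    )[:k]
    # Closer sales get larger weights; epsilon avoids division by zero
    weights = [1.0 / (d + 1e-9) for d, _ in nearest]
    prices = [p for _, p in nearest]
    return sum(w * p for w, p in zip(weights, prices)) / sum(weights)
```

In a real pipeline this would run over a spatial index rather than a full sort, but the weighting logic is the same.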
4. Experiments with models

```python
try_models(
    sup=sales_univ_pair,
    settings=load_settings(),
    save_params=True,
    verbose=verbose,
    run_main=True,
    run_vacant=False,
    run_hedonic=False,
    run_ensemble=True,
    do_shaps=False,
    do_plots=True
)
```

Quickly tests different algorithms and variables.

5. Identifies outliers

```python
identify_outliers(
    sup=sales_univ_pair,
    settings=load_settings()
)
```

Finds sales with poor prediction accuracy.

6. Finalizes models

```python
results = from_checkpoint("3-model-02-finalize-models", finalize_models,
    {
        "sup": sales_univ_pair,
        "settings": load_settings(),
        "save_params": True,
        "use_saved_params": True,
        "verbose": verbose
    }
)
```

Trains production models and saves all predictions.

7. Generates reports

```python
run_and_write_ratio_study_breakdowns(load_settings())
```

Creates statistical assessment-quality reports.
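Ratio studies are the standard way to measure assessment quality: compare assessed values to sale prices and summarize the ratios. Two core statistics, the median ratio and the coefficient of dispersion (COD), can be computed like this (a sketch of the statistics themselves; run_and_write_ratio_study_breakdowns produces full per-group reports):

```python
from statistics import median

def ratio_study(assessed, prices):
    """Return (median assessment ratio, coefficient of dispersion)."""
    ratios = [a / p for a, p in zip(assessed, prices)]
    med = median(ratios)
    # COD: average absolute deviation from the median ratio, as a percentage
    cod = 100 * (sum(abs(r - med) for r in ratios) / len(ratios)) / med
    return med, cod
```

A median ratio near 1.0 indicates assessments track market value on average; a low COD indicates they do so uniformly across properties.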

Key Outputs

  • out/models/{model_group}/{model_type}/{algorithm}/ - Model files and predictions
  • out/canonical_splits/ - Train/test split definitions
  • out/ratio_studies/ - Performance reports

Checkpointing System

Notebooks use checkpointing to save intermediate results:

```python
result = from_checkpoint("checkpoint-name", function_name, params)
```

Benefits:
  • Skip time-consuming steps on re-runs
  • Resume work after interruptions
  • Experiment without re-processing everything

Clearing checkpoints:

```python
if clear_checkpoints:
    delete_checkpoints("1-assemble")
```

Set clear_checkpoints = True to force fresh execution.
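Under the hood, a checkpoint wrapper of this shape just caches a function's result on disk, keyed by name. Here is a minimal pickle-based sketch of the pattern; OpenAVM Kit's actual implementation and storage location may differ:

```python
import os
import pickle

def from_checkpoint(name, fn, params, checkpoint_dir="out/checkpoints"):
    """Return the cached result for `name` if it exists; otherwise call
    fn(**params), cache the result to disk, and return it."""
    os.makedirs(checkpoint_dir, exist_ok=True)
    path = os.path.join(checkpoint_dir, f"{name}.pickle")
    if os.path.exists(path):
        # Cache hit: skip the (possibly expensive) computation entirely
        with open(path, "rb") as f:
            return pickle.load(f)
    result = fn(**params)
    with open(path, "wb") as f:
        pickle.dump(result, f)
    return result
```

This is why clearing checkpoints forces fresh execution: deleting the pickle file turns the next call back into a cache miss.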

Best Practices

  • Always run notebook cells sequentially from top to bottom; skipping cells or running out of order can cause errors.
  • Use “Kernel → Restart & Clear Output” when changing the locality variable to avoid mixing data.
  • Jupyter auto-saves, but save manually (Ctrl+S / Cmd+S) before running long operations.
  • Large datasets can exhaust memory; close unnecessary notebooks and restart kernels periodically.
  • Set verbose = True to see detailed progress, and turn it off for production runs.
  • Load output parquet files in GIS software after each notebook to verify spatial accuracy.

Customizing Notebooks

You can modify notebooks to fit your workflow:

Adding custom processing

```python
# Add a new cell with your custom logic
def custom_enrichment(sup, settings):
    df = sup.universe.copy()
    # Your custom calculations here
    df['custom_field'] = df['field1'] * df['field2']
    sup.universe = df
    return sup

sales_univ_pair = custom_enrichment(sales_univ_pair, settings)
```

Creating new notebooks

Base new notebooks on the existing structure:
  1. Copy an existing notebook
  2. Rename appropriately
  3. Modify the workflow steps
  4. Update checkpoint names to avoid conflicts

Troubleshooting

Out-of-memory errors: your dataset may be too large for available RAM. Try:
  • Restart kernel and clear outputs
  • Close other applications
  • Process smaller subsets
  • Use a machine with more memory
Import errors: OpenAVM Kit isn’t installed, or it isn’t visible to the kernel’s environment. Install it in editable mode from the repository root:

```shell
pip install -e .
```

Then restart the kernel.
Checkpoint errors: delete corrupted checkpoints and re-run the cell:

```python
delete_checkpoints("problematic-checkpoint-prefix")
```
Settings errors: verify that in/settings.json exists and contains valid JSON. Use a JSON validator to check the syntax.
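To check the settings file programmatically, a small helper that surfaces the parse error alongside the file name can save time. This helper is hypothetical, not part of the kit:

```python
import json

def validate_settings(path="in/settings.json"):
    """Parse the settings file, raising a descriptive error on invalid JSON."""
    with open(path) as f:
        try:
            return json.load(f)
        except json.JSONDecodeError as exc:
            raise ValueError(f"{path} is not valid JSON: {exc}") from exc
```

Running it in a notebook cell fails fast with a pointer to the offending line and column, rather than deep inside the pipeline.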

Next Steps

  • Creating a Locality: set up a new locality from scratch
  • Data Assembly: deep dive into data loading and processing
  • Modeling Guide: advanced modeling techniques and configuration
  • API Reference: complete function reference documentation
