OpenAVM Kit provides pre-built Jupyter notebooks that guide you through the entire property valuation workflow. These notebooks offer an interactive, step-by-step approach to data processing, cleaning, and modeling.

Notebook Overview

The notebooks/pipeline/ directory contains the core workflow notebooks:

  • 01-assemble: Load and merge data sources into a unified structure
  • 02-clean: Validate sales and prepare data for modeling
  • 03-model: Train models and generate valuations

Additional notebooks in notebooks/examples/ demonstrate specific use cases:
  • synthetic_city.ipynb: Work with simulated data
  • land_value.ipynb: Focus on land valuation techniques

Getting Started

Installing Jupyter

If you haven’t already installed Jupyter:
```shell
pip install jupyter
```

Launching Jupyter

Start the Jupyter notebook server:
```shell
jupyter notebook
```
This opens a browser tab showing your project directory.
Jupyter runs a local web server. Keep the terminal window open while working with notebooks.

Opening a Notebook

1. Navigate to the pipeline directory: click through notebooks/ → pipeline/

2. Select a notebook: double-click 01-assemble.ipynb to open it

3. Configure your locality by editing the first code cell:

```python
# The slug of the locality you are currently working on
locality = "us-nc-guilford"

# Whether to print out a lot of stuff (can help with debugging)
verbose = True
```

4. Run cells sequentially: press Shift + Enter to execute each cell in order
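Locality slugs like "us-nc-guilford" follow a country-state-locality pattern. If you want to catch typos before running the whole pipeline, a small sanity check can help. Both the helper and the exact slug grammar below are assumptions for illustration, not part of OpenAVM Kit:

```python
import re

# Assumed slug pattern: two-letter lowercase country code followed by one
# or more lowercase alphanumeric segments, separated by hyphens.
SLUG_RE = re.compile(r"^[a-z]{2}(-[a-z0-9_]+)+$")

def is_valid_slug(slug: str) -> bool:
    """Return True if the slug looks like 'us-nc-guilford'."""
    return bool(SLUG_RE.match(slug))
```

A check like this is cheap to run in the first cell, right after setting `locality`.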

Notebook 1: Assemble

The assembly notebook loads and processes raw data files.

What It Does

1. Initializes the environment

```python
init_notebook(locality)
```

Sets up the working directory and loads configuration.

2. Syncs with cloud (optional)

```python
cloud_sync(locality, verbose=True)
```

Downloads/uploads data from the configured cloud storage.

3. Loads all dataframes

```python
dataframes = from_checkpoint("1-assemble-01-load_dataframes", load_dataframes,
    {
        "settings": load_settings(),
        "verbose": verbose
    }
)
```

Reads CSV, Parquet, and geospatial files defined in settings.

4. Processes into a SalesUniversePair

```python
sales_univ_pair = from_checkpoint("1-assemble-02-process_data", process_dataframes,
    {
        "dataframes": dataframes,
        "settings": load_settings(),
        "verbose": verbose
    }
)
```

Merges dataframes and enriches them with calculated fields.

5. Enriches with street data

```python
sales_univ_pair = from_checkpoint("1-assemble-03-enrich_streets", enrich_sup_streets,
    {
        "sup": sales_univ_pair,
        "settings": load_settings(),
        "verbose": verbose
    }
)
```

Calculates frontage, depth, and distance to roads using OpenStreetMap. Note: street enrichment can be computationally intensive for large datasets; it may take significant time and memory.

6. Tags model groups

```python
sales_univ_pair = from_checkpoint("1-assemble-04-tag_modeling_groups", tag_model_groups_sup,
    {
        "sup": sales_univ_pair,
        "settings": load_settings(),
        "verbose": verbose
    }
)
```

Assigns each parcel to a model group (e.g., single-family, commercial).

7. Writes output files

```python
write_notebook_output_sup(
    sales_univ_pair,
    "1-assemble",
    parquet=True,
    gpkg=False,
    shp=False
)
```

Saves results to the out/ directory in various formats.
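Model-group tagging is driven by your settings; conceptually, it maps parcel attributes such as land-use codes to a group label. A minimal illustration with made-up codes and group names follows — the kit's real logic lives in tag_model_groups_sup and your settings file:

```python
def tag_model_group(land_use_code: str) -> str:
    """Map a land-use code to a model group (illustrative mapping only)."""
    mapping = {
        "R1": "single_family",
        "R2": "multi_family",
        "C1": "commercial",
        "V0": "vacant_land",
    }
    # Fall back to a catch-all group for unrecognized codes
    return mapping.get(land_use_code, "other")
```

Keeping a catch-all group ensures every parcel lands somewhere, so later steps never encounter untagged rows.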

Key Outputs

  • out/1-assemble-sup.pickle - Complete SalesUniversePair object
  • out/look/1-assemble-universe.parquet - Universe dataframe
  • out/look/1-assemble-sales-hydrated.parquet - Sales with characteristics
Tip: load the parquet files in QGIS or ArcGIS to visually inspect your data on a map.

Notebook 2: Clean

The cleaning notebook validates sales and fills data gaps.

What It Does

1. Loads assembled data

```python
sales_univ_pair = read_pickle("out/1-assemble-sup")
```

2. Fills unknown values

```python
sales_univ_pair = fill_unknown_values_sup(sales_univ_pair, settings)
```

Replaces missing data with configured defaults.

3. Marks clustering IDs

```python
sales_univ_pair = mark_horizontal_equity_clusters_per_model_group_sup(
    sup=sales_univ_pair,
    settings=settings,
    verbose=verbose
)
```

Groups similar properties for equity analysis.
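Horizontal-equity clustering groups comparable parcels so their valuations can later be checked against one another. One simple way to picture it is bucketing parcels by model group and a coarse size bin, sketched here with hypothetical fields — mark_horizontal_equity_clusters_per_model_group_sup is considerably more sophisticated:

```python
def equity_cluster_id(model_group: str, sqft: float, bin_size: int = 500) -> str:
    """Assign a coarse cluster ID: model group plus a building-size bucket."""
    return f"{model_group}-{int(sqft // bin_size)}"
```

Parcels in the same cluster should, all else equal, receive similar valuations; large within-cluster spread is a signal worth investigating.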
4. Processes and validates sales

```python
sales_univ_pair = process_sales(
    sup=sales_univ_pair,
    settings=settings,
    verbose=verbose
)
```

Filters by date range, validates prices, and applies time adjustments.

5. Runs sales scrutiny

```python
sales_univ_pair = run_sales_scrutiny(
    sup=sales_univ_pair,
    settings=settings,
    drop_cluster_outliers=False,
    drop_heuristic_outliers=True,
    verbose=verbose
)
```

Identifies and optionally removes outlier sales.

6. Writes cleaned data

```python
write_notebook_output_sup(
    sales_univ_pair,
    "2-clean",
    parquet=True
)
```
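Sales scrutiny flags suspicious transactions before they can distort a model. A common heuristic in this family is a median-absolute-deviation (MAD) test within a cluster of comparable sales; the sketch below illustrates the idea, not run_sales_scrutiny's actual criteria:

```python
from statistics import median

def flag_outlier_sales(prices, k=3.0):
    """Flag prices more than k MADs from the cluster median.

    Returns a list of booleans, one per input price.
    """
    med = median(prices)
    # Median absolute deviation; fall back to 1.0 to avoid division by zero
    mad = median(abs(p - med) for p in prices) or 1.0
    return [abs(p - med) / mad > k for p in prices]
```

MAD-based tests are robust to the very outliers they are hunting, unlike mean/standard-deviation rules.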

Key Outputs

  • out/2-clean-sup.pickle - Cleaned SalesUniversePair
  • out/sales_scrutiny/ - Sales scrutiny reports
  • out/look/2-clean-sales-hydrated.parquet - Cleaned sales

Notebook 3: Model

The modeling notebook trains algorithms and generates valuations.

What It Does

1. Loads cleaned data

```python
sales_univ_pair = load_cleaned_data_for_modeling(settings)
```

2. Splits train/test sets

```python
write_canonical_splits(
    sales_univ_pair,
    load_settings(),
    verbose=verbose
)
```

Creates a durable 80/20 train/test split.
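A "durable" split means a parcel's train/test assignment never changes between runs. One standard way to achieve that is to hash a stable key, as sketched below; this is an illustration of the concept, not necessarily how write_canonical_splits (which persists its splits to out/canonical_splits/) works internally:

```python
import hashlib

def in_test_set(key: str, test_fraction: float = 0.2) -> bool:
    """Deterministically assign a key to the test set.

    The same key always hashes to the same bucket, so the split
    survives re-runs and new data.
    """
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % 100 < test_fraction * 100

# Hypothetical parcel keys, just to show the split in action
keys = [f"parcel-{i}" for i in range(1000)]
test_keys = [k for k in keys if in_test_set(k)]
```

Because assignment depends only on the key, adding new parcels never reshuffles existing ones between train and test.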
3. Enriches with spatial lag

```python
sales_univ_pair = from_checkpoint("3-model-00-enrich-spatial-lag", enrich_sup_spatial_lag,
    {
        "sup": sales_univ_pair,
        "settings": load_settings(),
        "verbose": verbose
    }
)
```

Calculates spatially weighted averages of nearby sales.
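Spatial lag features capture the intuition that nearby sale prices help predict a parcel's value. A minimal inverse-distance-weighted version over the k nearest sales looks like this (enrich_sup_spatial_lag's actual weighting scheme may differ):

```python
import math

def spatial_lag(target, sales, k=5):
    """Inverse-distance-weighted mean price of the k nearest sales.

    target: (x, y) coordinates; sales: list of (x, y, price) tuples.
    """
    # Distance to every sale, keeping only the k nearest
    nearest = sorted(
        (math.dist(target, (x, y)), price) for x, y, price in sales
    )[:k]
    # Closer sales get larger weights; epsilon avoids division by zero
    weights = [1.0 / (d + 1e-9) for d, _ in nearest]
    prices = [p for _, p in nearest]
    return sum(w * p for w, p in zip(weights, prices)) / sum(weights)
```

In a real pipeline this would run over a spatial index rather than a full sort, but the weighting logic is the same.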
4. Experiments with models

```python
try_models(
    sup=sales_univ_pair,
    settings=load_settings(),
    save_params=True,
    verbose=verbose,
    run_main=True,
    run_vacant=False,
    run_hedonic=False,
    run_ensemble=True,
    do_shaps=False,
    do_plots=True
)
```

Quickly tests different algorithms and variables.

5. Identifies outliers

```python
identify_outliers(
    sup=sales_univ_pair,
    settings=load_settings()
)
```

Finds sales with poor prediction accuracy.

6. Finalizes models

```python
results = from_checkpoint("3-model-02-finalize-models", finalize_models,
    {
        "sup": sales_univ_pair,
        "settings": load_settings(),
        "save_params": True,
        "use_saved_params": True,
        "verbose": verbose
    }
)
```

Trains production models and saves all predictions.

7. Generates reports

```python
run_and_write_ratio_study_breakdowns(load_settings())
```

Creates statistical assessment-quality reports.
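Ratio studies are the standard way to measure assessment quality: compare assessed values to sale prices and summarize the ratios. Two core statistics, the median ratio and the coefficient of dispersion (COD), can be computed like this (a sketch of the statistics themselves; run_and_write_ratio_study_breakdowns produces full per-group reports):

```python
from statistics import median

def ratio_study(assessed, prices):
    """Return (median assessment ratio, coefficient of dispersion)."""
    ratios = [a / p for a, p in zip(assessed, prices)]
    med = median(ratios)
    # COD: average absolute deviation from the median ratio, as a percentage
    cod = 100 * (sum(abs(r - med) for r in ratios) / len(ratios)) / med
    return med, cod
```

A median ratio near 1.0 indicates assessments track market value on average; a low COD indicates they do so uniformly across properties.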

Key Outputs

  • out/models/{model_group}/{model_type}/{algorithm}/ - Model files and predictions
  • out/canonical_splits/ - Train/test split definitions
  • out/ratio_studies/ - Performance reports

Checkpointing System

Notebooks use checkpointing to save intermediate results:

```python
result = from_checkpoint("checkpoint-name", function_name, params)
```

Benefits:
  • Skip time-consuming steps on re-runs
  • Resume work after interruptions
  • Experiment without re-processing everything

Clearing checkpoints:

```python
if clear_checkpoints:
    delete_checkpoints("1-assemble")
```

Set clear_checkpoints = True to force fresh execution.
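Under the hood, a checkpoint wrapper of this shape just caches a function's result on disk, keyed by name. Here is a minimal pickle-based sketch of the pattern; OpenAVM Kit's actual implementation and storage location may differ:

```python
import os
import pickle

def from_checkpoint(name, fn, params, checkpoint_dir="out/checkpoints"):
    """Return the cached result for `name` if it exists; otherwise call
    fn(**params), cache the result to disk, and return it."""
    os.makedirs(checkpoint_dir, exist_ok=True)
    path = os.path.join(checkpoint_dir, f"{name}.pickle")
    if os.path.exists(path):
        # Cache hit: skip the (possibly expensive) computation entirely
        with open(path, "rb") as f:
            return pickle.load(f)
    result = fn(**params)
    with open(path, "wb") as f:
        pickle.dump(result, f)
    return result
```

This is why clearing checkpoints forces fresh execution: deleting the pickle file turns the next call back into a cache miss.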

Best Practices

  • Always run notebook cells sequentially from top to bottom; skipping cells or running out of order can cause errors.
  • Use “Kernel → Restart & Clear Output” when changing the locality variable to avoid mixing data.
  • Jupyter auto-saves, but save manually (Ctrl+S / Cmd+S) before running long operations.
  • Large datasets can exhaust memory; close unnecessary notebooks and restart kernels periodically.
  • Set verbose = True to see detailed progress, and turn it off for production runs.
  • Load output parquet files in GIS software after each notebook to verify spatial accuracy.

Customizing Notebooks

You can modify notebooks to fit your workflow:

Adding custom processing

```python
# Add a new cell with your custom logic
def custom_enrichment(sup, settings):
    df = sup.universe.copy()
    # Your custom calculations here
    df['custom_field'] = df['field1'] * df['field2']
    sup.universe = df
    return sup

sales_univ_pair = custom_enrichment(sales_univ_pair, settings)
```

Creating new notebooks

Base new notebooks on the existing structure:
  1. Copy an existing notebook
  2. Rename appropriately
  3. Modify the workflow steps
  4. Update checkpoint names to avoid conflicts

Troubleshooting

Out-of-memory errors: your dataset may be too large for available RAM. Try:
  • Restart kernel and clear outputs
  • Close other applications
  • Process smaller subsets
  • Use a machine with more memory
Import errors: OpenAVM Kit isn’t installed, or it isn’t visible to the kernel’s environment. Install it in editable mode from the repository root:

```shell
pip install -e .
```

Then restart the kernel.
Checkpoint errors: delete corrupted checkpoints and re-run the cell:

```python
delete_checkpoints("problematic-checkpoint-prefix")
```
Settings errors: verify that in/settings.json exists and contains valid JSON. Use a JSON validator to check the syntax.
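To check the settings file programmatically, a small helper that surfaces the parse error alongside the file name can save time. This helper is hypothetical, not part of the kit:

```python
import json

def validate_settings(path="in/settings.json"):
    """Parse the settings file, raising a descriptive error on invalid JSON."""
    with open(path) as f:
        try:
            return json.load(f)
        except json.JSONDecodeError as exc:
            raise ValueError(f"{path} is not valid JSON: {exc}") from exc
```

Running it in a notebook cell fails fast with a pointer to the offending line and column, rather than deep inside the pipeline.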

Next Steps

  • Creating a Locality: set up a new locality from scratch
  • Data Assembly: deep dive into data loading and processing
  • Modeling Guide: advanced modeling techniques and configuration
  • API Reference: complete function reference documentation
