Notebook Overview
The notebooks/pipeline/ directory contains the core workflow notebooks:
01-assemble
Load and merge data sources into a unified structure
02-clean
Validate sales and prepare data for modeling
03-model
Train models and generate valuations
The notebooks/examples/ directory demonstrates specific use cases:
synthetic_city.ipynb: Work with simulated data
land_value.ipynb: Focus on land valuation techniques
Getting Started
Installing Jupyter
If you haven’t already installed Jupyter, install it into your working environment first.
Launching Jupyter
Start the Jupyter notebook server from the repository root.
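The exact commands depend on your setup; with a pip-based environment, a typical install-and-launch sequence looks like this:

```shell
# Install Jupyter if it is not already available
pip install jupyter

# Start the notebook server from the repository root;
# a browser tab opens at http://localhost:8888
jupyter notebook
```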
Jupyter runs a local web server. Keep the terminal window open while working with notebooks.
Opening a Notebook
Open a pipeline notebook and set the configuration variables in its first cell:
# The slug of the locality you are currently working on
locality = "us-nc-guilford"
# Whether to print out a lot of stuff (can help with debugging)
verbose = True
Notebook 1: Assemble
The assembly notebook loads and processes raw data files.
What It Does
dataframes = from_checkpoint("1-assemble-01-load_dataframes", load_dataframes,
{
"settings": load_settings(),
"verbose": verbose
}
)
sales_univ_pair = from_checkpoint("1-assemble-02-process_data", process_dataframes,
{
"dataframes": dataframes,
"settings": load_settings(),
"verbose": verbose
}
)
sales_univ_pair = from_checkpoint("1-assemble-03-enrich_streets", enrich_sup_streets,
{
"sup": sales_univ_pair,
"settings": load_settings(),
"verbose": verbose
}
)
Street enrichment can be computationally intensive for large datasets, requiring significant time and memory.
sales_univ_pair = from_checkpoint("1-assemble-04-tag_modeling_groups", tag_model_groups_sup,
{
"sup": sales_univ_pair,
"settings": load_settings(),
"verbose": verbose
}
)
Key Outputs
- out/1-assemble-sup.pickle - Complete SalesUniversePair object
- out/look/1-assemble-universe.parquet - Universe dataframe
- out/look/1-assemble-sales-hydrated.parquet - Sales with characteristics
Notebook 2: Clean
The cleaning notebook validates sales and fills data gaps.
What It Does
sales_univ_pair = mark_horizontal_equity_clusters_per_model_group_sup(
sup=sales_univ_pair,
settings=settings,
verbose=verbose
)
sales_univ_pair = run_sales_scrutiny(
sup=sales_univ_pair,
settings=settings,
drop_cluster_outliers=False,
drop_heuristic_outliers=True,
verbose=verbose
)
Key Outputs
- out/2-clean-sup.pickle - Cleaned SalesUniversePair
- out/sales_scrutiny/ - Sales scrutiny reports
- out/look/2-clean-sales-hydrated.parquet - Cleaned sales
Notebook 3: Model
The modeling notebook trains algorithms and generates valuations.
What It Does
sales_univ_pair = from_checkpoint("3-model-00-enrich-spatial-lag", enrich_sup_spatial_lag,
{
"sup": sales_univ_pair,
"settings": load_settings(),
"verbose": verbose
}
)
try_models(
sup=sales_univ_pair,
settings=load_settings(),
save_params=True,
verbose=verbose,
run_main=True,
run_vacant=False,
run_hedonic=False,
run_ensemble=True,
do_shaps=False,
do_plots=True
)
results = from_checkpoint("3-model-02-finalize-models", finalize_models,
{
"sup": sales_univ_pair,
"settings": load_settings(),
"save_params": True,
"use_saved_params": True,
"verbose": verbose
}
)
Key Outputs
- out/models/{model_group}/{model_type}/{algorithm}/ - Model files and predictions
- out/canonical_splits/ - Train/test split definitions
- out/ratio_studies/ - Performance reports
Checkpointing System
Notebooks use checkpointing to save intermediate results:
- Skip time-consuming steps on re-runs
- Resume work after interruptions
- Experiment without re-processing everything
Set clear_checkpoints = True to force fresh execution.
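The from_checkpoint calls in the notebooks follow a compute-or-load pattern. The sketch below is illustrative only; the real OpenAVM Kit implementation may differ in details such as the checkpoint directory and file format:

```python
import os
import pickle

def from_checkpoint(name, func, params, out_dir="out/checkpoints"):
    """Load a cached result if one exists; otherwise run func(**params)
    and cache the result under the given checkpoint name."""
    path = os.path.join(out_dir, f"{name}.pickle")
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    result = func(**params)
    os.makedirs(out_dir, exist_ok=True)
    with open(path, "wb") as f:
        pickle.dump(result, f)
    return result
```

On a re-run, the expensive function is skipped entirely; deleting the cached pickle (or setting clear_checkpoints = True) forces recomputation.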
Best Practices
Run cells in order
Always run notebook cells sequentially from top to bottom. Skipping cells or running out of order can cause errors.
Clear outputs before switching localities
Use “Kernel → Restart & Clear Output” when changing the locality variable to avoid mixing data.
Save frequently
Jupyter auto-saves, but manually save (Ctrl+S / Cmd+S) before running long operations.
Monitor memory usage
Large datasets can exhaust memory. Close unnecessary notebooks and restart kernels periodically.
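One lightweight way to check memory use from inside a notebook cell is the standard library's resource module (Unix-only; note that ru_maxrss is reported in kilobytes on Linux but bytes on macOS):

```python
import resource

# Peak resident set size of the current process so far
peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print(f"peak RSS: {peak}")
```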
Use verbose mode during development
Set verbose = True to see detailed progress. Turn off for production runs.
Examine outputs visually
Load output parquet files in GIS software after each notebook to verify spatial accuracy.
Customizing Notebooks
You can modify notebooks to fit your workflow:
Adding custom processing
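As a sketch of what a custom step might look like, the function below derives a price-per-square-foot field. The name and fields are hypothetical, but the shape (data in, enriched data out, with a verbose flag) matches the notebooks' conventions:

```python
def add_price_per_sqft(sales, verbose=False):
    """Hypothetical custom step: add a derived price-per-square-foot
    field to each sale record with a positive square footage."""
    enriched = []
    for sale in sales:
        sale = dict(sale)  # avoid mutating the caller's data
        if sale.get("sqft"):
            sale["price_per_sqft"] = sale["price"] / sale["sqft"]
        if verbose:
            print(sale)
        enriched.append(sale)
    return enriched

example = add_price_per_sqft([{"price": 300_000, "sqft": 1_500}])
print(example[0]["price_per_sqft"])  # 200.0
```

A step like this can be wrapped in a new from_checkpoint call with its own checkpoint name, so it caches and re-runs like the built-in steps.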
Creating new notebooks
Base new notebooks on the existing structure:
- Copy an existing notebook
- Rename appropriately
- Modify the workflow steps
- Update checkpoint names to avoid conflicts
Troubleshooting
Kernel died / memory error
Your dataset may be too large for available RAM. Try:
- Restart kernel and clear outputs
- Close other applications
- Process smaller subsets
- Use a machine with more memory
ModuleNotFoundError
OpenAVM Kit isn’t installed, or isn’t on the kernel’s path. Reinstall it into the environment the kernel uses, then restart the kernel.
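A common cause is the notebook kernel running a different Python than the one where the package was installed. You can check which interpreter the kernel is using:

```python
import sys

# The kernel's Python interpreter; OpenAVM Kit must be installed
# into this same environment for imports to succeed.
print(sys.executable)
```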
Checkpoint loading errors
Delete the corrupted checkpoint files so they are regenerated on the next run.
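For example, the cached pickles for a given notebook can be removed with a short snippet. The directory and naming pattern here are assumptions; adjust them to match where your checkpoints are actually written:

```python
from pathlib import Path

# Remove cached assembly checkpoints so the next run recomputes them.
# Adjust the directory and glob pattern to your project layout.
checkpoint_dir = Path("out")
for f in checkpoint_dir.glob("1-assemble-*.pickle"):
    print(f"removing {f}")
    f.unlink()
```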
Settings not loading
Verify in/settings.json exists and contains valid JSON. Use a JSON validator to check syntax.
Next Steps
Creating a Locality
Set up a new locality from scratch
Data Assembly
Deep dive into data loading and processing
Modeling Guide
Advanced modeling techniques and configuration
API Reference
Complete function reference documentation