The three-stage pipeline
Archeo-Cluster processes images through three sequential stages:
Detection
Images are converted to HSV color space, and color-based thresholding isolates artifact regions matching a target color (such as ceramic fragments). Morphological operations clean noise and fill gaps. OpenCV's findContours identifies object boundaries, from which geometric features are computed: area, perimeter, centroid, circularity, and aspect ratio. Results are saved to features.csv.
Clustering
Extracted features are normalized and fed into K-Means clustering. The elbow method automatically selects the number of clusters (K) by analyzing the within-cluster sum of squares (WCSS), so no manual tuning is required. Each artifact is assigned to a cluster based on feature similarity. Results are saved to clustered.csv along with scatter plots and an elbow curve.
Spatial analysis
The Average Nearest Neighbor (ANN) index is the ratio of the observed mean nearest-neighbor distance between artifact centroids to the mean distance expected if the same number of points were distributed randomly over the same area. ANN < 1 indicates clustering, ANN > 1 indicates dispersion, and ANN ≈ 1 indicates a random distribution. Results export as GeoJSON for use in QGIS and other GIS tools.
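To make the ANN ratio concrete, here is a minimal sketch in plain Python. It is an illustration rather than Archeo-Cluster's own implementation: the function name ann_index, the brute-force neighbor search, and the way the study area is supplied are all assumptions. The expected distance uses the standard formula for a random pattern, 0.5 / sqrt(n / A), where n is the number of points and A is the study area.

```python
import math


def ann_index(points, area):
    """Return the Average Nearest Neighbor ratio for 2D points.

    points: list of (x, y) centroids (hypothetical input format).
    area:   size of the study area in the same squared units (assumed to
            be supplied by the caller, e.g. the image or trench extent).
    """
    n = len(points)
    if n < 2:
        raise ValueError("at least two points are required")
    if area <= 0:
        raise ValueError("area must be positive")

    # Observed: mean distance from each point to its closest other point.
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        total += min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
    observed = total / n

    # Expected mean nearest-neighbor distance for a random pattern
    # with the same point density (n / area).
    expected = 0.5 / math.sqrt(n / area)

    return observed / expected


# Four evenly spaced points vs. four tightly packed points,
# both evaluated over the same 20 x 20 study area.
spread = [(5, 5), (15, 5), (5, 15), (15, 15)]
packed = [(0, 0), (1, 0), (0, 1), (1, 1)]

ann_spread = ann_index(spread, area=400)  # well above 1: dispersed
ann_packed = ann_index(packed, area=400)  # well below 1: clustered
```

Note that the result depends heavily on the area you pass in: the same points measured against a larger study area yield a lower ratio, so the area should reflect the region that was actually surveyed.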
Who it’s for
Archeo-Cluster is built for:
- Archaeologists who need to classify artifacts from excavation photographs without manual annotation
- Researchers applying computer vision and machine learning to archaeological datasets
- GIS analysts who want artifact distribution data in formats compatible with QGIS
- Developers integrating artifact analysis into larger research pipelines via the Python API
Key features
Color segmentation
HSV-based segmentation isolates artifacts by color. Configure the target color with any hex value (e.g., #A98876) to match ceramic fragments, stone tools, or other materials.
K-Means clustering
Automatic artifact grouping using K-Means with the elbow method for optimal K selection. Generates cluster scatter plots and WCSS elbow curves.
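One common way to locate the elbow automatically is to pick the K whose point on the WCSS curve lies farthest from the straight line joining the first and last points. The sketch below shows that heuristic in plain Python. Whether Archeo-Cluster uses this exact rule is not stated here, so treat the function name find_elbow and the sample WCSS values as hypothetical.

```python
import math


def find_elbow(ks, wcss):
    """Return the K at the elbow of a WCSS curve.

    Uses the maximum-distance-to-chord heuristic: both axes are scaled
    to [0, 1], then the point farthest from the line through the first
    and last points is selected.
    """
    if len(ks) != len(wcss) or len(ks) < 3:
        raise ValueError("need at least three (K, WCSS) pairs")

    k_span = max(ks) - min(ks)
    w_span = max(wcss) - min(wcss)
    if k_span == 0 or w_span == 0:
        raise ValueError("K and WCSS values must vary")

    # Scale both axes so the large WCSS magnitudes do not dominate.
    xs = [(k - min(ks)) / k_span for k in ks]
    ys = [(w - min(wcss)) / w_span for w in wcss]

    x1, y1, x2, y2 = xs[0], ys[0], xs[-1], ys[-1]
    chord_length = math.hypot(x2 - x1, y2 - y1)

    def distance_to_chord(x, y):
        # Perpendicular distance from (x, y) to the line through
        # (x1, y1) and (x2, y2).
        return abs((y2 - y1) * x - (x2 - x1) * y + x2 * y1 - y2 * x1) / chord_length

    distances = [distance_to_chord(x, y) for x, y in zip(xs, ys)]
    return ks[distances.index(max(distances))]


# Illustrative WCSS values for K = 1..6 (made up for this example).
ks = [1, 2, 3, 4, 5, 6]
wcss = [1000.0, 400.0, 150.0, 120.0, 100.0, 90.0]
best_k = find_elbow(ks, wcss)
```

In this sample the curve flattens sharply after K = 3, which is the value the heuristic selects. It is still worth checking the generated elbow curve by eye when the bend is gradual.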
ANN spatial analysis
Average Nearest Neighbor (ANN) index quantifies whether artifacts cluster, disperse, or distribute randomly across an excavation area.
GeoJSON export
Results export as GeoJSON for direct import into QGIS and other GIS tools for further spatial analysis and visualization.
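For orientation, the sketch below builds a small GeoJSON FeatureCollection of point features using only the standard library. It shows the general shape of such a file, not Archeo-Cluster's actual output: the function name to_geojson, the input dictionaries, and the property names cluster and area are hypothetical. The GeoJSON specification also expects longitude/latitude coordinates, so the sketch assumes centroids have already been georeferenced.

```python
import json


def to_geojson(artifacts):
    """Convert artifact records into a GeoJSON FeatureCollection dict.

    artifacts: list of dicts with hypothetical keys
               "x", "y", "cluster", and "area".
    """
    features = []
    for artifact in artifacts:
        features.append(
            {
                "type": "Feature",
                "geometry": {
                    "type": "Point",
                    # GeoJSON positions are ordered [x, y] (lon, lat).
                    "coordinates": [artifact["x"], artifact["y"]],
                },
                "properties": {
                    "cluster": artifact["cluster"],
                    "area": artifact["area"],
                },
            }
        )
    return {"type": "FeatureCollection", "features": features}


# Made-up records for illustration only.
artifacts = [
    {"x": 35.2001, "y": 31.7801, "cluster": 0, "area": 412.5},
    {"x": 35.2003, "y": 31.7802, "cluster": 1, "area": 198.0},
]

collection = to_geojson(artifacts)
geojson_text = json.dumps(collection, indent=2)
```

Writing geojson_text to a file with a .geojson extension produces a layer that QGIS can load as a vector layer, with the properties available as attribute columns.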
Session management
Each analysis run is stored in a named session directory. Revisit, compare, and manage previous results without re-running the pipeline.
Python API
Every CLI command has a corresponding Python class. Use ObjectDetector, KMeansAnalyzer, and spatial analysis functions directly in your scripts.
Next steps
Installation
Install Archeo-Cluster using uv and verify your environment is ready.
Quickstart
Run your first full analysis pipeline from clone to results in minutes.
