Archeo-Cluster processes archaeological imagery through a three-stage pipeline: color-based object detection, K-Means clustering, and spatial analysis. Each stage produces its own output files, and the whole sequence can be run in a single command.

Pipeline overview

┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│   Images    │ -> │  Detection  │ -> │  Clustering │ -> │  Analysis   │
│             │    │  (OpenCV)   │    │  (K-Means)  │    │  (Spatial)  │
└─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘
                          │                  │                  │
                          v                  v                  v
                     features.csv      clustered.csv       descriptive_stats.csv
                     contours.png      elbow_method.png    ann_results.csv
All outputs are written to a session directory so nothing is overwritten between runs. See Session Management for details on where sessions are stored.
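The per-stage file tables later on this page imply a fixed layout inside each session directory. As a rough sketch, a small helper could build the key CSV paths and report which ones are missing after a run. The helper names are hypothetical, and it is an assumption that `<image>` is the same string in the directory name and the `_clustered.csv` prefix.

```python
from pathlib import Path


def expected_outputs(session_dir, image_name):
    """Build the main CSV paths for one image, following the documented layout."""
    root = Path(session_dir)
    return {
        "features": root / "detection" / "features.csv",
        "clustered": root / "clustering" / image_name / f"{image_name}_clustered.csv",
        "descriptive_stats": root / "analysis" / image_name / "descriptive_stats.csv",
        "ann_results": root / "analysis" / image_name / "ann_results.csv",
    }


def missing_outputs(session_dir, image_name):
    """Return the names of expected CSV outputs that do not exist on disk."""
    paths = expected_outputs(session_dir, image_name)
    return [name for name, path in paths.items() if not path.exists()]


# Hypothetical session directory and image name, for illustration only.
print(missing_outputs("./session_dir", "image_name"))
```

An empty list would mean all four CSV files are present for that image.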

Running the full pipeline

The pipeline command chains all three stages and stores everything in one session:
# Simplest form — session name is derived from the input folder name
uv run archeo-cluster pipeline --input-dir ./dataset

# Specify a session name and a custom target color
uv run archeo-cluster pipeline \
  --input-dir ./dataset \
  --session "site_a_2025" \
  --color "#A98876"

# Open the results folder automatically when done
uv run archeo-cluster pipeline --input-dir ./dataset --open
The --open flag launches the system file manager pointing at the session directory once the pipeline finishes. It is enabled by default.

Stage-by-stage walkthrough

Stage 1: Object Detection

The detect command converts each image to HSV color space, applies a color mask around the target hue, runs morphological operations to clean noise, and extracts contour features.
uv run archeo-cluster detect \
  --input-dir ./dataset \
  --session "site_a_detection" \
  --color "#A98876" \
  --min-area 50 \
  --max-area 5000
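To make the color-mask step more concrete, here is a minimal sketch of how a hex target color such as #A98876 could be mapped to an HSV band on OpenCV's 8-bit scale (hue 0-179, saturation and value 0-255). The function names and tolerance values are hypothetical; the tolerances Archeo-Cluster actually applies are not documented on this page.

```python
import colorsys


def hex_to_opencv_hsv(hex_color: str) -> tuple[int, int, int]:
    """Convert '#RRGGBB' to HSV on OpenCV's 8-bit scale (H 0-179, S/V 0-255)."""
    digits = hex_color.lstrip("#")
    r, g, b = (int(digits[i:i + 2], 16) / 255.0 for i in (0, 2, 4))
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    # OpenCV stores hue as degrees / 2, so a full turn maps onto 0-179.
    return round(h * 180) % 180, round(s * 255), round(v * 255)


def hsv_band(hsv, h_tol=10, s_tol=60, v_tol=60):
    """Return (lower, upper) bounds around an HSV color.

    The tolerances are illustrative assumptions. Hue wrap-around (reds near
    0 and 179) is not handled in this sketch.
    """
    h, s, v = hsv
    lower = (max(h - h_tol, 0), max(s - s_tol, 0), max(v - v_tol, 0))
    upper = (min(h + h_tol, 179), min(s + s_tol, 255), min(v + v_tol, 255))
    return lower, upper


target = hex_to_opencv_hsv("#A98876")
lower, upper = hsv_band(target)
print(target, lower, upper)
```

Bounds of this shape are what `cv2.inRange` takes as its lower and upper arguments when building a binary mask from an HSV image.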
What gets saved:
| File | Description |
| --- | --- |
| detection/features.csv | One row per detected object: image_filename, contour_index, area, perimeter, centroid_x, centroid_y, circularity, aspect_ratio, solidity, extent |
| detection/<image>/01_hsv.png | HSV-converted source image |
| detection/<image>/02_mask_initial.png | Raw color mask before morphology |
| detection/<image>/03_mask_closed.png | Mask after MORPH_CLOSE |
| detection/<image>/04_mask_morph_final.png | Mask after MORPH_OPEN |
| detection/<image>/05_raw_contours.png | All contours before area filtering (red) |
| detection/<image>/06_filtered_contours_final.png | Contours that passed the area filter (green) |

The session status transitions from DETECTING to DETECTION_DONE on success.
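The four dimensionless columns in features.csv (circularity, aspect_ratio, solidity, extent) are standard contour descriptors. The sketch below uses their textbook definitions; the page does not state which exact formulas the detect command uses, nor whether the bounding box is upright or rotated, so treat the function and its parameters as assumptions.

```python
import math


def shape_descriptors(area, perimeter, bbox_width, bbox_height, hull_area):
    """Textbook definitions of four shape features (assumed, not confirmed)."""
    return {
        # 1.0 for a perfect circle, lower for elongated or ragged outlines
        "circularity": 4 * math.pi * area / perimeter ** 2,
        # width over height of the bounding box
        "aspect_ratio": bbox_width / bbox_height,
        # fraction of the convex hull that the contour fills
        "solidity": area / hull_area,
        # fraction of the bounding box that the contour fills
        "extent": area / (bbox_width * bbox_height),
    }


# A 10 x 10 square: fills its hull and bounding box, but is not circular.
square = shape_descriptors(
    area=100.0, perimeter=40.0, bbox_width=10.0, bbox_height=10.0, hull_area=100.0
)
print(square)
```

For the square, solidity and extent are both 1.0 while circularity is pi/4 (about 0.785), which is why circularity helps separate round objects from angular ones.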

Stage 2: K-Means Clustering

The cluster command reads features.csv, normalises the six morphological features with StandardScaler, runs the elbow method to pick the optimal K, and fits a KMeans model.
uv run archeo-cluster cluster \
  --input ./session_dir/detection/features.csv \
  --output-dir ./session_dir/clustering \
  --max-k 10
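The elbow method looks for the K beyond which adding clusters stops reducing inertia by much. One common heuristic, sketched below, picks the point on the inertia curve that lies furthest from the straight line joining its first and last points. This is an illustration only: the page does not say which rule the cluster command applies, and the inertia values in the example are made up.

```python
def pick_elbow(ks, inertias):
    """Return the K whose (K, inertia) point is furthest from the end-to-end chord."""
    if len(ks) != len(inertias) or len(ks) < 3:
        raise ValueError("need at least three (K, inertia) pairs")
    k_span = ks[-1] - ks[0]
    i_min, i_max = min(inertias), max(inertias)
    if i_max == i_min:
        return ks[0]  # flat curve: no elbow to find
    # Rescale both axes to 0..1 so the inertia scale does not dominate.
    xs = [(k - ks[0]) / k_span for k in ks]
    ys = [(v - i_min) / (i_max - i_min) for v in inertias]
    x1, y1, x2, y2 = xs[0], ys[0], xs[-1], ys[-1]

    def chord_distance(x, y):
        # Proportional to the perpendicular distance from (x, y) to the chord.
        return abs((y2 - y1) * x - (x2 - x1) * y + x2 * y1 - y2 * x1)

    best = max(range(len(ks)), key=lambda i: chord_distance(xs[i], ys[i]))
    return ks[best]


# Made-up inertia values for K = 1..6, for illustration only.
elbow = pick_elbow([1, 2, 3, 4, 5, 6], [1000, 400, 150, 120, 100, 90])
print(elbow)
```

With these sample values the curve flattens after K = 3, and that is the point the heuristic selects.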
What gets saved (one sub-directory per image):
| File | Description |
| --- | --- |
| clustering/<image>/<image>_clustered.csv | features.csv with an added cluster column |
| clustering/<image>/elbow_method.png | Inertia vs K with the selected elbow marked |
| clustering/<image>/silhouette_analysis.png | Silhouette score vs K (when compute_silhouette=True) |
| clustering/<image>/cluster_distribution.png | Scatter of centroid positions coloured by cluster |
| clustering/<image>/morphological_scatter.png | Area vs circularity scatter coloured by cluster |
| clustering/<image>/clusters_visualization.png | Centroid overlays on the filtered contours image |
| clustering/<image>/cluster_groups.png | Rotated bounding rectangles per cluster |

Features used for clustering: area, perimeter, circularity, aspect_ratio, solidity, extent. Centroid coordinates are not used as clustering features; they are preserved for spatial analysis only.

The session status transitions from CLUSTERING to CLUSTERING_DONE.
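The scaling step can be pictured as a per-column z-score over the six morphological features, with the centroid columns passed through untouched. The sketch below reproduces that idea with the standard library only. It mirrors how scikit-learn's StandardScaler behaves (population standard deviation, and a scale of 1 for constant columns), but the helper name and the sample rows are hypothetical.

```python
from statistics import fmean, pstdev

CLUSTER_FEATURES = (
    "area", "perimeter", "circularity", "aspect_ratio", "solidity", "extent",
)


def standardize(rows):
    """Return copies of the rows with the six clustering features z-scored."""
    scaled = [dict(row) for row in rows]  # copy so the input is not mutated
    for name in CLUSTER_FEATURES:
        values = [float(row[name]) for row in rows]
        mean = fmean(values)
        std = pstdev(values) or 1.0  # constant column: avoid dividing by zero
        for out, value in zip(scaled, values):
            out[name] = (value - mean) / std
    return scaled


# Made-up feature rows, for illustration only.
rows = [
    {"area": 10.0, "perimeter": 12.0, "circularity": 0.80, "aspect_ratio": 1.0,
     "solidity": 0.90, "extent": 0.70, "centroid_x": 5.0, "centroid_y": 9.0},
    {"area": 20.0, "perimeter": 18.0, "circularity": 0.75, "aspect_ratio": 1.0,
     "solidity": 0.85, "extent": 0.65, "centroid_x": 40.0, "centroid_y": 12.0},
    {"area": 30.0, "perimeter": 25.0, "circularity": 0.60, "aspect_ratio": 1.0,
     "solidity": 0.95, "extent": 0.80, "centroid_x": 77.0, "centroid_y": 30.0},
]
scaled = standardize(rows)
print(scaled[0]["area"], scaled[0]["centroid_x"])
```

Scaling matters here because area and perimeter are measured in pixels while the other four features are ratios; without it, K-Means distances would be driven almost entirely by area.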

Stage 3: Spatial Analysis

The analyze command reads each _clustered.csv file and computes descriptive statistics and the Average Nearest Neighbor (ANN) index for each cluster.
uv run archeo-cluster analyze \
  --input ./session_dir/clustering/image_name/image_name_clustered.csv \
  --output-dir ./session_dir/analysis/image_name
What gets saved:
| File | Description |
| --- | --- |
| analysis/<image>/descriptive_stats.csv | Per-cluster mean/std/min/max for area and perimeter |
| analysis/<image>/ann_results.csv | ANN R-index and interpretation (Clustered / Random / Dispersed) per cluster |
| analysis/<image>/ann_results.png | Bar chart of R-index values with the random baseline (R=1) marked |
| analysis/<image>/spatial_distribution_map.png | Scatter map of objects by cluster (Y-axis inverted to match image coordinates) |
| analysis/<image>/boxplot_area.png | Boxplots of morphological features by cluster |

The session status transitions from ANALYZING to COMPLETED.
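The ANN R-index compares the observed mean distance between each object and its nearest neighbour with the mean distance expected if the same number of points were scattered at random over the study area. The sketch below follows the usual Clark-Evans formulation, R = observed / (0.5 / sqrt(n / A)). How the analyze command defines the study area A, and which cutoffs it uses for the three labels, is not stated on this page, so the `study_area` argument and the `tolerance` value are assumptions.

```python
import math


def ann_index(points, study_area):
    """Nearest-neighbour ratio R: observed mean NN distance over the random expectation."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    nearest = [
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]
    observed = sum(nearest) / n
    # Expected mean NN distance for n random points spread over study_area.
    expected = 0.5 / math.sqrt(n / study_area)
    return observed / expected


def interpret(r, tolerance=0.1):
    """Hypothetical labelling rule around the random baseline R = 1."""
    if r < 1 - tolerance:
        return "Clustered"
    if r > 1 + tolerance:
        return "Dispersed"
    return "Random"


# Four evenly spaced points filling a 2 x 2 area, and four points bunched
# into one corner of a 10 x 10 area (made-up coordinates).
grid = [(0, 0), (1, 0), (0, 1), (1, 1)]
tight = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1)]
print(interpret(ann_index(grid, study_area=4.0)))
print(interpret(ann_index(tight, study_area=100.0)))
```

The brute-force nearest-neighbour search is quadratic in the number of points, which is fine for a sketch but would be replaced by a spatial index for large clusters.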

Performance metrics

After the pipeline command finishes, it prints a metrics table showing how long each stage took and its peak memory usage:
─────────────── Performance Summary ────────────────
┏━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┓
┃ Stage       ┃ Duration (s)  ┃ Peak Memory (MB) ┃
┡━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━┩
│ Detection   │ 1.243         │ 142.3            │
│ Clustering  │ 0.817         │ 98.7             │
│ Analysis    │ 0.391         │ 61.2             │
│ Total       │ 2.451         │ 142.3            │
└─────────────┴───────────────┴──────────────────┘
The same metrics are persisted in the session’s metadata.json so you can retrieve them later.
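In the sample table, the Total row's duration is the sum of the stage durations, while its peak memory is the maximum across stages rather than a sum. The sketch below shows one way such numbers could be collected and combined with the standard library. It is not the tool's implementation: the helper names are hypothetical, and `tracemalloc` only sees allocations made through Python's allocator, so it may under-report memory used by native libraries.

```python
import time
import tracemalloc


def measure(stage, func, *args, **kwargs):
    """Run func and return (result, metrics row) with duration and peak memory."""
    tracemalloc.start()
    started = time.perf_counter()
    try:
        result = func(*args, **kwargs)
    finally:
        duration = time.perf_counter() - started
        _, peak_bytes = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    row = {"stage": stage, "duration_s": duration, "peak_mb": peak_bytes / 1024 ** 2}
    return result, row


def total_row(rows):
    """Combine stage rows: durations add up, peak memory is the largest peak."""
    return {
        "stage": "Total",
        "duration_s": sum(r["duration_s"] for r in rows),
        "peak_mb": max(r["peak_mb"] for r in rows),
    }


# The three stage rows from the sample table above.
rows = [
    {"stage": "Detection", "duration_s": 1.243, "peak_mb": 142.3},
    {"stage": "Clustering", "duration_s": 0.817, "peak_mb": 98.7},
    {"stage": "Analysis", "duration_s": 0.391, "peak_mb": 61.2},
]
total = total_row(rows)
print(round(total["duration_s"], 3), total["peak_mb"])
```

Applied to the sample rows, this reproduces the Total line of the table (2.451 s and 142.3 MB).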

Using the Python API

You can drive the full pipeline from code instead of the CLI:
from archeo_cluster.core.detection import ObjectDetector
from archeo_cluster.core.clustering import KMeansAnalyzer
from archeo_cluster.core.spatial import run_spatial_analysis
from archeo_cluster.models import DetectionConfig, ClusteringConfig
import pandas as pd

# Stage 1 — Detection
detection_config = DetectionConfig(
    target_color="#A98876",
    min_area=50,
    max_area=5000,
)
detector = ObjectDetector(config=detection_config)
batch = detector.process_directory("./dataset", "./output/detection")

# Save features.csv
df = pd.DataFrame(batch.to_feature_rows())
df.to_csv("./output/features.csv", index=False)

# Stage 2 — Clustering
analyzer = KMeansAnalyzer(config=ClusteringConfig(max_k=10))
clustering = analyzer.process_features_csv(
    "./output/features.csv",
    "./output/clustering",
)

# Stage 3 — Spatial analysis (per image)
for result in clustering.results:
    clustered_csv = f"./output/clustering/{result.image_name}/{result.image_name}_clustered.csv"
    run_spatial_analysis(clustered_csv, f"./output/analysis/{result.image_name}")
Set save_intermediate=False on ObjectDetector if you only need features.csv and want to skip writing the step-by-step mask images to disk.
