Output Artifacts

Overview

The pipeline writes artifacts that support reproducibility, debugging, and analysis. All outputs are organized under the configured output_dir (default: artifacts_edge or artifacts_server).

Directory Structure

output_dir/
├── reports/
│   ├── pipeline_report.json          # Complete run report
│   ├── streaming_chunks.jsonl        # Per-chunk metrics (streaming mode)
│   └── constraint_experiment_log.jsonl
├── benchmarks/
│   ├── constraint_experiment.csv     # Resource constraint sweep results
│   ├── significance_tests.csv        # Statistical significance tests
│   ├── streaming_chunks.csv          # Chunk metrics (CSV format)
│   ├── latency_vs_data_size.csv
│   ├── throughput_vs_memory.csv
│   ├── resource_vs_accuracy.csv
│   ├── latency_vs_accuracy.png       # Visualization plots
│   ├── memory_vs_accuracy.png
│   └── latency_memory_accuracy.png
├── profiles/
│   └── operator_profile.csv          # Per-operator timing breakdown
├── metadata/
│   └── run_manifest.json             # Reproducibility manifest
└── intermediate/                     # Spilled chunks (if enabled)
    ├── stream_chunk_1_X.csv
    ├── stream_chunk_1_y.csv
    └── ...

Core Reports

pipeline_report.json

Complete execution summary containing all metrics, benchmarks, and quality reports.

Location: reports/pipeline_report.json

Key sections:

{
  "dataset_fingerprint": {
    "sha256": "abc123...",
    "row_count": 1340,
    "column_count": 20,
    "timestamp": "2026-03-04T10:30:00"
  },
  "reproducibility": {
    "random_seed": 42,
    "python_version": "3.10.0",
    "platform": "Linux-5.15.0",
    "config": { /* full configuration */ },
    "dependencies": {
      "numpy": "1.24.0",
      "pandas": "2.0.0",
      "matplotlib": "3.7.0"
    }
  },
  "batch": { /* batch mode results */ },
  "streaming": { /* streaming mode results */ },
  "benchmark": { /* statistical benchmarks */ },
  "constraint_experiment": { /* constraint sweep */ },
  "quality": { /* data quality metrics */ },
  "scaling": {
    "n_jobs": 4,
    "parallel_enabled": true
  }
}

Use cases:

Verify reproducibility via dataset_fingerprint.sha256
Compare runs across different configurations
Extract model metrics for reporting
Debug pipeline failures

streaming_chunks.jsonl

Newline-delimited JSON with detailed metrics for each streaming chunk.

Location: reports/streaming_chunks.jsonl

Example record:

{
  "chunk_id": 1,
  "rows": 64,
  "latency_s": 0.042,
  "throughput_rows_s": 1523.8,
  "batch_size": 96,
  "chunk_size": 64,
  "memory_before_mb": 45.2,
  "memory_after_mb": 52.8,
  "memory_exceeded": false,
  "retries": 0,
  "spill_paths": {
    "X": "intermediate/stream_chunk_1_X.csv",
    "y": "intermediate/stream_chunk_1_y.csv"
  },
  "operator_profile_s": {
    "preprocess_s": 0.008,
    "feature_engineering_s": 0.021,
    "feature_selection_s": 0.006,
    "encode_scale_s": 0.007
  },
  "input_bytes": 8192,
  "estimated_input_bandwidth_mb_s": 185.6
}

Use cases:

Identify memory bottlenecks (look for memory_exceeded: true)
Analyze adaptive resizing behavior (check retries)
Profile per-chunk performance
Validate spill-to-disk strategy
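
The use cases above can be sketched as a small summary pass over the file. This is a minimal illustration that assumes every record carries the fields shown in the example record; summarize_chunks is a hypothetical helper, not part of the pipeline.

```python
import json
from pathlib import Path


def summarize_chunks(jsonl_path):
    """Summarize per-chunk records from a newline-delimited JSON file."""
    records = []
    for line in Path(jsonl_path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line:  # tolerate blank lines between records
            records.append(json.loads(line))

    total_rows = sum(r["rows"] for r in records)
    total_latency = sum(r["latency_s"] for r in records)
    slowest = max(records, key=lambda r: r["latency_s"], default=None)
    return {
        "chunks": len(records),
        "rows": total_rows,
        # Rows divided by summed chunk latency, not wall-clock time.
        "overall_throughput_rows_s": total_rows / total_latency if total_latency else 0.0,
        "memory_exceeded_chunks": [r["chunk_id"] for r in records if r.get("memory_exceeded")],
        "retried_chunks": [r["chunk_id"] for r in records if r.get("retries", 0) > 0],
        "slowest_chunk_id": slowest["chunk_id"] if slowest else None,
    }
```

Calling summarize_chunks('artifacts_server/reports/streaming_chunks.jsonl') returns a dictionary; non-empty memory_exceeded_chunks or retried_chunks lists point at the chunks worth inspecting first.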

run_manifest.json

Reproducibility manifest for validating identical runs.

Location: metadata/run_manifest.json

Contents: Same as the reproducibility section in pipeline_report.json

Use case: Quickly verify configuration without parsing the full report

# Compare two runs
diff artifacts_1/metadata/run_manifest.json \
     artifacts_2/metadata/run_manifest.json
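
A textual diff is sensitive to key order and indentation. As an alternative, the sketch below compares two manifests structurally and reports only the leaf values that differ. The helper names (flatten, diff_manifests, load_manifest) are hypothetical, and the only assumption about the manifest is that it is valid JSON.

```python
import json


def flatten(obj, prefix=""):
    """Flatten nested dicts/lists into {dotted.path: leaf_value}."""
    flat = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            flat.update(flatten(value, f"{prefix}{key}."))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            flat.update(flatten(value, f"{prefix}{index}."))
    else:
        flat[prefix.rstrip(".")] = obj
    return flat


def diff_manifests(left, right):
    """Return {path: (left_value, right_value)} for every differing leaf."""
    a, b = flatten(left), flatten(right)
    missing = object()  # sentinel so a stored null is not confused with "absent"
    changes = {}
    for path in sorted(set(a) | set(b)):
        va, vb = a.get(path, missing), b.get(path, missing)
        if va != vb:
            changes[path] = (
                "<absent>" if va is missing else va,
                "<absent>" if vb is missing else vb,
            )
    return changes


def load_manifest(path):
    """Read one manifest file into a Python object."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

An empty result from diff_manifests(load_manifest(path_1), load_manifest(path_2)) means the two manifests agree on every leaf value. Empty containers are not reported, so a key holding {} is treated the same as a missing key.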

Benchmark Artifacts

constraint_experiment.csv

Results from sweeping resource constraints (chunk size, memory, compute).

Location: benchmarks/constraint_experiment.csv

Columns:

Column	Type	Description
`chunk_size`	int	Rows per chunk
`memory_limit_mb`	int	Maximum memory constraint
`compute_limit`	float	CPU utilization cap (0.0-1.0)
`preprocessing_latency_s`	float	Total preprocessing time
`peak_memory_mb`	float	Peak memory usage
`training_time_s`	float	Model training time
`model_accuracy_r2`	float	Model R² score
`model_rmse`	float	Model RMSE

Generated by: engine.py:389-413 (run_constraint_experiment)

Example analysis:

import pandas as pd

df = pd.read_csv('benchmarks/constraint_experiment.csv')

# Find best accuracy configuration
best = df.loc[df['model_accuracy_r2'].idxmax()]
print(f"Best R²: {best['model_accuracy_r2']:.3f}")
print(f"Configuration: chunk={best['chunk_size']}, memory={best['memory_limit_mb']}MB")

# Find lowest latency configuration
fastest = df.loc[df['preprocessing_latency_s'].idxmin()]
print(f"Fastest: {fastest['preprocessing_latency_s']:.3f}s")

significance_tests.csv

Statistical significance tests comparing batch vs. streaming modes.

Location: benchmarks/significance_tests.csv

Columns:

Column	Type	Description
`latency_pvalue`	float	Permutation test p-value for latency
`throughput_pvalue`	float	Permutation test p-value for throughput
`latency_mean_delta_s`	float	Mean latency difference (streaming - batch)
`throughput_mean_delta_rows_s`	float	Mean throughput difference

Interpretation:

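p-value < 0.05: Statistically significant difference between the modes
p-value >= 0.05: No statistically significant difference detected (the modes may be comparable, but a high p-value alone does not prove they are equivalent)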

Generated by: engine.py:134-147 (_permutation_pvalue)
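
For readers unfamiliar with permutation tests, the following is a generic two-sided test on the difference of means. It is a sketch of the general technique only; the actual _permutation_pvalue in engine.py is not shown in this document and may differ in its statistic, permutation count, or seeding. The default of 10,000 permutations and the seed value are assumptions.

```python
import random


def _mean(values):
    return sum(values) / len(values)


def permutation_pvalue(sample_a, sample_b, n_permutations=10_000, seed=42):
    """Two-sided permutation test on the difference of sample means."""
    rng = random.Random(seed)  # fixed seed keeps the estimate repeatable
    observed = abs(_mean(sample_a) - _mean(sample_b))
    pooled = list(sample_a) + list(sample_b)
    n_a = len(sample_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # random relabelling of the pooled measurements
        delta = abs(_mean(pooled[:n_a]) - _mean(pooled[n_a:]))
        if delta >= observed - 1e-12:  # tolerance for floating-point summation order
            at_least_as_extreme += 1
    # Add-one correction: the observed labelling counts as one permutation,
    # so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Hypothetical per-chunk latencies in seconds, for illustration only.
batch_latency = [0.040, 0.042, 0.041, 0.043, 0.039, 0.044]
streaming_latency = [0.060, 0.062, 0.061, 0.063, 0.059, 0.064]
p_value = permutation_pvalue(batch_latency, streaming_latency)
```

The p-value is the share of random relabellings whose mean difference is at least as large as the observed one, which is why a small value suggests the two modes really differ.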

Visualization Plots

Three PNG plots visualizing trade-offs between resources and accuracy.

latency_vs_accuracy.png

X-axis: Preprocessing latency (seconds)
Y-axis: Model accuracy (R²)
Color: Compute constraint (viridis colormap)

Use case: Identify configurations with the best latency-accuracy trade-off

memory_vs_accuracy.png

X-axis: Peak memory usage (MB)
Y-axis: Model accuracy (R²)
Color: Memory limit (plasma colormap)

Use case: Select memory-efficient configurations without sacrificing accuracy

latency_memory_accuracy.png

X-axis: Peak memory (MB)
Y-axis: Latency (seconds)
Color: Model accuracy (coolwarm colormap)

Use case: Three-way trade-off analysis for deployment decisions

Generated by: engine.py:511-555 (_plot_experiment_results)
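
The trade-offs these plots show can also be read numerically from constraint_experiment.csv. The sketch below extracts the Pareto front: the configurations that no other configuration beats on both lower cost and higher accuracy. It uses the column names listed for constraint_experiment.csv; pareto_front and load_experiment are hypothetical helpers, not pipeline functions.

```python
import csv


def pareto_front(rows, cost_key="preprocessing_latency_s", gain_key="model_accuracy_r2"):
    """Return rows that no other row beats on both lower cost and higher gain."""
    # Cheapest first; among equal costs, the most accurate row comes first.
    ordered = sorted(rows, key=lambda r: (float(r[cost_key]), -float(r[gain_key])))
    front, best_gain = [], float("-inf")
    for row in ordered:
        gain = float(row[gain_key])
        if gain > best_gain:  # strictly more accurate than every cheaper row
            front.append(row)
            best_gain = gain
    return front


def load_experiment(csv_path):
    """Read the sweep results as a list of dicts (all values are strings)."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

Passing cost_key="peak_memory_mb" gives the memory-accuracy front instead of the latency-accuracy front.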

Profiling Artifacts

operator_profile.csv

Per-operator timing breakdown for each streaming chunk.

Location: profiles/operator_profile.csv

Columns:

Column	Type	Description
`chunk_id`	int	Chunk identifier
`preprocess_s`	float	Data cleaning time
`feature_engineering_s`	float	Feature derivation time
`feature_selection_s`	float	Multicollinearity removal time
`encode_scale_s`	float	Encoding and scaling time
`estimated_input_bandwidth_mb_s`	float	Input I/O bandwidth
`input_bytes`	int	Raw input size

Generated by: engine.py:481-492 (extracted from chunk_metrics)

Example analysis:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('profiles/operator_profile.csv')

# Plot operator timing distribution
operators = ['preprocess_s', 'feature_engineering_s', 'feature_selection_s', 'encode_scale_s']
df[operators].boxplot()
plt.ylabel('Time (seconds)')
plt.title('Operator Timing Distribution')
plt.savefig('operator_timing_boxplot.png')

Use cases:

Identify bottleneck operators
Validate operator-level optimizations
Estimate per-stage latency for SLA planning
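
To identify the bottleneck operator numerically, one option is to compute each operator's share of the total operator time across all chunks. The sketch below works on rows read with csv.DictReader and uses the four timing columns listed above; operator_time_shares and bottleneck are hypothetical helper names.

```python
import csv

OPERATORS = ["preprocess_s", "feature_engineering_s", "feature_selection_s", "encode_scale_s"]


def operator_time_shares(rows):
    """Return each operator's fraction of the summed operator time."""
    totals = {op: sum(float(r[op]) for r in rows) for op in OPERATORS}
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {op: 0.0 for op in OPERATORS}
    return {op: totals[op] / grand_total for op in OPERATORS}


def bottleneck(rows):
    """Return (operator_name, share) for the most expensive operator."""
    shares = operator_time_shares(rows)
    name = max(shares, key=shares.get)
    return name, shares[name]


def load_profile(csv_path):
    """Read operator_profile.csv as a list of dicts."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

For example, bottleneck(load_profile('profiles/operator_profile.csv')) returns the operator that accounts for the largest share of processing time, which is the natural first target for optimization.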

Intermediate Artifacts

Spilled Chunks

When spill_to_disk: true, intermediate feature matrices are persisted to CSV.

Location: intermediate/

Files:

stream_chunk_{id}_X.csv: Feature matrix (X)
stream_chunk_{id}_y.csv: Target variable (y)

Format:

X: CSV with feature columns (encoded, scaled)
y: Single-column CSV with header salary

Generated by: engine.py:260-266

if self.config.spill_to_disk:
    x_path = self.config.output_dir / 'intermediate' / f'stream_chunk_{chunk_id}_X.csv'
    y_path = self.config.output_dir / 'intermediate' / f'stream_chunk_{chunk_id}_y.csv'
    X_chunk.to_csv(x_path, index=False)
    y_chunk.to_frame('salary').to_csv(y_path, index=False)

Use cases:

Resume interrupted runs (manual reassembly)
Debug feature engineering issues
Validate encoding consistency across chunks

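Spilled chunks can accumulate quickly across repeated runs. A 1340-row dataset with chunk_size: 64 produces 21 chunk pairs (ceil(1340 / 64) = 21, so 42 files); at the typical per-file sizes listed in the Artifact Reference Table, that is roughly 20-130 KB per run. Clean the intermediate/ directory periodically.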
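
Manual reassembly and the cross-chunk consistency check can be sketched as follows. The code relies only on the file naming scheme shown above; spilled_chunk_ids and reassemble are hypothetical helpers. Chunk ids are sorted numerically because a plain filename sort would place chunk 10 before chunk 2.

```python
import csv
import re
from pathlib import Path

CHUNK_NAME = re.compile(r"stream_chunk_(\d+)_X\.csv$")


def spilled_chunk_ids(intermediate_dir):
    """Return the ids of spilled chunks in numeric order."""
    ids = []
    for path in Path(intermediate_dir).glob("stream_chunk_*_X.csv"):
        match = CHUNK_NAME.search(path.name)
        if match:
            ids.append(int(match.group(1)))
    return sorted(ids)


def reassemble(intermediate_dir, kind="X"):
    """Concatenate spilled chunks of one kind ("X" or "y"), checking headers agree."""
    header, rows = None, []
    for chunk_id in spilled_chunk_ids(intermediate_dir):
        path = Path(intermediate_dir) / f"stream_chunk_{chunk_id}_{kind}.csv"
        with open(path, newline="", encoding="utf-8") as handle:
            reader = csv.reader(handle)
            chunk_header = next(reader, None)
            if chunk_header is None:
                continue  # empty file: nothing to append
            if header is None:
                header = chunk_header
            elif chunk_header != header:
                # Differing columns suggest inconsistent encoding across chunks.
                raise ValueError(f"chunk {chunk_id}: columns differ from the first chunk")
            rows.extend(reader)
    return header, rows
```

A ValueError from reassemble(..., "X") is a quick signal that some chunk was encoded with a different set of feature columns than the first one.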

Artifact Reference Table

Artifact	Format	Size (typical)	Purpose	Generated by
`pipeline_report.json`	JSON	50-200 KB	Complete run summary	`engine.py:473-476`
`streaming_chunks.jsonl`	JSONL	10-50 KB	Per-chunk metrics	`engine.py:493-495`
`run_manifest.json`	JSON	1-5 KB	Reproducibility check	`engine.py:478-479`
`constraint_experiment.csv`	CSV	1-5 KB	Resource sweep results	`engine.py:502-503`
`significance_tests.csv`	CSV	<1 KB	Statistical tests	`engine.py:500`
`operator_profile.csv`	CSV	5-20 KB	Operator timing	`engine.py:481-492`
`latency_vs_accuracy.png`	PNG	50-100 KB	Latency trade-off plot	`engine.py:512-525`
`memory_vs_accuracy.png`	PNG	50-100 KB	Memory trade-off plot	`engine.py:527-540`
`latency_memory_accuracy.png`	PNG	50-100 KB	Three-way trade-off plot	`engine.py:542-555`
`stream_chunk_*_X.csv`	CSV	1-5 KB each	Spilled features	`engine.py:264`
`stream_chunk_*_y.csv`	CSV	<1 KB each	Spilled targets	`engine.py:265`

Artifact Lifecycle

Creation

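Spilled chunks under intermediate/ are written while each streaming chunk is processed (when spill_to_disk is enabled). All other artifacts are written at the end of run_all() execution: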

# From engine.py:454
self._write_artifacts(report)

Retention

Recommendations:

Keep indefinitely:
- pipeline_report.json (small, high value)
- run_manifest.json (reproducibility)
- constraint_experiment.csv (comparative analysis)
Archive after 30 days:
- streaming_chunks.jsonl (detailed, but large)
- operator_profile.csv (profiling data)
- Visualization PNGs (regenerable from CSVs)
Delete after run:
- intermediate/ directory (temporary spill files)

Cleanup

# Clean intermediate files only
rm -rf artifacts_*/intermediate/

# Archive old runs, removing the originals only if the archive was created
tar -czf archive_$(date +%Y%m%d).tar.gz artifacts_* && rm -rf artifacts_*

# Selective cleanup (keep summary reports, remove per-chunk detail)
find artifacts_* -name "operator_profile.csv" -delete
find artifacts_* -name "streaming_chunks.jsonl" -delete

Working with Artifacts

Python Analysis

import json
import pandas as pd

# Load main report
with open('artifacts_server/reports/pipeline_report.json') as f:
    report = json.load(f)

print(f"Dataset: {report['dataset_fingerprint']['row_count']} rows")
print(f"Batch R²: {report['batch']['model']['r2']:.3f}")
print(f"Streaming R²: {report['streaming']['model']['r2']:.3f}")

# Load constraint experiments
exp = pd.read_csv('artifacts_server/benchmarks/constraint_experiment.csv')
print(exp.describe())

# Load streaming chunks
chunks = pd.read_csv('artifacts_server/benchmarks/streaming_chunks.csv')
print(f"Total chunks: {len(chunks)}")
print(f"Memory exceeded: {chunks['memory_exceeded'].sum()} times")

Command-Line Analysis

# Extract key metrics
jq '.batch.model.r2, .streaming.model.r2' \
  artifacts_server/reports/pipeline_report.json

# Check reproducibility (the full report contains both keys)
jq '.reproducibility.random_seed, .dataset_fingerprint.sha256' \
  artifacts_server/reports/pipeline_report.json

# Summarize constraint experiments (csvstat is part of csvkit)
csvstat artifacts_server/benchmarks/constraint_experiment.csv

# Count memory exceeded events (tolerates any spacing after the colon)
grep -c '"memory_exceeded": *true' \
  artifacts_server/reports/streaming_chunks.jsonl

Best Practices

Use timestamped output directories

Avoid overwriting artifacts from previous runs:

python run_pipeline.py \
  --input data.csv \
  --output-dir artifacts_$(date +%Y%m%d_%H%M%S)

Validate fingerprints across runs

Confirm that every run used the same input data by comparing dataset SHA-256 hashes:

jq '.dataset_fingerprint.sha256' artifacts_*/reports/pipeline_report.json | sort -u
# A single hash in the output means every run read the same dataset
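
To check a fingerprint against the dataset on disk, the file can be hashed directly. The sketch below assumes the fingerprint is computed over the raw bytes of the input file; this document does not say how the pipeline derives it, so a mismatch may only mean the pipeline hashes a normalized form instead. file_sha256 is a hypothetical helper.

```python
import hashlib


def file_sha256(path, block_size=1 << 20):
    """Hash a file in fixed-size blocks so large inputs never load fully into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" at EOF.
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()
```

Comparing file_sha256('data.csv') with the dataset_fingerprint.sha256 value in pipeline_report.json then shows whether the file on disk is the one the run consumed, under the assumption stated above.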

Archive to external storage

Configure external storage for long-term retention:

aws s3 sync artifacts_server/ s3://my-bucket/nba-pipeline/$(date +%Y%m%d)/

Clean intermediate files immediately

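If spill_to_disk is enabled, clean up after successful runs. Chaining with && skips the cleanup when the run fails (assuming run_pipeline.py exits with a non-zero status on failure), so the spilled chunks stay available for debugging:

python run_pipeline.py --config configs/pipeline.edge.template.json \
  && rm -rf artifacts_edge/intermediate/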

Related Documentation

Edge Device Deployment - Resource-constrained artifacts
Server Deployment - High-throughput artifacts
Hardware Profiling - Deep dive into telemetry
