Dataset management provides full lifecycle control over blockchain data extraction configurations. Operators can register dataset manifests with version tags, deploy datasets to start extraction jobs, inspect registered datasets, and restore dataset metadata from object storage.

Key Concepts

Dataset Reference

Datasets are identified as namespace/name@version (e.g., ethereum/mainnet@1.0.0).
Version Tags:
  • Semantic versions - User-defined versions such as 1.0.0 or 2.1.3
  • latest - System-managed; points to the highest semantic version
  • dev - Development tag for testing; updated manually
Dataset Lifecycle:
  1. Register - Store manifest and create version tag
  2. Deploy - Schedule extraction job to sync data
  3. Monitor - Track job progress and sync state
  4. Restore - Recover metadata from storage if needed
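
To make the reference format concrete, here is a minimal Python sketch that splits a namespace/name@version string into its parts. The names DatasetRef and parse_dataset_ref are hypothetical helpers, not part of ampctl. Falling back to latest when no version is given is an assumption based on the inspect examples on this page, and other commands may behave differently.

```python
from typing import NamedTuple


class DatasetRef(NamedTuple):
    """Hypothetical container for the parts of a dataset reference."""
    namespace: str
    name: str
    revision: str  # a semantic version, "latest", or "dev"


def parse_dataset_ref(ref: str, default_revision: str = "latest") -> DatasetRef:
    """Split 'namespace/name[@revision]' into its parts.

    Assumption: a reference without '@' falls back to `default_revision`.
    """
    path, at_sign, revision = ref.partition("@")
    namespace, slash, name = path.partition("/")
    if not slash or not namespace or not name or "/" in name:
        raise ValueError(f"expected namespace/name[@revision], got {ref!r}")
    if at_sign and not revision:
        raise ValueError(f"empty revision in {ref!r}")
    return DatasetRef(namespace, name, revision or default_revision)


print(parse_dataset_ref("ethereum/mainnet@1.0.0"))
print(parse_dataset_ref("ethereum/mainnet@dev").revision)
```

Validating the reference before calling the CLI or API gives an earlier and clearer error than a failed request would.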

Core Operations

Register

Store dataset manifests with version tags

Deploy

Start extraction jobs for data sync

List

View all registered datasets

Inspect

Get detailed dataset information

Register Datasets

Register a manifest as a named dataset with version tagging:
# Register a dataset (updates "dev" tag)
ampctl dataset register ethereum/mainnet ./manifest.json

# Register and tag with semantic version
ampctl dataset register ethereum/mainnet ./manifest.json --tag 1.0.0

# Short alias
ampctl dataset reg ethereum/mainnet ./manifest.json -t 1.0.0

# Load manifest from object storage
ampctl dataset register ethereum/mainnet s3://bucket/manifest.json --tag 2.0.0
Response:
{
  "manifest_hash": "abc123...",
  "version": "1.0.0"
}

Deploy Datasets

Schedule an extraction job to sync blockchain data:
# Deploy with continuous syncing (default)
ampctl dataset deploy ethereum/mainnet@1.0.0

# Stop at latest block at deployment time
ampctl dataset deploy ethereum/mainnet@1.0.0 --end-block latest

# Stop at specific block number
ampctl dataset deploy ethereum/mainnet@1.0.0 --end-block 5000000

# Stay 100 blocks behind chain tip
ampctl dataset deploy ethereum/mainnet@1.0.0 --end-block -100

# Run with multiple parallel workers
ampctl dataset deploy ethereum/mainnet@1.0.0 --parallelism 4

# Assign to specific worker
ampctl dataset deploy ethereum/mainnet@1.0.0 --worker-id my-worker
Response:
{
  "job_id": 12345
}
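
The --end-block examples above take three forms: the keyword latest, a positive block number, and a negative offset from the chain tip. The Python sketch below summarizes that reading. describe_end_block is a hypothetical helper written only for illustration, and it reflects the examples on this page rather than the actual ampctl parser.

```python
from typing import Optional


def describe_end_block(value: Optional[str]) -> str:
    """Explain how an --end-block value reads, based on the examples above."""
    if value is None:
        # No flag given: the documented default is continuous syncing.
        return "sync continuously"
    if value == "latest":
        return "stop at the latest block at deployment time"
    try:
        number = int(value)
    except ValueError:
        raise ValueError(f"unsupported --end-block value: {value!r}") from None
    if number < 0:
        # Negative values are offsets behind the chain tip.
        return f"stay {-number} blocks behind the chain tip"
    return f"stop at block {number}"


for example in (None, "latest", "5000000", "-100"):
    print(example, "->", describe_end_block(example))
```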

List Datasets

View all registered datasets:
# List all datasets
ampctl dataset list
ampctl dataset ls  # alias

# JSON output for scripting
ampctl dataset list --json
Response:
{
  "datasets": [
    {
      "namespace": "ethereum",
      "name": "mainnet",
      "latest_version": "2.0.0",
      "versions": ["2.0.0", "1.0.0"]
    }
  ]
}

Inspect Datasets

Get detailed information about a specific dataset version:
# Inspect latest version
ampctl dataset inspect ethereum/mainnet

# Inspect specific version
ampctl dataset inspect ethereum/mainnet@1.2.0

# Inspect dev version
ampctl dataset inspect ethereum/mainnet@dev

# Using alias
ampctl dataset get ethereum/mainnet@latest

# JSON output
ampctl dataset inspect ethereum/mainnet --json | jq '.kind'
Response:
{
  "namespace": "ethereum",
  "name": "mainnet",
  "revision": "1.0.0",
  "manifest_hash": "abc123...",
  "kind": "evm-rpc",
  "start_block": 0,
  "finalized_blocks_only": false,
  "tables": ["blocks", "transactions", "logs"]
}

List Dataset Versions

View all available versions for a dataset:
# List all versions
ampctl dataset versions ethereum/mainnet
Response:
{
  "versions": [
    {
      "version": "2.0.0",
      "manifest_hash": "def456...",
      "created_at": "2026-03-01T10:00:00Z",
      "updated_at": "2026-03-01T10:00:00Z"
    },
    {
      "version": "1.0.0",
      "manifest_hash": "abc123...",
      "created_at": "2026-01-01T10:00:00Z",
      "updated_at": "2026-01-01T10:00:00Z"
    }
  ],
  "special_tags": {
    "latest": "2.0.0",
    "dev": "abc123..."
  }
}
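
Since latest is described as pointing to the highest semantic version, a script can cross-check the response above. The Python sketch below compares versions numerically rather than as strings, so 10.0.0 sorts above 9.2.1. It assumes plain MAJOR.MINOR.PATCH versions without pre-release suffixes; semver_key is a hypothetical helper, and the sample data is copied from the response above.

```python
import json

# Sample copied from the response above; hashes stay elided as in the source.
response = json.loads("""
{
  "versions": [
    {"version": "2.0.0", "manifest_hash": "def456..."},
    {"version": "1.0.0", "manifest_hash": "abc123..."}
  ],
  "special_tags": {"latest": "2.0.0", "dev": "abc123..."}
}
""")


def semver_key(version: str) -> tuple:
    """Sort key for plain MAJOR.MINOR.PATCH strings (no pre-release parts)."""
    return tuple(int(part) for part in version.split("."))


highest = max((entry["version"] for entry in response["versions"]), key=semver_key)
matches = highest == response["special_tags"]["latest"]
print(highest, matches)  # prints: 2.0.0 True
```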

View Dataset Manifest

Retrieve the raw manifest JSON:
# Latest version manifest
ampctl dataset manifest ethereum/mainnet

# Specific version
ampctl dataset manifest ethereum/mainnet@1.2.0

# Save to file
ampctl dataset manifest ethereum/mainnet > manifest.json

Restore from Storage

Re-index dataset metadata from existing data in object storage:
# Restore all tables for a version
ampctl dataset restore ethereum/mainnet@1.0.0

# Restore latest version
ampctl dataset restore ethereum/mainnet@latest

# Restore single table (discovers latest revision)
ampctl dataset restore ethereum/mainnet@1.0.0 --table blocks

# Restore with specific location ID
ampctl dataset restore ethereum/mainnet@1.0.0 --table blocks --location-id 42

# JSON output
ampctl dataset restore ethereum/mainnet@1.0.0 --json
When to use restore:
  • Recovering after loss of the metadata database
  • Setting up a new system with pre-existing data
  • Re-syncing after storage has been restored
  • Migrating data from another system

Restore from Custom Storage Path

For data at non-default storage paths (e.g., after migration or importing from another system):
# Step 1: Register the dataset manifest
ampctl dataset register ethereum/mainnet ./manifest.json --tag 1.0.0

# Step 2: Register each table revision with custom path
ampctl table register ethereum/mainnet@1.0.0 blocks custom/path/to/blocks
# → location_id: 42

ampctl table register ethereum/mainnet@1.0.0 transactions custom/path/to/transactions
# → location_id: 43

# Step 3: Restore file metadata from storage
ampctl table restore 42
ampctl table restore 43

# Step 4: Activate each restored table
ampctl dataset restore ethereum/mainnet@1.0.0 --table blocks --location-id 42
ampctl dataset restore ethereum/mainnet@1.0.0 --table transactions --location-id 43
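
Steps 3 and 4 repeat once per table, so they are easy to generate once the location IDs from step 2 are known. The Python sketch below only builds and prints the command lines; it does not run them. restore_plan is a hypothetical helper, and the table-to-location mapping is taken from the example above.

```python
import shlex
from typing import Dict, List


def restore_plan(dataset_ref: str, locations: Dict[str, int]) -> List[List[str]]:
    """Build the step 3 and step 4 commands as argv lists.

    `locations` maps each table name to the location_id printed in step 2.
    """
    # Step 3: restore file metadata for every registered location.
    restore_files = [
        ["ampctl", "table", "restore", str(location_id)]
        for location_id in locations.values()
    ]
    # Step 4: activate each restored table for the dataset version.
    activate = [
        ["ampctl", "dataset", "restore", dataset_ref,
         "--table", table, "--location-id", str(location_id)]
        for table, location_id in locations.items()
    ]
    return restore_files + activate


plan = restore_plan("ethereum/mainnet@1.0.0", {"blocks": 42, "transactions": 43})
for argv in plan:
    print(shlex.join(argv))
```

Keeping the commands as argv lists means they can later be passed to subprocess.run without shell quoting problems.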

List Dataset Jobs

View all jobs for a specific dataset:
# Jobs are listed via job commands
ampctl job list --json | jq '.jobs[] | select(.dataset == "ethereum/mainnet@1.0.0")'
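Response (shape of the full ampctl job list --json output; the jq filter above prints only the matching job objects):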
{
  "jobs": [
    {
      "id": 12345,
      "status": "RUNNING",
      "dataset": "ethereum/mainnet@1.0.0",
      "worker_id": "worker-01",
      "created_at": "2026-03-04T10:00:00Z"
    }
  ]
}
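
For scripts that do not use jq, the same filter can be written in Python. This is a minimal sketch assuming the job list has the shape shown above. jobs_for_dataset is a hypothetical helper, and the second job in the sample is invented so that the filter has something to exclude.

```python
from typing import Any, Dict, List

# First job copied from the response above; the second is a made-up entry
# for a different dataset version, added only to exercise the filter.
job_list: Dict[str, Any] = {
    "jobs": [
        {"id": 12345, "status": "RUNNING", "dataset": "ethereum/mainnet@1.0.0",
         "worker_id": "worker-01", "created_at": "2026-03-04T10:00:00Z"},
        {"id": 12346, "status": "RUNNING", "dataset": "ethereum/mainnet@2.0.0",
         "worker_id": "worker-01", "created_at": "2026-03-04T11:00:00Z"},
    ]
}


def jobs_for_dataset(jobs: Dict[str, Any], dataset: str) -> List[Dict[str, Any]]:
    """Return the jobs whose 'dataset' field equals the given reference."""
    return [job for job in jobs["jobs"] if job.get("dataset") == dataset]


selected = jobs_for_dataset(job_list, "ethereum/mainnet@1.0.0")
print([job["id"] for job in selected])  # prints: [12345]
```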

API Reference

Dataset management endpoints:
| Endpoint | Method | Description |
| --- | --- | --- |
| /datasets | GET | List all datasets |
| /datasets | POST | Register a dataset |
| /datasets/{namespace}/{name}/versions | GET | List dataset versions |
| /datasets/{namespace}/{name}/versions/{revision} | GET | Get dataset details |
| /datasets/{namespace}/{name}/versions/{revision}/deploy | POST | Deploy dataset |
| /datasets/{namespace}/{name}/versions/{revision}/manifest | GET | Get manifest JSON |
| /datasets/{namespace}/{name}/versions/{revision}/restore | POST | Restore dataset |
| /datasets/{namespace}/{name}/versions/{revision}/jobs | GET | List dataset jobs |
For complete API schemas, see the Admin API OpenAPI specification.
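
The endpoint paths follow one pattern, so a client can build them from the parts of a dataset reference. The Python sketch below only constructs paths and sends no requests. versions_path and revision_path are hypothetical helpers, and percent-encoding each path segment is an assumption; the host, authentication, and request bodies are defined by the OpenAPI specification, not here.

```python
from typing import Optional
from urllib.parse import quote

# Sub-resources listed in the endpoint table for a specific revision.
ACTIONS = {"deploy", "manifest", "restore", "jobs"}


def versions_path(namespace: str, name: str) -> str:
    """Path that lists all versions of a dataset."""
    return "/datasets/{}/{}/versions".format(
        quote(namespace, safe=""), quote(name, safe="")
    )


def revision_path(namespace: str, name: str, revision: str,
                  action: Optional[str] = None) -> str:
    """Path for one revision, optionally with a sub-resource such as 'deploy'."""
    path = f"{versions_path(namespace, name)}/{quote(revision, safe='')}"
    if action is None:
        return path
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    return f"{path}/{action}"


print(versions_path("ethereum", "mainnet"))
print(revision_path("ethereum", "mainnet", "1.0.0", "deploy"))
```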

Next Steps

Job Management

Monitor and control extraction jobs

Worker Management

Monitor worker health and availability
