ampctl datasets

Overview

Dataset commands provide full lifecycle management for blockchain datasets. Operators can register manifests as named datasets with version tags, deploy datasets to start extraction jobs, inspect registered datasets and their versions, retrieve raw manifests, and restore dataset metadata from object storage after recovery scenarios.

Key Concepts

Dataset Reference: Identifies a dataset as namespace/name@version (e.g., ethereum/mainnet@1.0.0)
Version Tag: Semantic version (e.g., 1.0.0) or special tags latest and dev (system-managed)
Deployment: Scheduling an extraction job that syncs blockchain data for a dataset
Restore: Re-indexing dataset metadata from existing data in object storage
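To make the reference format concrete, here is a minimal sketch that splits a reference into its parts using only POSIX parameter expansion. The helper name parse_dataset_ref is hypothetical and not part of ampctl, and falling back to latest when no version is given is an assumption borrowed from how inspect and manifest treat an omitted version.

```shell
# Hypothetical helper (not part of ampctl): split namespace/name[@version].
# Prints "namespace name version" on success.
parse_dataset_ref() {
  ref=$1
  case $ref in
    */*) ;;
    *) echo "invalid dataset reference: $ref" >&2; return 1 ;;
  esac
  namespace=${ref%%/*}   # text before the first "/"
  rest=${ref#*/}         # text after the first "/"
  case $rest in
    *@*) name=${rest%%@*}; version=${rest#*@} ;;
    *)   name=$rest; version=latest ;;   # assumption: default to "latest"
  esac
  if [ -z "$namespace" ] || [ -z "$name" ] || [ -z "$version" ]; then
    echo "invalid dataset reference: $ref" >&2
    return 1
  fi
  printf '%s %s %s\n' "$namespace" "$name" "$version"
}
```

For example, parse_dataset_ref my_namespace/my_dataset@dev prints the three parts separated by spaces, which a script can then read into variables.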

Commands

ampctl dataset register

ampctl dataset register <DATASET_REF> <MANIFEST_PATH> [OPTIONS]

Aliases: reg

DATASET_REF

string

required

Dataset reference in format namespace/name (e.g., ethereum/mainnet)

MANIFEST_PATH

string

required

Path to manifest JSON file (local path or object storage URL like s3://bucket/manifest.json)

--tag, -t

string

Version tag for the dataset (e.g., 1.0.0). Without this flag, only the dev tag is updated.

Examples:

# Register a dataset (updates "dev" tag)
ampctl dataset register my_namespace/my_dataset ./manifest.json

# Register and tag with a semantic version
ampctl dataset register my_namespace/my_dataset ./manifest.json --tag 1.0.0

# Using the alias
ampctl dataset reg my_namespace/my_dataset ./manifest.json -t 1.0.0

# Manifest can be loaded from object storage
ampctl dataset register my_namespace/my_dataset s3://bucket/manifest.json --tag 2.0.0
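Because latest and dev are system-managed, a release script may want to check the tag before registering. The sketch below is one way to do that. The function names is_release_tag and register_release are hypothetical, and the check accepts only plain MAJOR.MINOR.PATCH, which is stricter than full semantic versioning (no pre-release or build suffixes). Whether ampctl itself rejects other tags is not stated here, so treat this as a local guard only.

```shell
# Hypothetical pre-flight check: accept only plain MAJOR.MINOR.PATCH tags.
is_release_tag() {
  printf '%s\n' "$1" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+$'
}

# Register a manifest under a release tag, refusing anything else
# (including the system-managed "latest" and "dev" tags).
register_release() {
  ref=$1
  manifest=$2
  tag=$3
  if ! is_release_tag "$tag"; then
    echo "refusing tag '$tag': expected MAJOR.MINOR.PATCH" >&2
    return 1
  fi
  ampctl dataset register "$ref" "$manifest" --tag "$tag"
}
```

Usage would look like register_release my_namespace/my_dataset ./manifest.json 1.0.0.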

ampctl dataset deploy

Deploy a dataset to start extraction.

ampctl dataset deploy <DATASET_REF> [OPTIONS]

DATASET_REF

string

required

Dataset reference in format namespace/name@version (e.g., ethereum/mainnet@1.0.0)

--end-block

string

Stop extraction at a specific block. Options:

latest - Stop at the latest block at deployment time
<number> - Stop at a specific block number (e.g., 5000000)
<negative> - Stay N blocks behind chain tip (e.g., -100)
If not specified, syncing runs continuously

--parallelism

number

Number of parallel workers to use for extraction

--worker-id

string

Assign the job to a specific worker node by ID

Examples:

# Deploy with continuous syncing (default)
ampctl dataset deploy my_namespace/my_dataset@1.0.0

# Stop at the latest block at deployment time
ampctl dataset deploy my_namespace/my_dataset@1.0.0 --end-block latest

# Stop at a specific block number
ampctl dataset deploy my_namespace/my_dataset@1.0.0 --end-block 5000000

# Stay 100 blocks behind chain tip
ampctl dataset deploy my_namespace/my_dataset@1.0.0 --end-block -100

# Run with multiple parallel workers
ampctl dataset deploy my_namespace/my_dataset@1.0.0 --parallelism 4

# Assign to a specific worker
ampctl dataset deploy my_namespace/my_dataset@1.0.0 --worker-id my-worker
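The accepted --end-block forms can be summarized as a small classifier. This is a sketch of the rules as documented above, not ampctl's actual parser. The function name classify_end_block and the labels it prints are hypothetical.

```shell
# Hypothetical helper: name the --end-block mode a value would select.
# Prints one of: continuous, latest, absolute, behind-tip, invalid.
classify_end_block() {
  value=$1
  case $value in
    '')     echo continuous ;;   # flag omitted: sync runs continuously
    latest) echo latest ;;       # stop at the latest block at deploy time
    -*)
      digits=${value#-}
      case $digits in
        ''|*[!0-9]*) echo invalid; return 1 ;;
        *)           echo behind-tip ;;   # stay N blocks behind chain tip
      esac
      ;;
    *[!0-9]*) echo invalid; return 1 ;;
    *)        echo absolute ;;   # stop at a specific block number
  esac
}
```

A deployment script could call classify_end_block "$END_BLOCK" first and abort when it returns a non-zero status, before any job is scheduled.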

ampctl dataset list

List all registered datasets.

ampctl dataset list [OPTIONS]

Aliases: ls

Examples:

ampctl dataset list
ampctl dataset ls  # alias
ampctl dataset list --json  # JSON output

Output:

namespace/dataset1 (latest: 1.2.0, versions: 1.0.0, 1.1.0, 1.2.0)
namespace/dataset2 (latest: 2.0.0, versions: 1.0.0, 2.0.0)
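If a script only needs each dataset's latest version, the text output can be reduced to namespace/name@version references. The sketch below assumes the output has exactly the shape shown above. The function name latest_refs is hypothetical. For anything long-lived, the --json output is likely the safer input, but its schema is not described here.

```shell
# Hypothetical filter: turn `ampctl dataset list` text output (on stdin)
# into one namespace/name@latest-version reference per line.
# Assumes lines look like: "ns/name (latest: X.Y.Z, versions: ...)".
latest_refs() {
  sed -n 's/^\([^ ]*\) (latest: \([^,]*\),.*$/\1@\2/p'
}
```

Usage: ampctl dataset list | latest_refs. Lines that do not match the assumed shape are silently dropped.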

ampctl dataset inspect

Inspect a specific dataset version.

ampctl dataset inspect <DATASET_REF> [OPTIONS]

Aliases: get

DATASET_REF

string

required

Dataset reference in format namespace/name[@version]. If version is omitted, defaults to latest.

Examples:

# Inspect latest version
ampctl dataset inspect my_namespace/my_dataset

# Inspect a specific version
ampctl dataset inspect my_namespace/my_dataset@1.0.0

# Inspect the dev version
ampctl dataset inspect my_namespace/my_dataset@dev

# Using the alias
ampctl dataset get my_namespace/my_dataset@latest

# Extract specific fields with jq
ampctl dataset inspect my_namespace/my_dataset --json | jq '.kind'

ampctl dataset versions

List all versions for a dataset.

ampctl dataset versions <DATASET_REF>

DATASET_REF

string

required

Dataset reference in format namespace/name

Examples:

ampctl dataset versions my_namespace/my_dataset
ampctl dataset versions my_namespace/my_dataset --json

Output:

Version    Manifest Hash                                            Created At
1.0.0      abc123def456...                                          2024-01-15T10:30:00Z
1.1.0      def789ghi012...                                          2024-02-20T14:45:00Z
1.2.0      ghi345jkl678...                                          2024-03-01T09:15:00Z
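To look up the manifest hash for one version from the table output, a short awk filter is enough. This sketch assumes the whitespace-separated column layout shown above, with a single header row. The function name manifest_hash_for is hypothetical.

```shell
# Hypothetical filter: read `ampctl dataset versions` text output on stdin
# and print the manifest hash column for the requested version.
manifest_hash_for() {
  # Appending "" forces a string comparison, so version-like values
  # are never compared as numbers.
  awk -v want="$1" 'NR > 1 && ($1 "") == (want "") { print $2 }'
}
```

Usage: ampctl dataset versions my_namespace/my_dataset | manifest_hash_for "$version". Nothing is printed when the version is not listed.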

ampctl dataset manifest

Retrieve the raw manifest JSON for a dataset version.

ampctl dataset manifest <DATASET_REF>

DATASET_REF

string

required

Dataset reference in format namespace/name[@version]. If version is omitted, defaults to latest.

Examples:

# Latest version manifest
ampctl dataset manifest my_namespace/my_dataset

# Specific version
ampctl dataset manifest my_namespace/my_dataset@1.0.0

# Save to file
ampctl dataset manifest my_namespace/my_dataset@1.0.0 > manifest.json
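One use for the raw manifest is checking whether a local file still matches what is registered. The sketch below does a byte-for-byte comparison with cmp. The function name manifest_matches is hypothetical, and it assumes the command prints the manifest exactly as stored, so differences in whitespace or key order would count as a mismatch.

```shell
# Hypothetical helper: succeed only if the registered manifest for a
# dataset reference is byte-identical to a local file.
manifest_matches() {
  ref=$1
  local_path=$2
  # cmp reads the registered manifest from stdin ("-") and stays quiet (-s);
  # the pipeline's exit status is cmp's.
  ampctl dataset manifest "$ref" | cmp -s - "$local_path"
}
```

Usage: manifest_matches my_namespace/my_dataset@dev ./manifest.json && echo "up to date".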

ampctl dataset restore

Restore dataset metadata from object storage.

ampctl dataset restore <DATASET_REF> [OPTIONS]

DATASET_REF

string

required

Dataset reference in format namespace/name@version

--table

string

Restore only a specific table (discovers latest revision from storage)

--location-id

number

Restore the table given by --table from a specific location ID instead of discovering the latest revision from storage

Use Cases:

Recovery after metadata loss
Setting up new systems with pre-existing data
Re-syncing after storage restoration

Examples:

# Restore all tables for a specific version
ampctl dataset restore my_namespace/my_dataset@1.0.0

# Restore latest version
ampctl dataset restore my_namespace/my_dataset@latest

# Restore a single table (discovers latest revision from storage)
ampctl dataset restore my_namespace/my_dataset@1.0.0 --table blocks

# Restore a single table with a specific location ID
ampctl dataset restore my_namespace/my_dataset@1.0.0 --table blocks --location-id 42
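When only some tables need restoring, the --table form can be wrapped in a loop. This is a minimal sketch. The function name restore_tables is hypothetical, and it assumes ampctl exits with a non-zero status when a restore fails, which is not confirmed here.

```shell
# Hypothetical helper: restore several tables of one dataset version,
# one at a time, stopping at the first failure.
restore_tables() {
  ref=$1
  shift
  for table in "$@"; do
    if ! ampctl dataset restore "$ref" --table "$table"; then
      echo "restore failed for table '$table'" >&2
      return 1
    fi
  done
}
```

Usage: restore_tables my_namespace/my_dataset@latest blocks transactions.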

Advanced Workflow: Restore from Custom Storage Path

When table data exists at non-default storage paths (e.g., after migration, custom storage layout, or importing data from another system), use this multi-step flow:

# Step 1: Register the dataset manifest (if not already registered)
ampctl dataset register my_namespace/my_dataset ./manifest.json --tag 1.0.0

# Step 2: Register each table revision with a custom storage path
#   Creates an inactive revision record and returns a location_id
ampctl table register my_namespace/my_dataset@1.0.0 blocks custom/path/to/blocks
# → location_id: 42
ampctl table register my_namespace/my_dataset@1.0.0 transactions custom/path/to/transactions
# → location_id: 43

# Step 3: Restore file metadata from storage for each table
#   Scans the storage path and indexes Parquet file metadata into the metadata DB
ampctl table restore 42
ampctl table restore 43

# Step 4: Activate each restored table revision via dataset restore
ampctl dataset restore my_namespace/my_dataset@1.0.0 --table blocks --location-id 42
ampctl dataset restore my_namespace/my_dataset@1.0.0 --table transactions --location-id 43
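Steps 2 through 4 can be chained per table so the location ID does not have to be copied by hand. The sketch below is illustrative only. The function names extract_location_id and restore_table_from_path are hypothetical, and it assumes that ampctl table register prints a line containing "location_id: N", as suggested by the comments above. Check the real output before relying on it.

```shell
# Hypothetical filter: pull the first "location_id: N" value from stdin.
# The output format of `ampctl table register` is an assumption here.
extract_location_id() {
  sed -n 's/.*location_id: *\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Register a table revision at a custom path, index its file metadata,
# then activate it (steps 2, 3, and 4 for a single table).
restore_table_from_path() {
  ref=$1
  table=$2
  path=$3
  location_id=$(ampctl table register "$ref" "$table" "$path" | extract_location_id)
  if [ -z "$location_id" ]; then
    echo "could not determine location_id for table '$table'" >&2
    return 1
  fi
  ampctl table restore "$location_id" || return 1
  ampctl dataset restore "$ref" --table "$table" --location-id "$location_id"
}
```

Usage: restore_table_from_path my_namespace/my_dataset@latest blocks custom/path/to/blocks, repeated for each table.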

JSON Output

All dataset commands support JSON output for scripting:

ampctl dataset list --json
ampctl dataset inspect my_namespace/my_dataset --json | jq '.kind'
ampctl dataset restore my_namespace/my_dataset@1.0.0 --json
