
Class Signature

class ChronosBoltDataset(BaseDataset):
    def __init__(
        self,
        name: str = None,
        datetime_col: str = "ds",
        path: str = None,
        boundaries: list = [0, 0, 0],
        batch_size: int = 16,
        mode: str = None,
        stride: int = 10,
        context_len: int = 512,
        horizon_len: int = 64
    )
Dataset wrapper for preparing time-series data for the Chronos-Bolt model.

Parameters

name
str
default:"None"
Dataset name (e.g., “ett”, “etth1”). Used to automatically load datasets from predefined sources.
datetime_col
str
default:"ds"
Name of the datetime column in the CSV file.
path
str
default:"None"
Path to the CSV file containing time-series data.
boundaries
list[int]
default:"[0, 0, 0]"
Train/val/test split boundaries. If left as [0, 0, 0], the split points are computed automatically.
batch_size
int
default:"16"
Batch size for dataloaders.
mode
str
default:"None"
Dataset mode: “train” or “test”.
stride
int
default:"10"
Stride for windowing when creating sequences.
context_len
int
default:"512"
Length of historical context to use for forecasting.
horizon_len
int
default:"64"
Forecast horizon length (number of future steps to predict).

Attributes

max_col_num
int
default:"64"
Maximum number of columns (channels) supported.
one_chunk_num
int
Number of samples per chunk, computed as (length_timeseries - context_len - horizon_len) // stride + 1.
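The formula above can be checked with a quick calculation. The helper below is a hypothetical illustration of the computation, not part of the class:

```python
def one_chunk_num(length_timeseries: int, context_len: int,
                  horizon_len: int, stride: int) -> int:
    """Number of windows a series of the given length yields."""
    return (length_timeseries - context_len - horizon_len) // stride + 1

# A series of 1,000 steps with the defaults context_len=512,
# horizon_len=64, stride=10 yields (1000 - 512 - 64) // 10 + 1 = 43 windows.
print(one_chunk_num(1000, 512, 64, 10))  # → 43
```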

Methods

get_data_loader()

Returns a PyTorch DataLoader for the dataset.
def get_data_loader(self)
return
DataLoader
PyTorch DataLoader configured with the specified batch size.

__len__()

Returns the number of samples in the dataset.
def __len__(self)
return
int
Total number of samples available.

Usage Example

from samay.dataset import ChronosBoltDataset

# Create dataset for training
train_dataset = ChronosBoltDataset(
    name="ett",
    path="data/ETTh1.csv",
    datetime_col="date",
    mode="train",
    context_len=512,
    horizon_len=96,
    batch_size=32,
    stride=10
)

# Create dataset for testing
test_dataset = ChronosBoltDataset(
    name="ett",
    path="data/ETTh1.csv",
    datetime_col="date",
    mode="test",
    context_len=512,
    horizon_len=96,
    batch_size=32
)

# Get dataloader
dataloader = train_dataset.get_data_loader()

print(f"Dataset size: {len(train_dataset)}")
print(f"Batches per epoch: {len(dataloader)}")

Notes

  • The dataset automatically handles train/test splitting based on boundaries
  • Windowing is applied with the specified stride to create overlapping sequences
  • Maximum of 64 channels supported by default
  • Data is read from CSV and preprocessed automatically
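The stride-based windowing described above can be sketched in plain Python. This is a simplified, self-contained illustration of the technique, not the library's actual implementation:

```python
def make_windows(series, context_len, horizon_len, stride):
    """Slide a window of context_len + horizon_len over the series,
    advancing by `stride` steps, and split each window into
    (historical context, forecast target)."""
    total = context_len + horizon_len
    windows = []
    for start in range(0, len(series) - total + 1, stride):
        context = series[start : start + context_len]
        target = series[start + context_len : start + total]
        windows.append((context, target))
    return windows

series = list(range(20))
windows = make_windows(series, context_len=8, horizon_len=4, stride=3)
print(len(windows))       # (20 - 8 - 4) // 3 + 1 = 3 windows
print(windows[1][0][:3])  # second window's context starts at index 3 → [3, 4, 5]
```

Because `stride` (3) is smaller than the window length (12), consecutive windows overlap, which is how overlapping training sequences are produced.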
