Overview

Dataset wrapper for Chronos, a probabilistic time series forecasting model. Supports both seq2seq and causal model types.

Class signature

class ChronosDataset(BaseDataset):
    def __init__(
        self,
        name: str = None,
        datetime_col: str = "ds",
        path: str = None,
        boundaries: list = [0, 0, 0],
        batch_size: int = 16,
        mode: str = None,
        stride: int = 10,
        tokenizer_class: str = "MeanScaleUniformBins",
        drop_prob: float = 0.2,
        min_past: int = 64,
        np_dtype: np.dtype = np.float32,
        config: ChronosConfig = None,
    )

Parameters

name
str
default:"None"
Dataset name.
datetime_col
str
default:"ds"
Datetime column name.
path
str
default:"None"
Path to CSV file.
boundaries
list
default:"[0, 0, 0]"
Train/val/test split boundaries given as row indices. If left as the default [0, 0, 0], the data is split automatically: 50% train, 20% validation, 30% test.
batch_size
int
default:"16"
Batch size for dataloaders.
mode
str
default:"None"
'train' or 'test'.
stride
int
default:"10"
Stride for windowing.
tokenizer_class
str
default:"MeanScaleUniformBins"
Tokenizer class name used by Chronos.
drop_prob
float
default:"0.2"
Dropout probability applied when constructing seq2seq training samples. Set to 0.0 for causal models.
min_past
int
default:"64"
Minimum past context length.
np_dtype
np.dtype
default:"np.float32"
Numpy dtype for arrays.
config
ChronosConfig
default:"None"
Chronos configuration object. If None, default configuration is used.
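To make the boundaries behavior concrete, here is a minimal sketch of how the default [0, 0, 0] could translate into cumulative split indices for the documented 50/20/30 split. The function name and exact rounding are assumptions for illustration; the actual samay implementation may differ.

```python
# Sketch: derive cumulative [train_end, val_end, test_end] boundaries
# for a 50% / 20% / 30% train/val/test split (hypothetical helper).
def default_boundaries(n_rows: int) -> list[int]:
    train_end = int(n_rows * 0.5)   # first 50% of rows -> train
    val_end = int(n_rows * 0.7)     # next 20% -> validation
    return [train_end, val_end, n_rows]

print(default_boundaries(1000))  # [500, 700, 1000]
```

Passing explicit boundaries such as [500, 700, 1000] would pin the split to fixed row indices instead of percentages.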

Methods

__len__()

Get the total number of data samples.
def __len__(self) -> int
Returns: int - Number of samples available for iteration.
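As a rough intuition for where the sample count comes from, sliding a fixed-size window over the series with the given stride yields a predictable number of windows. This is a sketch only; how the real __len__ counts samples (e.g. per channel, with padding) is an assumption here.

```python
# Sketch: number of stride-spaced windows over a series of n_steps,
# where each window needs `context` past steps and `horizon` future steps.
def num_windows(n_steps: int, context: int, horizon: int, stride: int) -> int:
    usable = n_steps - (context + horizon)
    return 0 if usable < 0 else usable // stride + 1

print(num_windows(1000, 512, 64, 10))  # (1000 - 576) // 10 + 1 = 43
```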

__getitem__(index)

Get a data sample by index.
def __getitem__(self, index: int)
Parameters:
  • index (int): Index of the data sample.
Returns: dict - Data sample containing:
  • input_seq: Input sequence
  • forecast_seq: Forecast sequence
  • input_ids: Tokenized input IDs
  • attention_mask: Attention mask
  • labels: Target labels (with -100 for masked positions)
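The -100 value in labels follows the common convention (used by HuggingFace-style losses) that such positions are excluded from the loss. A small self-contained illustration with made-up numbers, not actual Chronos output:

```python
IGNORE_INDEX = -100  # positions with this label are skipped in the loss

def masked_nll(log_probs: list[float], labels: list[int]) -> float:
    """Average negative log-likelihood over non-ignored positions."""
    kept = [-lp for lp, lab in zip(log_probs, labels) if lab != IGNORE_INDEX]
    return sum(kept) / len(kept)

log_probs = [-0.1, -2.3, -0.5, -1.2]   # hypothetical per-token log-probs
labels = [7, -100, 12, -100]           # padded/masked positions get -100
print(round(masked_nll(log_probs, labels), 2))  # 0.3
```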

get_data_loader()

Get a data loader for the dataset.
def get_data_loader(self) -> DataLoader
Returns: DataLoader - PyTorch DataLoader for the dataset.

preprocess()

Preprocess the data by applying dropout if in training mode.
def preprocess(self)
Applies random dropout to training data based on drop_prob.
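One way such dropout can look is replacing individual observations with NaN at rate drop_prob, which the tokenizer can then treat as missing. This is a sketch under that assumption; the actual preprocess() may mask values differently.

```python
import numpy as np

def apply_dropout(series: np.ndarray, drop_prob: float, seed: int = 0) -> np.ndarray:
    """Replace each value with NaN with probability drop_prob (sketch)."""
    rng = np.random.default_rng(seed)
    mask = rng.random(series.shape) < drop_prob
    out = series.astype(np.float32).copy()
    out[mask] = np.nan
    return out

x = np.arange(10, dtype=np.float32)
y = apply_dropout(x, drop_prob=0.2)
print(np.isnan(y).sum())  # number of dropped values
```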

Example usage

from samay.dataset import ChronosDataset
from samay.models.chronosforecasting.chronos.chronos import ChronosConfig

config = ChronosConfig(
    tokenizer_class="MeanScaleUniformBins",
    tokenizer_kwargs={"low_limit": -15.0, "high_limit": 15.0},
    n_tokens=4096,
    context_length=512,
    prediction_length=64,
    model_type="seq2seq"
)

dataset = ChronosDataset(
    path="data/timeseries.csv",
    datetime_col="date",
    config=config,
    batch_size=16,
    mode="train"
)

loader = dataset.get_data_loader()
for batch in loader:
    input_ids = batch['input_ids']
    attention_mask = batch['attention_mask']
    labels = batch['labels']
    # Training logic here

Default configuration

If no config is provided, the following defaults are used:
ChronosConfig(
    tokenizer_class="MeanScaleUniformBins",
    tokenizer_kwargs={"low_limit": -15.0, "high_limit": 15.0},
    n_tokens=4096,
    n_special_tokens=2,
    pad_token_id=0,
    eos_token_id=1,
    use_eos_token=True,
    model_type="seq2seq",
    context_length=512,
    prediction_length=64,
    num_samples=20,
    temperature=1.0,
    top_k=50,
    top_p=1.0,
)

Features

  • Support for both seq2seq and causal model types
  • Automatic tokenization using MeanScaleUniformBins
  • Chunking for multivariate time series (max 16 channels per chunk)
  • Automatic padding for short sequences
  • Dropout regularization for training
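The channel-chunking behavior above can be sketched as splitting a (channels, time) array into groups of at most 16 channels; the exact chunking logic inside the dataset is an assumption here.

```python
import numpy as np

MAX_CHANNELS = 16  # per the "max 16 channels per chunk" behavior

def chunk_channels(data: np.ndarray, max_channels: int = MAX_CHANNELS) -> list[np.ndarray]:
    """Split a (channels, time) array into chunks of <= max_channels channels."""
    return [data[i:i + max_channels] for i in range(0, data.shape[0], max_channels)]

series = np.zeros((40, 512))          # 40 channels, 512 time steps
chunks = chunk_channels(series)
print([c.shape[0] for c in chunks])   # [16, 16, 8]
```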