Overview
Dataset wrapper for Chronos, a probabilistic time series forecasting model. Supports both seq2seq and causal model types.
Class signature
Parameters
Dataset name.
Datetime column name.
Path to CSV file.
Train/val/test split boundaries. By default, the data is split 50% train, 20% val, 30% test.
Batch size for dataloaders.
'train' or 'test'.
Stride for windowing.
Tokenizer class name used by Chronos.
Dropout probability for seq2seq creation. Set to 0.0 for causal models.
Minimum past context length.
Numpy dtype for arrays.
Chronos configuration object. If None, default configuration is used.
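To make the split boundaries and the windowing stride concrete, here is a minimal sketch rather than the wrapper's actual implementation. The helper names `split_indices` and `window_starts` are hypothetical, and the sketch assumes the boundaries are fractions of the series length applied in chronological order.

```python
def split_indices(n_rows, train_frac=0.5, val_frac=0.2):
    """Return (train, val, test) index ranges for a chronological split.

    Hypothetical helper: whatever remains after train and val goes to test.
    """
    train_end = int(n_rows * train_frac)
    val_end = train_end + int(n_rows * val_frac)
    return range(0, train_end), range(train_end, val_end), range(val_end, n_rows)


def window_starts(n_rows, window_len, stride):
    """Return the start offset of every full window of length window_len."""
    if n_rows < window_len:
        return []
    return list(range(0, n_rows - window_len + 1, stride))


train, val, test = split_indices(1000)
print(len(train), len(val), len(test))  # 500 200 300
print(window_starts(len(train), window_len=128, stride=64))
```

A smaller stride yields more, heavily overlapping windows; a stride equal to the window length yields non-overlapping ones.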
Methods
__len__()
Get the total number of data samples.
int - Number of samples available for iteration.
__getitem__(index)
Get a data sample by index.
index (int): Index of the data sample.
dict - Data sample containing:
- input_seq: Input sequence
- forecast_seq: Forecast sequence
- input_ids: Tokenized input IDs
- attention_mask: Attention mask
- labels: Target labels (with -100 for masked positions)
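The -100 convention for labels can be illustrated with a short sketch. It is not the wrapper's own code: `build_labels` and `masked_mean` are hypothetical helpers, and the token IDs are made up. The value -100 matches the default `ignore_index` of PyTorch's cross-entropy loss, so positions carrying it do not contribute to the training loss.

```python
IGNORE_INDEX = -100  # value used for masked label positions


def build_labels(target_ids, target_mask):
    """Copy target token IDs, replacing masked-out positions with -100."""
    return [tok if keep else IGNORE_INDEX
            for tok, keep in zip(target_ids, target_mask)]


def masked_mean(per_token_loss, labels):
    """Average the loss over positions whose label is not -100."""
    kept = [loss for loss, lab in zip(per_token_loss, labels)
            if lab != IGNORE_INDEX]
    return sum(kept) / len(kept) if kept else 0.0


labels = build_labels([5, 9, 2, 0, 0], [1, 1, 1, 0, 0])
print(labels)  # [5, 9, 2, -100, -100]
print(masked_mean([0.5, 1.5, 1.0, 9.0, 9.0], labels))  # 1.0
```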
get_data_loader()
Get a data loader for the dataset.
DataLoader - PyTorch DataLoader for the dataset.
preprocess()
Preprocess the data by applying dropout with probability drop_prob if in training mode.
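One way to picture training-time dropout on a time series is sketched below. This is an assumption about the behaviour, not the wrapper's confirmed implementation: it treats dropout as replacing each observation with NaN (a missing value) with probability `drop_prob`, and only when the mode is 'train'. The function name `apply_dropout` and the `rng` argument are hypothetical.

```python
import math
import random


def apply_dropout(values, drop_prob, mode, rng=None):
    """Replace each value with NaN with probability drop_prob in train mode.

    Assumption: dropped observations are marked as missing (NaN).
    In any other mode, or with drop_prob <= 0, the data is returned unchanged.
    """
    rng = rng or random.Random()
    if mode != "train" or drop_prob <= 0.0:
        return list(values)
    return [math.nan if rng.random() < drop_prob else v for v in values]


series = [1.0, 2.0, 3.0, 4.0, 5.0]
print(apply_dropout(series, drop_prob=0.2, mode="train", rng=random.Random(0)))
print(apply_dropout(series, drop_prob=0.2, mode="test"))  # unchanged copy
```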
Example usage
Default configuration
If no config is provided, the following defaults are used:
Features
- Support for both seq2seq and causal model types
- Automatic tokenization using MeanScaleUniformBins
- Chunking for multivariate time series (max 16 channels per chunk)
- Automatic padding for short sequences
- Dropout regularization for training
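The chunking and padding features above can be sketched as follows. The 16-channel limit comes from the list; everything else is an assumption for illustration: the helper names `chunk_channels` and `left_pad` are hypothetical, and padding on the left with NaN is one plausible choice, not necessarily what the wrapper does.

```python
import math

MAX_CHANNELS = 16  # per-chunk limit stated in the feature list


def chunk_channels(channels, max_channels=MAX_CHANNELS):
    """Split a list of channels into consecutive chunks of at most max_channels."""
    return [channels[i:i + max_channels]
            for i in range(0, len(channels), max_channels)]


def left_pad(series, min_len, pad_value=math.nan):
    """Pad a short series on the left so it has at least min_len values.

    Assumption: padding goes on the left and uses NaN as the fill value.
    """
    missing = max(0, min_len - len(series))
    return [pad_value] * missing + list(series)


channels = [f"ch{i}" for i in range(40)]
print([len(c) for c in chunk_channels(channels)])  # [16, 16, 8]
print(left_pad([1.0, 2.0, 3.0], min_len=5))
```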