Overview

Dataset class compatible with TimesFM (Time Series Foundation Model) for time series forecasting.

Class signature

class TimesfmDataset(BaseDataset):
    def __init__(
        self,
        name: str = None,
        datetime_col: str = "ds",
        path: str = None,
        batchsize: int = 4,
        mode: str = "train",
        boundaries: tuple = (0, 0, 0),
        context_len: int = 128,
        horizon_len: int = 32,
        freq: str = "h",
        normalize: bool = False,
        stride: int = 10,
        **kwargs,
    )

Parameters

name
str
default:"None"
Dataset name used to locate data.
datetime_col
str
default:"ds"
Datetime column name in the CSV.
path
str
default:"None"
Path to a CSV file. If None, the default loader from BaseDataset is used.
batchsize
int
default:"4"
Batch size for dataloaders.
mode
str
default:"train"
Mode of use: "train" or "test".
boundaries
tuple
default:"(0, 0, 0)"
Train/val/test split boundaries. With the default (0, 0, 0), the data is split 50% train, 20% val, 30% test.
context_len
int
default:"128"
Historical context length.
horizon_len
int
default:"32"
Forecast horizon length.
freq
str
default:"h"
Data frequency code (e.g., "h" for hourly, "d" for daily, "m" for monthly).
normalize
bool
default:"False"
Whether to normalize input features.
stride
int
default:"10"
Stride used when creating windows from the time series.
kwargs
dict
Extra backend-specific options.
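
The context_len, horizon_len, and stride parameters together determine how many training windows a series yields. The helper below is a hypothetical illustration of that interaction, not the library's internal implementation:

```python
# Hypothetical sketch: how context_len, horizon_len, and stride
# determine the number of windows cut from one series.
def count_windows(series_len: int, context_len: int,
                  horizon_len: int, stride: int) -> int:
    window = context_len + horizon_len   # total points per window
    if series_len < window:
        return 0                         # series too short for one window
    return (series_len - window) // stride + 1

# With the defaults (context_len=128, horizon_len=32, stride=10),
# a 1000-point series yields 85 windows.
print(count_windows(1000, 128, 32, 10))  # 85
```

A smaller stride produces more (overlapping) windows at the cost of more correlated training samples.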

Methods

get_data_loader()

Get a DataLoader for the dataset.
def get_data_loader(self)
Returns: DataLoader - DataLoader for the dataset.

preprocess_train_batch(data)

Preprocess a training batch.
def preprocess_train_batch(self, data: tuple)
Parameters:
  • data (tuple): Input data tuple.
Returns: dict - Preprocessed data dictionary with keys 'input_ts' and 'actual_ts'.

preprocess_eval_batch(data)

Preprocess an evaluation batch.
def preprocess_eval_batch(self, data: tuple)
Parameters:
  • data (tuple): Input data tuple.
Returns: dict - Preprocessed data dictionary with keys 'input_ts' and 'actual_ts'.

preprocess(data)

Preprocess the input data.
def preprocess(self, data: tuple)
Parameters:
  • data (tuple): Input data tuple.
Returns: dict - Preprocessed data dictionary.
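
The preprocess methods return a dict keyed by 'input_ts' (historical context) and 'actual_ts' (forecast targets). The split below is a minimal sketch of that contract under the assumption that each window is context followed by horizon; the function name and shapes are illustrative, not the library's API:

```python
import numpy as np

# Hypothetical sketch of the context/horizon split performed during
# preprocessing; key names follow the documented 'input_ts'/'actual_ts'
# contract, everything else is an assumption.
def split_window(window: np.ndarray, context_len: int, horizon_len: int) -> dict:
    assert window.shape[-1] >= context_len + horizon_len
    return {
        "input_ts": window[..., :context_len],
        "actual_ts": window[..., context_len:context_len + horizon_len],
    }

batch = np.random.rand(4, 160)  # batchsize=4, context 128 + horizon 32
out = split_window(batch, 128, 32)
print(out["input_ts"].shape, out["actual_ts"].shape)  # (4, 128) (4, 32)
```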

_denormalize_data(data)

Denormalize the input data.
def _denormalize_data(self, data: np.ndarray)
Parameters:
  • data (np.ndarray): Input data array.
Returns: np.ndarray - Denormalized data array.
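
Per the Notes below, normalization uses a StandardScaler fitted on the training data, so denormalization amounts to inverting the z-score transform. A minimal sketch, with hypothetical mean/std arguments standing in for whatever statistics the dataset stores internally:

```python
import numpy as np

# Minimal sketch of denormalization when normalize=True: invert the
# z-score transform using the training-set mean and std.
# (Passing mean/std explicitly is an assumption; the dataset keeps
# its own fitted scaler internally.)
def denormalize(data: np.ndarray, mean: float, std: float) -> np.ndarray:
    return data * std + mean

x = np.array([1.0, 2.0, 3.0])
normed = (x - x.mean()) / x.std()
restored = denormalize(normed, x.mean(), x.std())
print(np.allclose(restored, x))  # True
```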

Example usage

from samay.dataset import TimesfmDataset

dataset = TimesfmDataset(
    path="data/hourly_data.csv",
    datetime_col="timestamp",
    context_len=256,
    horizon_len=64,
    freq="h",
    normalize=True,
    mode="train"
)

loader = dataset.get_data_loader()
for batch in loader:
    input_ts = batch['input_ts']
    actual_ts = batch['actual_ts']
    # Training logic here

Notes

  • The dataset automatically caps the horizon length at 30% of the data length.
  • Supports a special boundary value: (-1, -1, -1) uses all data for training.
  • When normalize=True, a StandardScaler is fitted on the training data.
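
The horizon cap in the first note can be sketched as follows; the exact rounding behavior is an assumption:

```python
# Sketch of the documented horizon cap: horizon_len is clipped to at
# most 30% of the data length (rounding here is an assumption).
def effective_horizon(horizon_len: int, data_len: int) -> int:
    return min(horizon_len, int(0.3 * data_len))

print(effective_horizon(64, 100))   # 30 (capped)
print(effective_horizon(32, 1000))  # 32 (unchanged)
```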
