
Overview

Dataset wrapper for the MOMENT (Multi-task and Multi-domain) model, supporting forecasting, imputation, anomaly detection, and classification tasks.

Class signature

class MomentDataset(BaseDataset):
    def __init__(
        self,
        name: str = None,
        datetime_col: str = None,
        path: str = None,
        batchsize: int = 64,
        mode: str = "train",
        boundaries: list = [0, 0, 0],
        horizon_len: int = 0,
        task_name: str = "forecasting",
        label_col: str = None,
        stride: int = 10,
        **kwargs,
    )

Parameters

name
str
default:"None"
Name of the dataset.
datetime_col
str
default:"None"
Name of the datetime column.
path
str
default:"None"
Path to CSV file.
batchsize
int
default:"64"
Batch size for DataLoader.
mode
str
default:"train"
Mode of operation: 'train' or 'test'.
boundaries
list
default:"[0, 0, 0]"
Train/val/test split boundaries. When left at the default [0, 0, 0], the split is computed automatically as 50% train, 20% validation, 30% test.
horizon_len
int
default:"0"
Forecast horizon length.
task_name
str
default:"forecasting"
Task type: 'forecasting', 'imputation', 'detection', or 'classification'.
label_col
str
default:"None"
Column name for labels in classification. Defaults to 'label' if not provided.
stride
int
default:"10"
Stride for windowing. In test mode with horizon_len > 0, the stride is set to horizon_len so that test windows do not overlap.
kwargs
dict
Extra options forwarded to DataLoader.
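The stride parameter controls how densely windows are sampled from the series. A minimal sketch of the usual sliding-window arithmetic (the helper name `num_windows` is illustrative, not part of the samay API):

```python
# Hypothetical sketch: how many windows of length w a series of n points
# yields when windows start every s steps (the `stride` parameter above).
def num_windows(n: int, w: int, s: int) -> int:
    """Count length-w windows taken every s steps from n points."""
    if n < w:
        return 0  # short series are padded instead (see Features below)
    return (n - w) // s + 1

# 1000 points, 512-step windows, stride 10:
print(num_windows(1000, 512, 10))  # 49
```

In test mode with horizon_len > 0 the stride equals horizon_len, so consecutive forecast targets tile the series without overlap.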

Methods

__len__()

Get the total number of data samples.
def __len__(self) -> int
Returns: int - Number of samples available for iteration.

__getitem__(index)

Get a data sample by index.
def __getitem__(self, index: int)
Parameters:
  • index (int): Index of the data sample.
Returns: Depending on the task, returns input sequences, masks, forecasts, or labels.

get_data_loader()

Get a DataLoader for the dataset.
def get_data_loader(self) -> DataLoader
Returns: DataLoader - PyTorch DataLoader for the dataset.

_denormalize_data(data)

Denormalize the input data.
def _denormalize_data(self, data: np.ndarray)
Parameters:
  • data (np.ndarray): Input data array.
Returns: np.ndarray - Denormalized data array.
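Since the dataset scales inputs with StandardScaler (see Features below), denormalization amounts to inverting the per-channel z-score. A minimal sketch, assuming sklearn-style mean/std statistics; the actual internals may differ:

```python
import numpy as np

# Hypothetical helper mirroring what _denormalize_data typically does:
# invert z-score scaling channel by channel (x = z * std + mean).
def denormalize(data: np.ndarray, mean: np.ndarray, std: np.ndarray) -> np.ndarray:
    """Invert StandardScaler normalization."""
    return data * std + mean

z = np.array([[0.0, 1.0], [-1.0, 2.0]])   # normalized values, 2 channels
mean = np.array([10.0, 20.0])             # per-channel means
std = np.array([2.0, 5.0])                # per-channel stds
print(denormalize(z, mean, std))          # [[10. 25.] [ 8. 30.]]
```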

Example usage

from samay.dataset import MomentDataset

# Forecasting task
dataset = MomentDataset(
    path="data/timeseries.csv",
    datetime_col="timestamp",
    task_name="forecasting",
    horizon_len=96,
    batchsize=32,
    mode="train"
)

loader = dataset.get_data_loader()
for batch in loader:
    input_seq, input_mask, forecast_seq = batch
    # Training logic here

Task-specific outputs

Forecasting

Returns: (input_seq, input_mask, forecast_seq)
  • input_seq: Input sequence of shape (n_channels, seq_len)
  • input_mask: Mask indicating valid values (1) vs padded values (0)
  • forecast_seq: Target forecast sequence

Imputation

Returns: (input_seq, input_mask)

Detection

Returns: (input_seq, input_mask, labels)
  • labels: Binary labels for anomaly detection

Classification

Returns: (input_seq, input_mask, labels)
  • labels: Class labels for classification
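Because each task returns a differently shaped tuple, a training loop that handles multiple tasks needs to unpack batches accordingly. A hedged sketch (the `unpack_batch` helper and the dict keys are illustrative, not a samay API):

```python
# Hypothetical dispatch over the task-specific batch layouts listed above.
def unpack_batch(task_name: str, batch):
    """Map a task-specific batch tuple to a uniform dict."""
    if task_name == "forecasting":
        input_seq, input_mask, forecast_seq = batch
        return {"x": input_seq, "mask": input_mask, "y": forecast_seq}
    elif task_name == "imputation":
        input_seq, input_mask = batch
        return {"x": input_seq, "mask": input_mask}
    elif task_name in ("detection", "classification"):
        input_seq, input_mask, labels = batch
        return {"x": input_seq, "mask": input_mask, "y": labels}
    raise ValueError(f"unknown task: {task_name}")

print(unpack_batch("forecasting", ("seq", "mask", "target")))
```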

Features

  • Automatic data scaling using StandardScaler
  • Support for multivariate time series (max 64 channels per chunk)
  • Automatic padding for short sequences
  • Chunking for datasets with many channels
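The channel-chunking idea can be sketched as follows: a multivariate series is split into groups of at most 64 channels, each processed as its own chunk. This assumes a (n_channels, seq_len) layout; the library's actual chunking logic may differ:

```python
# Hypothetical sketch of channel chunking: split n_channels into
# index ranges of at most max_channels each (64 per the Features list).
def chunk_channels(n_channels: int, max_channels: int = 64):
    """Yield (start, end) channel index ranges of at most max_channels."""
    return [(i, min(i + max_channels, n_channels))
            for i in range(0, n_channels, max_channels)]

print(chunk_channels(150))  # [(0, 64), (64, 128), (128, 150)]
```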
