Overview
Time-MoE is a Mixture of Experts (MoE) model designed specifically for time series forecasting. It uses a sparse gating mechanism to route inputs to specialized experts, enabling efficient scaling.

Paper: Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts

Configuration
Time-MoE configuration is loaded from a JSON file.

Loading the Model
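A loading sketch using the Hugging Face `transformers` API. The checkpoint name `Maple728/TimeMoE-50M` and the `trust_remote_code=True` flag follow the upstream repository's published usage, but verify both against the repository before relying on them:

```python
def load_time_moe(checkpoint="Maple728/TimeMoE-50M", device="cpu"):
    """Load a Time-MoE checkpoint from the Hugging Face Hub.

    The checkpoint name is an assumption; substitute the variant you need
    (e.g. the 50M, 200M, or 1B model).
    """
    from transformers import AutoModelForCausalLM  # requires `transformers`

    model = AutoModelForCausalLM.from_pretrained(
        checkpoint,
        trust_remote_code=True,  # Time-MoE ships custom modeling code
    )
    return model.to(device).eval()
```

The model's configuration (the JSON file mentioned above) is read automatically by `from_pretrained`.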
Loading Dataset
- For Training (Fine-tuning)
- For Evaluation
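The training/evaluation split can be sketched as follows. `WindowedDataset` is an illustrative stand-in for the repository's actual data pipeline, not its real API; only the `task_name` flag mirrors the parameter used by the real loader.

```python
import numpy as np

class WindowedDataset:
    """Illustrative stand-in for a Time-MoE dataset loader (not the real API).

    Slices a series into (context, horizon) windows. When task_name is
    "finetune", target windows are returned for supervised training;
    otherwise only contexts are returned, as in evaluation.
    """

    def __init__(self, series, context_length, horizon, task_name="finetune"):
        self.series = np.asarray(series, dtype=np.float32)
        self.context_length = context_length
        self.horizon = horizon
        self.task_name = task_name

    def __len__(self):
        return max(0, len(self.series) - self.context_length - self.horizon + 1)

    def __getitem__(self, i):
        ctx = self.series[i : i + self.context_length]
        if self.task_name != "finetune":
            return ctx  # evaluation: context only
        tgt = self.series[i + self.context_length : i + self.context_length + self.horizon]
        return ctx, tgt
```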
Note the `task_name="finetune"` parameter for training datasets.

Zero-Shot Forecasting
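Zero-shot forecasting with a pretrained checkpoint can be sketched as below. The per-series normalization and the `model.generate(..., max_new_tokens=...)` call mirror the upstream repository's usage example, but treat the exact signature as an assumption:

```python
import torch

def forecast_zero_shot(model, context, horizon):
    """Forecast `horizon` steps from a (batch, length) context tensor.

    Time-MoE expects normalized inputs, so we normalize per series,
    decode autoregressively, and invert the normalization on the output.
    """
    mean = context.mean(dim=-1, keepdim=True)
    std = context.std(dim=-1, keepdim=True) + 1e-8  # avoid division by zero
    normed = (context - mean) / std
    out = model.generate(normed, max_new_tokens=horizon)  # (batch, length + horizon)
    return out[:, -horizon:] * std + mean
```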
Fine-tuning
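A minimal fine-tuning loop sketch. The HF-style call `model(inputs, labels=...)` returning an object with a `.loss` attribute is an assumption about the model interface; adapt it to the repository's actual training script:

```python
import torch

def finetune(model, train_loader, epochs=1, lr=1e-4):
    """Generic fine-tuning loop (a sketch, not the repository's trainer)."""
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for inputs, labels in train_loader:
            opt.zero_grad()
            out = model(inputs, labels=labels)  # assumed HF-style forward
            out.loss.backward()
            opt.step()
    return model
```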
Visualization
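A plotting sketch with matplotlib; the function name, styling, and output file name are illustrative choices, not part of the Time-MoE API:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

def plot_forecast(history, forecast, path="forecast.png"):
    """Plot the historical context and the forecast on one timeline."""
    history = np.asarray(history)
    forecast = np.asarray(forecast)
    t_hist = np.arange(len(history))
    t_fcst = np.arange(len(history), len(history) + len(forecast))
    fig, ax = plt.subplots(figsize=(8, 3))
    ax.plot(t_hist, history, label="history")
    ax.plot(t_fcst, forecast, label="forecast", linestyle="--")
    ax.legend()
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)
    return path
```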
Mixture of Experts Architecture
Time-MoE uses a Mixture of Experts approach:

- Sparse Gating: Routes inputs to a subset of experts
- Specialized Experts: Each expert learns different patterns
- Efficient Scaling: Increases capacity without proportional compute increase
- Load Balancing: Ensures even distribution across experts
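The sparse-gating idea above can be illustrated with a toy top-k gate; this is a schematic of the general MoE routing pattern, not Time-MoE's actual implementation:

```python
import torch

def topk_gate(x, w_gate, k=2):
    """Toy sparse gate: pick the top-k experts per token and softmax their scores.

    x: (batch, d) token representations; w_gate: (d, num_experts) gating weights.
    Returns per-token mixing weights over the k selected experts and their indices.
    """
    logits = x @ w_gate                          # score every expert
    top_vals, top_idx = logits.topk(k, dim=-1)   # sparse: only k experts fire
    weights = torch.softmax(top_vals, dim=-1)    # normalized mixing weights
    return weights, top_idx
```

In a full MoE layer, each token's output is the weighted sum of its k selected experts' outputs, and an auxiliary load-balancing loss encourages even expert usage.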
Key Features
- Scalable: Efficiently scales to billion-parameter models
- Sparse Activation: Only a subset of parameters active per input
- Specialized Learning: Different experts capture different patterns
- Efficient Inference: Only the routed experts run per input, so inference stays efficient despite the large parameter count
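As a back-of-the-envelope illustration of sparse activation, the fraction of parameters used per input under top-k routing can be computed as below. The expert counts and sizes here are made-up numbers, not Time-MoE's actual configuration:

```python
def active_fraction(num_experts, top_k, expert_params, shared_params):
    """Fraction of parameters active per input under top-k expert routing."""
    total = num_experts * expert_params + shared_params
    active = top_k * expert_params + shared_params
    return active / total

# Example: 8 experts, top-2 routing, experts dominating the parameter count.
frac = active_fraction(num_experts=8, top_k=2, expert_params=100, shared_params=20)
# Only about a quarter of the parameters run per input.
```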
Model Variants
| Model | Parameters | Description |
|---|---|---|
| TimeMoE-50M | 50M | Small variant for quick experiments |
| TimeMoE-200M | 200M | Medium variant balancing size and performance |
| TimeMoE-1B | 1B | Large variant for maximum accuracy |
When to Use Time-MoE
- When you need state-of-the-art accuracy
- For diverse time series patterns in your data
- When you have computational resources for larger models
- For complex forecasting tasks requiring specialized expertise
Example Notebook
For a complete working example, see the example notebook in the repository.

The Time-MoE model is actively being improved; check the repository for the latest updates.