Overview
Time-MoE is a Mixture of Experts (MoE) model designed specifically for time series forecasting. It uses a sparse gating mechanism to route inputs to specialized experts, enabling efficient scaling.

Paper: Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts

Configuration
Time-MoE configuration is loaded from a JSON file.

Loading the Model
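A loading sketch using the Hugging Face `transformers` API. The checkpoint name `Maple728/TimeMoE-50M` and the `trust_remote_code=True` flag follow the upstream repository's published usage, but verify both against the repository before relying on them:

```python
def load_time_moe(checkpoint="Maple728/TimeMoE-50M", device="cpu"):
    """Load a Time-MoE checkpoint from the Hugging Face Hub.

    The checkpoint name is an assumption; substitute the variant you need
    (e.g. the 50M, 200M, or 1B model).
    """
    from transformers import AutoModelForCausalLM  # requires `transformers`

    model = AutoModelForCausalLM.from_pretrained(
        checkpoint,
        trust_remote_code=True,  # Time-MoE ships custom modeling code
    )
    return model.to(device).eval()
```

The model's configuration (the JSON file mentioned above) is read automatically by `from_pretrained`.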
Loading Dataset
- For Training (Fine-tuning)
- For Evaluation
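The training/evaluation split can be sketched as follows. `WindowedDataset` is an illustrative stand-in for the repository's actual data pipeline, not its real API; only the `task_name` flag mirrors the parameter used by the real loader.

```python
import numpy as np

class WindowedDataset:
    """Illustrative stand-in for a Time-MoE dataset loader (not the real API).

    Slices a series into (context, horizon) windows. When task_name is
    "finetune", target windows are returned for supervised training;
    otherwise only contexts are returned, as in evaluation.
    """

    def __init__(self, series, context_length, horizon, task_name="finetune"):
        self.series = np.asarray(series, dtype=np.float32)
        self.context_length = context_length
        self.horizon = horizon
        self.task_name = task_name

    def __len__(self):
        return max(0, len(self.series) - self.context_length - self.horizon + 1)

    def __getitem__(self, i):
        ctx = self.series[i : i + self.context_length]
        if self.task_name != "finetune":
            return ctx  # evaluation: context only
        tgt = self.series[i + self.context_length : i + self.context_length + self.horizon]
        return ctx, tgt
```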
Note the `task_name="finetune"` parameter for training datasets.

Zero-Shot Forecasting
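Zero-shot forecasting with a pretrained checkpoint can be sketched as below. The per-series normalization and the `model.generate(..., max_new_tokens=...)` call mirror the upstream repository's usage example, but treat the exact signature as an assumption:

```python
import torch

def forecast_zero_shot(model, context, horizon):
    """Forecast `horizon` steps from a (batch, length) context tensor.

    Time-MoE expects normalized inputs, so we normalize per series,
    decode autoregressively, and invert the normalization on the output.
    """
    mean = context.mean(dim=-1, keepdim=True)
    std = context.std(dim=-1, keepdim=True) + 1e-8  # avoid division by zero
    normed = (context - mean) / std
    out = model.generate(normed, max_new_tokens=horizon)  # (batch, length + horizon)
    return out[:, -horizon:] * std + mean
```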
Fine-tuning
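A minimal fine-tuning loop sketch. The HF-style call `model(inputs, labels=...)` returning an object with a `.loss` attribute is an assumption about the model interface; adapt it to the repository's actual training script:

```python
import torch

def finetune(model, train_loader, epochs=1, lr=1e-4):
    """Generic fine-tuning loop (a sketch, not the repository's trainer)."""
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for inputs, labels in train_loader:
            opt.zero_grad()
            out = model(inputs, labels=labels)  # assumed HF-style forward
            out.loss.backward()
            opt.step()
    return model
```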
Visualization
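A plotting sketch with matplotlib; the function name, styling, and output file name are illustrative choices, not part of the Time-MoE API:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

def plot_forecast(history, forecast, path="forecast.png"):
    """Plot the historical context and the forecast on one timeline."""
    history = np.asarray(history)
    forecast = np.asarray(forecast)
    t_hist = np.arange(len(history))
    t_fcst = np.arange(len(history), len(history) + len(forecast))
    fig, ax = plt.subplots(figsize=(8, 3))
    ax.plot(t_hist, history, label="history")
    ax.plot(t_fcst, forecast, label="forecast", linestyle="--")
    ax.legend()
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)
    return path
```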
Mixture of Experts Architecture
Time-MoE uses a Mixture of Experts approach:

- Sparse Gating: Routes inputs to a subset of experts
- Specialized Experts: Each expert learns different patterns
- Efficient Scaling: Increases capacity without proportional compute increase
- Load Balancing: Ensures even distribution across experts
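The sparse-gating idea above can be illustrated with a toy top-k gate; this is a schematic of the general MoE routing pattern, not Time-MoE's actual implementation:

```python
import torch

def topk_gate(x, w_gate, k=2):
    """Toy sparse gate: pick the top-k experts per token and softmax their scores.

    x: (batch, d) token representations; w_gate: (d, num_experts) gating weights.
    Returns per-token mixing weights over the k selected experts and their indices.
    """
    logits = x @ w_gate                          # score every expert
    top_vals, top_idx = logits.topk(k, dim=-1)   # sparse: only k experts fire
    weights = torch.softmax(top_vals, dim=-1)    # normalized mixing weights
    return weights, top_idx
```

In a full MoE layer, each token's output is the weighted sum of its k selected experts' outputs, and an auxiliary load-balancing loss encourages even expert usage.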
Key Features
- Scalable: Efficiently scales to billion-parameter models
- Sparse Activation: Only a subset of parameters active per input
- Specialized Learning: Different experts capture different patterns
- Efficient Inference: Only the routed experts run per input, so inference stays efficient despite the large parameter count
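As a back-of-the-envelope illustration of sparse activation, the fraction of parameters used per input under top-k routing can be computed as below. The expert counts and sizes here are made-up numbers, not Time-MoE's actual configuration:

```python
def active_fraction(num_experts, top_k, expert_params, shared_params):
    """Fraction of parameters active per input under top-k expert routing."""
    total = num_experts * expert_params + shared_params
    active = top_k * expert_params + shared_params
    return active / total

# Example: 8 experts, top-2 routing, experts dominating the parameter count.
frac = active_fraction(num_experts=8, top_k=2, expert_params=100, shared_params=20)
# Only about a quarter of the parameters run per input.
```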
Model Variants
| Model | Parameters | Description |
|---|---|---|
| TimeMoE-50M | 50M | Small variant for quick experiments |
| TimeMoE-200M | 200M | Medium variant balancing size and performance |
| TimeMoE-1B | 1B | Large variant for maximum accuracy |
When to Use Time-MoE
- When you need state-of-the-art accuracy
- For diverse time series patterns in your data
- When you have computational resources for larger models
- For complex forecasting tasks requiring specialized expertise
Example Notebook
For a complete working example, see the example notebook in the repository.

The Time-MoE model is actively being improved; check the repository for the latest updates.