MLIP Arena is built around a layered pipeline: models expose a common ASE Calculator interface, tasks apply individual operations to structures, flows orchestrate tasks in parallel across models, benchmarks collect and upload results, and a live leaderboard displays them on Hugging Face Spaces.
Pipeline overview
Layers in detail
Models
Every supported MLIP is wrapped as an ASE Calculator subclass and registered in mlip_arena/models/registry.yaml. At import time, mlip_arena/models/__init__.py reads the registry, dynamically imports each model class, and builds MLIPEnum, a Python Enum where each member’s value is its calculator class.
Models fall into two categories:
- External ASE calculators: implemented under mlip_arena/models/externals/. These wrap third-party packages (e.g., mace-torch, chgnet, matgl) and expose an ASE Calculator interface.
- HuggingFace models: inherit MLIP (which extends nn.Module and PyTorchModelHubMixin), enabling checkpoint upload and download via the Hub.
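The enum described above can be illustrated with a minimal sketch. The two calculator classes here are hypothetical stand-ins rather than real MLIP wrappers, and ModelEnum and model_map are illustrative names; the actual MLIPEnum is populated from the registry.

```python
from enum import Enum


class ToyCalculatorA:
    """Hypothetical stand-in for a wrapped MLIP calculator class."""

    def describe(self) -> str:
        return "toy-a"


class ToyCalculatorB:
    """Second hypothetical stand-in, to show several members."""

    def describe(self) -> str:
        return "toy-b"


# Name-to-class mapping; assumed here, whereas the real package derives it
# from the registry file.
model_map = {"ToyA": ToyCalculatorA, "ToyB": ToyCalculatorB}

# Functional Enum API: each member's value is the calculator class itself.
ModelEnum = Enum("ModelEnum", model_map)

# Look a model up by name, then instantiate its calculator class.
calculator = ModelEnum["ToyA"].value()

# Iterating the enum visits every registered model in definition order.
names = [member.name for member in ModelEnum]
```

Keeping the class (not an instance) as the member value means a calculator is only constructed when a benchmark actually needs it.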
Tasks
A task is one operation on one input structure, decorated with Prefect’s @task. Each task:
- Accepts an atoms: Atoms object and a calculator: BaseCalculator.
- Uses the TASK_SOURCE + INPUTS cache policy so identical work is not repeated.
- Returns a dictionary of results (relaxed structure, energies, trajectory data, etc.).
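Tasks can call other tasks. For example, the EOS (equation of state) task internally calls OPT (structure optimization) once for a full relaxation, then runs a series of constrained OPT tasks at different volumes.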
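The way EOS composes OPT can be sketched with a toy model. Everything below is an assumption made for illustration: the analytic energy function, the single internal coordinate x, the ternary-search minimizer, and the function names opt and eos. It is not the library's implementation; it only shows the "relax fully, then relax at fixed volumes" pattern.

```python
def energy(volume: float, x: float) -> float:
    """Toy energy surface with its minimum at volume=10.0, x=1.0."""
    return 0.5 * (volume - 10.0) ** 2 + (x - volume / 10.0) ** 2


def minimize_1d(f, lo: float, hi: float, tol: float = 1e-8) -> float:
    """Ternary search for the minimum of a unimodal function on [lo, hi]."""
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2


def relax_x(volume: float) -> float:
    """Relax the internal coordinate x at a fixed volume."""
    return minimize_1d(lambda x: energy(volume, x), 0.0, 5.0)


def opt(volume=None) -> dict:
    """Stand-in for OPT: full relaxation, or constrained if volume is given."""
    if volume is None:
        # Full relaxation: minimize over volume, relaxing x at each trial volume.
        volume = minimize_1d(lambda v: energy(v, relax_x(v)), 1.0, 20.0)
    x = relax_x(volume)
    return {"volume": volume, "x": x, "energy": energy(volume, x)}


def eos(scales=(0.9, 0.95, 1.0, 1.05, 1.1)) -> dict:
    """Stand-in for EOS: one full OPT, then constrained OPTs at scaled volumes."""
    relaxed = opt()
    points = [opt(volume=relaxed["volume"] * s) for s in scales]
    return {
        "relaxed": relaxed,
        "volumes": [p["volume"] for p in points],
        "energies": [p["energy"] for p in points],
    }


result = eos()
```

The returned volume and energy lists are the kind of data an equation-of-state fit would consume.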
Flows
A flow wraps multiple task calls under a Prefect @flow and uses .submit() to dispatch them concurrently to workers. Flows are what you run in production on an HPC cluster or locally with a Prefect agent.
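The submit-and-collect pattern can be sketched with the standard library alone. This example deliberately uses concurrent.futures as a stand-in for Prefect, so it says nothing about Prefect's own API; run_task, run_flow, and the model and structure names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def run_task(model_name: str, structure: str) -> dict:
    """Hypothetical task: one operation on one structure with one model."""
    # A made-up number stands in for a real calculation.
    fake_energy = -float(len(model_name) + len(structure))
    return {"model": model_name, "structure": structure, "energy": fake_energy}


def run_flow(model_names, structures, max_workers: int = 4) -> list:
    """Dispatch every (model, structure) pair concurrently, then gather results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # submit() returns immediately with a future for each piece of work.
        futures = [
            pool.submit(run_task, model, structure)
            for model in model_names
            for structure in structures
        ]
        # result() blocks until the corresponding task has finished.
        return [future.result() for future in futures]


results = run_flow(["model-a", "model-b"], ["Cu", "Al"])
```

Results come back in submission order because the futures list is walked in order, regardless of which task finishes first.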
Benchmarks
Benchmarks are Python scripts (or Jupyter notebooks) under benchmarks/ that build a flow over all MLIPEnum members, collect results, and upload them to the atomind/mlip-arena HuggingFace dataset repository.
Leaderboard
serve/app.py is a Streamlit application hosted on Hugging Face Spaces. It reads result data from the dataset repository and renders interactive benchmark pages for each task registered in mlip_arena/tasks/registry.yaml.
Prefect workflow orchestration
MLIP Arena uses Prefect as its workflow engine. Prefect provides:
- Task caching: Results are cached by the TASK_SOURCE + INPUTS policy. Re-running a benchmark skips already-completed calculations.
- Parallel execution: .submit() dispatches tasks to a Prefect worker pool, enabling concurrent execution across models and structures.
- HPC integration: dask_jobqueue integrates with SLURM, PBS, and other schedulers for cluster-scale parallelism.
- Observability: The Prefect UI tracks task states, logs, and failure reasons for every benchmark run.
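The idea behind caching on task source plus inputs can be sketched as a small decorator. This is a simplified stand-in, not Prefect's mechanism: the cache lives in memory, the compiled bytecode is used as a proxy for the task source, and cached_task and single_point are hypothetical names.

```python
import functools
import hashlib
import pickle


def cached_task(fn):
    """Cache results under a key derived from the task's code and its inputs."""
    store = {}

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        # Bytecode stands in for "task source"; pickled arguments for "inputs".
        key_material = fn.__code__.co_code + pickle.dumps((args, sorted(kwargs.items())))
        key = hashlib.sha256(key_material).hexdigest()
        if key not in store:
            store[key] = fn(*args, **kwargs)
        return store[key]

    return wrapper


calls = []  # records which inputs actually triggered a computation


@cached_task
def single_point(formula: str, scale: float = 1.0) -> dict:
    calls.append(formula)
    return {"formula": formula, "energy": -scale * len(formula)}


first = single_point("Cu")
second = single_point("Cu")  # identical inputs: served from the cache
third = single_point("Al")   # new input: computed
```

Changing either the task body or any argument produces a different key, which is why an edited task is recomputed while an unchanged one is skipped.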
HuggingFace integration
MLIP Arena uses three HuggingFace surfaces:

| Surface | Purpose | Key operation |
|---|---|---|
| Model repos | Store pretrained MLIP checkpoints | MLIP.from_pretrained(repo_id) |
| Dataset repo (atomind/mlip-arena) | Store benchmark results as JSON | HfApi.upload_file() |
| Spaces (atomind/mlip-arena) | Host the Streamlit leaderboard | streamlit run serve/app.py |
ASE Calculator abstraction
All models expose a unified interface through ASE’s Calculator base class. This means any task written against BaseCalculator works with any registered model without modification.
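The benefit of coding tasks against a shared interface can be shown with a dependency-free sketch. CalculatorLike is a hypothetical protocol standing in for ASE's calculator interface, and the two models are toy classes; only the pattern (one task, many interchangeable calculators) is the point.

```python
from typing import Protocol, Sequence


class CalculatorLike(Protocol):
    """Minimal stand-in for the calculator interface a task relies on."""

    def get_potential_energy(self, positions: Sequence[float]) -> float: ...


class HarmonicModel:
    """Toy model: harmonic energy around the origin."""

    def __init__(self, k: float = 1.0) -> None:
        self.k = k

    def get_potential_energy(self, positions: Sequence[float]) -> float:
        return 0.5 * self.k * sum(x * x for x in positions)


class ConstantModel:
    """Toy model: fixed energy per atom."""

    def get_potential_energy(self, positions: Sequence[float]) -> float:
        return -1.0 * len(positions)


def single_point(positions: Sequence[float], calculator: CalculatorLike) -> dict:
    """A task written only against the shared interface."""
    return {"energy": calculator.get_potential_energy(positions)}


positions = [0.0, 1.0, 2.0]
results = {
    type(model).__name__: single_point(positions, model)
    for model in (HarmonicModel(k=2.0), ConstantModel())
}
```

The task never inspects which model it received, so supporting another model requires no change to the task itself.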
Registry pattern
Both models and tasks use a YAML registry as a single source of truth for metadata.
- Model registry: mlip_arena/models/registry.yaml stores per-model metadata (Python module path, class name, model family, training datasets, supported tasks, prediction types, and license). At import time, __init__.py reads this file and imports each class.
- Task registry: mlip_arena/tasks/registry.yaml lists the tasks for which the leaderboard renders benchmark pages.

Adding a new model or benchmark does not require changing Python code in the core library; only the relevant YAML registry needs updating.
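A registry-driven import of this kind might look roughly like the sketch below. The registry is written as a Python dict shaped like a parsed YAML mapping, the keys "module", "class", and "tasks" are illustrative assumptions rather than the project's actual schema, and standard-library classes stand in for calculator classes so the example runs without extra packages.

```python
import importlib

# Hypothetical registry content, as it might look after parsing a YAML file.
# Key names and entries are assumptions made for this example.
registry = {
    "ToyFraction": {"module": "fractions", "class": "Fraction", "tasks": ["eos", "md"]},
    "ToyDecimal": {"module": "decimal", "class": "Decimal", "tasks": ["eos"]},
}


def load_classes(registry: dict) -> dict:
    """Resolve each registry entry to the class it names."""
    classes = {}
    for name, meta in registry.items():
        module = importlib.import_module(meta["module"])
        classes[name] = getattr(module, meta["class"])
    return classes


def models_supporting(registry: dict, task: str) -> list:
    """Use registry metadata to select the models that declare a given task."""
    return [name for name, meta in registry.items() if task in meta["tasks"]]


classes = load_classes(registry)
md_models = models_supporting(registry, "md")
```

Because the loader only reads the registry, adding an entry to the mapping is enough to make a new class available; no loader code changes.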