TLDR: Add a `meta/modality.json` file to your LeRobot v2 dataset and follow the schema below.

## Directory structure

A GR00T LeRobot dataset is organized into three top-level components: `videos/` for MP4 observations, `data/` for parquet episode data, and `meta/` for metadata files.

## Data components
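For orientation, a small dataset's on-disk layout might look like this (chunk, camera, and episode names are illustrative):

```
.
├── data/
│   └── chunk-000/
│       ├── episode_000000.parquet
│       └── episode_000001.parquet
├── videos/
│   └── chunk-000/
│       └── observation.images.ego_view/
│           ├── episode_000000.mp4
│           └── episode_000001.mp4
└── meta/
    ├── info.json
    ├── episodes.jsonl
    ├── tasks.jsonl
    └── modality.json
```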
### Video observations (`videos/chunk-*`)

Video observations for each episode are stored as MP4 files.

Requirements:

- Must be stored as MP4 files
- Named using the format `observation.images.<video_name>`
- Episode numbering: `episode_00000X.mp4`, where `X` is the episode number
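As a sketch of the convention above (the six-digit zero padding and the `chunk-XXX` directory naming are assumptions carried over from LeRobot v2; adjust if your dataset differs):

```python
from pathlib import Path


def video_path(root: Path, chunk: int, video_name: str, episode: int) -> Path:
    """Build the expected MP4 path for one camera view of one episode."""
    return (
        root
        / "videos"
        / f"chunk-{chunk:03d}"
        / f"observation.images.{video_name}"
        / f"episode_{episode:06d}.mp4"
    )


print(video_path(Path("my_dataset"), 0, "ego_view", 3))
```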
### Data files (`data/chunk-*`)

Parquet files contain episode data (state, action, and annotations) in columnar format.

Required columns:

- `observation.state` - Concatenated 1D array of all state modalities
- `action` - Concatenated 1D array of all action modalities
- `timestamp` - Float timestamp of the observation
- `episode_index` - Episode number
- `index` - Global observation index across all episodes
- `next.reward` - Reward of the next observation
- `next.done` - Whether the episode is done
- `annotation.<source>.<type>.<name>` - Annotation indices referencing `tasks.jsonl`
- `task_index` - Task index (legacy, use annotation columns instead)
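As a sketch of how `observation.state` is assembled (the modality names and sizes here are illustrative, not part of the spec), the per-field arrays are concatenated into a single float32 vector:

```python
import numpy as np

# Hypothetical state modalities for one timestep; names and sizes are
# illustrative only.
state_fields = {
    "left_arm": np.zeros(7, dtype=np.float32),
    "left_hand": np.zeros(6, dtype=np.float32),
    "waist": np.zeros(3, dtype=np.float32),
}

# observation.state is the concatenation of all state modalities, in a
# fixed order that meta/modality.json later makes explicit.
observation_state = np.concatenate(list(state_fields.values()))

print(observation_state.shape, observation_state.dtype)  # (16,) float32
```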
#### Example parquet file

From the `cube_to_bowl` dataset:
### Metadata files (`meta/`)

#### meta/tasks.jsonl

Contains task descriptions referenced by annotation indices in the parquet files. The `annotation.human.action.task_description` column in the parquet file references the `task_index` to get the actual task description.
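A sketch of that lookup (the file contents below are illustrative): each line of `tasks.jsonl` is a JSON object, and the annotation column stores only the index:

```python
import io
import json

# Illustrative meta/tasks.jsonl contents (one JSON object per line).
tasks_jsonl = io.StringIO(
    '{"task_index": 0, "task": "pick the squash and place it in the plate"}\n'
    '{"task_index": 1, "task": "valid"}\n'
)

# Build an index -> description table, as a data loader might.
tasks = {}
for line in tasks_jsonl:
    entry = json.loads(line)
    tasks[entry["task_index"]] = entry["task"]

# A parquet row stores only the index in its annotation column...
annotation_index = 0

# ...and the loader resolves it to the actual text.
print(tasks[annotation_index])  # pick the squash and place it in the plate
```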
#### meta/episodes.jsonl

Contains episode metadata, including episode length and associated tasks.
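An illustrative sketch of the per-line format, based on the LeRobot v2 convention of one JSON object per episode (field values are assumptions):

```jsonl
{"episode_index": 0, "tasks": ["pick the squash and place it in the plate"], "length": 416}
{"episode_index": 1, "tasks": ["pick the squash and place it in the plate"], "length": 378}
```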
#### meta/info.json

Contains general dataset information and metadata.
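An abridged, illustrative sketch of such a file, based on the LeRobot v2 convention (field names and values are assumptions; consult your dataset's actual file):

```json
{
    "codebase_version": "v2.0",
    "robot_type": "gr1",
    "total_episodes": 5,
    "total_frames": 2060,
    "fps": 20,
    "data_path": "data/chunk-{episode_chunk:03d}/episode_{episode_index:06d}.parquet",
    "video_path": "videos/chunk-{episode_chunk:03d}/{video_key}/episode_{episode_index:06d}.mp4"
}
```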
#### meta/modality.json (GR00T specific)

Required metadata file for GR00T that provides detailed field-level metadata. See the schema section below for details.
## GR00T-specific: meta/modality.json

The `meta/modality.json` file is required for GR00T and provides detailed metadata about state and action modalities.
### Purpose

This file enables:

- Separate data storage and interpretation: state and action are stored as concatenated float32 arrays, with metadata to interpret them as distinct fields
- Fine-grained splitting: divides arrays into semantically meaningful fields
- Clear mapping: explicit mapping of data dimensions
- Sophisticated transformations: field-specific normalization and rotation transformations during training
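As a sketch of the splitting this metadata enables (the field names and index ranges are assumptions, not part of the spec), each entry's start/end indices slice the concatenated array back into named fields:

```python
import numpy as np

# Hypothetical slice of a modality.json "state" section; the real file's
# field names and index ranges depend on your robot.
state_modalities = {
    "left_arm": {"start": 0, "end": 7},
    "left_hand": {"start": 7, "end": 13},
}

# A concatenated observation.state vector of matching width.
observation_state = np.arange(13, dtype=np.float32)

# Split it back into named fields using the [start:end] convention.
fields = {
    name: observation_state[meta["start"]:meta["end"]]
    for name, meta in state_modalities.items()
}

print(fields["left_arm"].shape, fields["left_hand"].shape)  # (7,) (6,)
```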
### Schema

All indices are zero-based and follow Python's array slicing convention (`[start:end]`).

#### Example modality.json
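An illustrative sketch of such a file (the state/action field names, index ranges, and video keys are assumptions for a hypothetical robot):

```json
{
    "state": {
        "left_arm": {"start": 0, "end": 7},
        "left_hand": {"start": 7, "end": 13}
    },
    "action": {
        "left_arm": {"start": 0, "end": 7},
        "left_hand": {"start": 7, "end": 13}
    },
    "video": {
        "ego_view": {"original_key": "observation.images.ego_view"}
    },
    "annotation": {
        "human.action.task_description": {}
    }
}
```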
## GR00T extensions to standard LeRobot

GR00T LeRobot extends the standard LeRobot format with:

- `meta/stats.json` and `meta/relative_stats.json`, which are automatically computed for each dataset and stored in the meta folder
- Proprioceptive states, which must always be included in the `observation.state` keys
- Support for multiple annotation channels (e.g., coarse-grained, fine-grained) via the `annotation.<source>.<type>` key pattern
- The `meta/modality.json` file, which is required and not present in standard LeRobot v2

## Multiple annotation support
GR00T supports multiple annotation channels within a single parquet file. Users can add extra columns following the `annotation.<source>.<type>` pattern.

### How it works

- Language descriptions are stored in `meta/tasks.jsonl` with a `task_index`
- Parquet files store only the corresponding index in the annotation column
- The data loader uses the index to retrieve the actual text from `tasks.jsonl`

This is the same pattern as LeRobot v2's `task_index` column, but more flexible, with dedicated annotation columns.

### Example
This `meta/tasks.jsonl` maps index 0 to "pick the squash and place it in the plate" and index 1 to "valid".
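Reconstructed as an illustrative file (the exact formatting is an assumption), the example maps those indices as:

```jsonl
{"task_index": 0, "task": "pick the squash and place it in the plate"}
{"task_index": 1, "task": "valid"}
```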
## Converting existing data

### From LeRobot v3.0

If you have a dataset in the LeRobot v3.0 format, use the provided conversion script.

### From other formats

Convert your data to the LeRobot v2 format following the structure requirements above, then add the `meta/modality.json` file.
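As a minimal sketch of that last step (the modality layout shown is a placeholder for your robot's actual fields, not part of the spec), `meta/modality.json` can be written with the standard library:

```python
import json
from pathlib import Path


def write_modality(dataset_root: Path, modality: dict) -> Path:
    """Write meta/modality.json into an existing LeRobot v2 dataset root."""
    meta_dir = dataset_root / "meta"
    meta_dir.mkdir(parents=True, exist_ok=True)
    path = meta_dir / "modality.json"
    path.write_text(json.dumps(modality, indent=4))
    return path


# Placeholder layout; replace with your robot's real state/action fields.
modality = {
    "state": {"arm": {"start": 0, "end": 7}},
    "action": {"arm": {"start": 0, "end": 7}},
}

out = write_modality(Path("my_dataset"), modality)
print(out)
```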
## VLAStepData structure

When working with GR00T's data processing pipeline, data is represented using the `VLAStepData` class from `gr00t/data/types.py:36-58`. Its fields are:
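A hedged sketch of such a step container, mirroring the fields described in this section (the class and field names below are assumptions; the real `VLAStepData` in `gr00t/data/types.py` may differ):

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

import numpy as np


@dataclass
class VLAStep:
    """Illustrative stand-in for GR00T's VLAStepData; names are assumed."""

    # View name -> list of image arrays (for temporal stacking).
    images: Dict[str, List[np.ndarray]] = field(default_factory=dict)
    # State name -> array of shape (dim,) or (horizon, dim).
    states: Dict[str, np.ndarray] = field(default_factory=dict)
    # Action name -> array of shape (horizon, dim) for action chunks.
    actions: Dict[str, np.ndarray] = field(default_factory=dict)
    # Optional task description for language conditioning.
    language: Optional[str] = None
    # Embodiment tag for cross-embodiment training.
    embodiment_tag: str = "NEW_EMBODIMENT"
    # If True, no loss is computed for this step.
    is_demo: bool = False


step = VLAStep(
    states={"arm": np.zeros(7, dtype=np.float32)},
    actions={"arm": np.zeros((16, 7), dtype=np.float32)},
    language="pick the squash and place it in the plate",
)
print(step.actions["arm"].shape)  # (16, 7)
```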
- Images: dictionary mapping view names to lists of image arrays (for temporal stacking).
- States: dictionary mapping state names to state arrays; can be a single step `(dim,)` or a trajectory `(horizon, dim)`.
- Actions: dictionary mapping action names to action arrays with shape `(horizon, dim)` for action chunks.
- Language: optional task description or instruction for language conditioning.
- Embodiment tag: tag for cross-embodiment training; defaults to `NEW_EMBODIMENT`.
- Demonstration flag: whether the step is a demonstration; if `True`, no loss is computed for this step.

## Next steps
- Modality configs: configure how to load and process your data
- Data preparation guide: step-by-step guide to prepare your dataset
- Embodiment tags: learn about supported robots