Class: AlpamayoR1
The AlpamayoR1 class is the main expert model for reasoning Vision-Language-Action tasks. It extends the ReasoningVLA base model with diffusion-based trajectory sampling capabilities.
Inherits from: ReasoningVLA
Location: alpamayo_r1.models.alpamayo_r1.AlpamayoR1
Constructor
Configuration object containing all model settings including expert configuration, diffusion settings, and action space parameters.
Dictionary of pretrained PyTorch modules to use for initialization. Can include pre-loaded components like the VLM backbone.
Original vocabulary size before adding trajectory tokens. Used when loading pretrained modules.
Methods
sample_trajectories_from_data_with_vlm_rollout
Input data dictionary containing:
- ego_history_xyz: History positions tensor [B, n_traj_group, T, 3]
- ego_history_rot: History rotations tensor [B, n_traj_group, T, ...]
- tokenized_data: Tokenized input data including input_ids
Nucleus sampling parameter. Only tokens with cumulative probability up to top_p are considered.
Top-k sampling parameter. If specified, only the top k tokens are considered for sampling.
Sampling temperature. Higher values increase randomness, lower values make sampling more deterministic.
Number of trajectory samples to generate per input.
Number of trajectory sets to generate.
Additional keyword arguments to pass to the diffusion sampling process.
Additional keyword arguments:
- max_generation_length: Maximum length for VLM generation (default: config.tokens_per_future_traj)
- return_extra: If True, returns extracted text tokens in addition to trajectories
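The temperature, top-k, and nucleus parameters described above can be illustrated with a minimal, library-free sketch. It is not the model's actual sampling code: the helper names `filter_probs` and `sample_token` are hypothetical, and the sketch follows the common convention of keeping the smallest set of tokens whose cumulative probability reaches top_p.

```python
import math
import random


def filter_probs(logits, temperature=1.0, top_k=None, top_p=1.0):
    """Return {token_id: prob} after temperature, top-k, and top-p filtering.

    Hypothetical helper for illustration; not part of AlpamayoR1.
    """
    # Temperature scaling followed by a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Rank token ids from most to least likely, then apply top-k.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k is not None:
        order = order[:top_k]

    # Nucleus filtering: keep the smallest prefix whose mass reaches top_p.
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # Renormalize over the surviving tokens.
    norm = sum(probs[i] for i in kept)
    return {i: probs[i] / norm for i in kept}


def sample_token(logits, rng=random, **kwargs):
    """Draw one token id from the filtered distribution."""
    dist = filter_probs(logits, **kwargs)
    ids = list(dist)
    return rng.choices(ids, weights=[dist[i] for i in ids], k=1)[0]
```

Lowering the temperature concentrates probability on the most likely token, while top_k and top_p shrink the candidate set before the draw.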
Predicted trajectory positions with shape [B, num_traj_sets, num_traj_samples, T, 3]
Predicted trajectory rotations with shape [B, num_traj_sets, num_traj_samples, T, ...]
Dictionary containing extracted text tokens from VLM generation, with shape [B, num_traj_sets, num_traj_samples]
Example Usage
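As a hedged illustration of the shape contract documented above, the following sketch computes the expected output shapes from the shape of ego_history_xyz and the sampling counts. The helper `expected_output_shapes` and its `future_steps` argument are hypothetical: the documentation uses T for both history and prediction length without saying whether they are equal, so the prediction length is passed in explicitly. The trailing rotation dimensions are left out because they are not specified.

```python
def expected_output_shapes(history_shape, num_traj_sets, num_traj_samples,
                           future_steps):
    """Return (positions_shape, extra_shape) for the documented outputs.

    Hypothetical helper for illustration; not part of AlpamayoR1.
    history_shape is the shape of ego_history_xyz: [B, n_traj_group, T, 3].
    """
    batch, n_traj_group, _history_steps, xyz_dim = history_shape
    if xyz_dim != 3:
        raise ValueError("ego_history_xyz must end in a dimension of size 3")
    # The notes state that inference supports a single trajectory group.
    if n_traj_group != 1:
        raise ValueError("inference supports only n_traj_group == 1")
    lead = (batch, num_traj_sets, num_traj_samples)
    # Positions: [B, num_traj_sets, num_traj_samples, T, 3]
    # Extra text tokens: [B, num_traj_sets, num_traj_samples]
    return lead + (future_steps, 3), lead
```

For a batch of 2 with one trajectory set, 6 samples, and an assumed 64 predicted steps, `expected_output_shapes((2, 1, 16, 3), 1, 6, 64)` gives a positions shape of `(2, 1, 6, 64, 3)` and an extra-token shape of `(2, 1, 6)`.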
Internal Components
The AlpamayoR1 model initializes the following internal components:
- expert: Language model for processing trajectory embeddings (based on VLM text config)
- action_space: Action space handler for trajectory encoding/decoding
- diffusion: Diffusion model for trajectory sampling
- action_in_proj: Projects noisy actions to expert token embeddings
- action_out_proj: Projects expert hidden states to action predictions
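How these components might interact during sampling can be sketched with a toy loop. This is an assumption-heavy illustration: the documentation does not say which diffusion formulation or sampler is used, so the Euler-style update below is a stand-in, and `sample_actions` along with its callable arguments are hypothetical placeholders for action_in_proj, expert, and action_out_proj.

```python
import random


def sample_actions(in_proj, expert, out_proj, dim, num_steps=10, rng=None):
    """Toy iterative sampler: start from noise and apply predicted updates.

    Hypothetical sketch; the real sampler and its schedule are not documented.
    """
    rng = rng or random.Random(0)
    # Start from Gaussian noise in action space.
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    dt = 1.0 / num_steps
    for step in range(num_steps):
        t = step * dt
        # Noisy actions -> token embeddings -> expert hidden states.
        hidden = expert(in_proj(x, t))
        # Hidden states -> prediction in action space.
        update = out_proj(hidden)
        # Assumed Euler-style step along the predicted update.
        x = [xi + dt * ui for xi, ui in zip(x, update)]
    return x
```

With identity projections and a prediction of `-x`, each step shrinks the sample by a factor of `1 - dt`, which makes the loop easy to check by hand.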
Notes
- The model uses a two-stage process: VLM autoregressive generation followed by diffusion-based trajectory sampling
- During inference, only one trajectory group is supported (n_traj_group == 1)
- The expert model masks out discrete trajectory tokens during chain-of-thought generation
- KV cache from VLM generation is reused during expert model forward passes for efficiency
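The masking of discrete trajectory tokens mentioned in the notes can be sketched as a logit mask. This assumes that trajectory tokens occupy ids at or above the original vocabulary size, which the constructor description suggests but does not confirm. The helper `mask_trajectory_tokens` is hypothetical.

```python
import math


def mask_trajectory_tokens(logits, original_vocab_size):
    """Set logits of assumed trajectory-token ids to -inf.

    Assumption: ids >= original_vocab_size are trajectory tokens.
    A logit of -inf gives the token zero probability after softmax.
    """
    return [
        x if i < original_vocab_size else -math.inf
        for i, x in enumerate(logits)
    ]


# Example: a 4-token vocabulary where the last 2 ids are trajectory tokens.
masked = mask_trajectory_tokens([0.1, 0.2, 0.3, 0.4], original_vocab_size=2)
# Greedy choice now falls within the original vocabulary.
best = max(range(len(masked)), key=masked.__getitem__)
```

Masking at the logit level leaves the relative probabilities of the remaining text tokens unchanged.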