Class: ReasoningVLA

The ReasoningVLA class is the base model for reasoning-enabled Vision-Language-Action tasks. It combines a vision-language model (VLM) backbone with trajectory tokenization capabilities.

Inherits from: PreTrainedModel, TrajectoryFusionMixin
Location: alpamayo_r1.models.base_model.ReasoningVLA

Constructor

ReasoningVLA(
    config: ReasoningVLAConfig,
    pretrained_modules: dict[str, torch.nn.Module] | None = None,
    original_vocab_size: int | None = None,
    print_param_count: bool = True,
)
Initializes the ReasoningVLA base model with VLM backbone and trajectory tokenizers.
Parameters:

config : ReasoningVLAConfig (required)
    Configuration object containing VLM settings, trajectory tokenizer configurations, and model parameters.

pretrained_modules : dict[str, torch.nn.Module] | None (default: None)
    Dictionary of pretrained PyTorch modules. Can include:
      • "vlm": Pretrained vision-language model
      • "traj_tokenizer": Pretrained trajectory tokenizer

original_vocab_size : int | None (default: None)
    Original vocabulary size of the VLM before adding trajectory tokens.

print_param_count : bool (default: True)
    Whether to log total and trainable parameter counts during initialization.

Class Methods

from_pretrained_submodules

@classmethod
def from_pretrained_submodules(
    cls,
    config: ReasoningVLAConfig,
) -> "ReasoningVLA"
Load the model with pretrained submodules from HuggingFace.
Parameters:

config : ReasoningVLAConfig (required)
    Configuration object specifying the VLM to load and tokenizer settings.

Returns:

model : ReasoningVLA
    Initialized ReasoningVLA model with pretrained VLM backbone and tokenizers loaded from the paths specified in config.

Instance Methods

fuse_traj_tokens

def fuse_traj_tokens(
    input_ids: torch.Tensor,
    traj_data: dict[str, Any] | None = None
) -> torch.Tensor
Fuse trajectory tokens into the input token IDs by replacing placeholder tokens with encoded trajectory tokens.
Parameters:

input_ids : torch.Tensor (required)
    Input token IDs tensor with shape [B, n_token] containing placeholder trajectory tokens.

traj_data : dict[str, Any] | None (default: None)
    Dictionary containing trajectory data:
      • ego_history_xyz: History positions [B, n_traj, T, 3]
      • ego_history_rot: History rotations [B, n_traj, T, ...]
      • ego_future_xyz: (Optional) Future positions
      • ego_future_rot: (Optional) Future rotations

Returns:

input_ids : torch.Tensor
    Input IDs with trajectory placeholder tokens replaced by actual encoded trajectory tokens. Shape: [B, n_token]
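At its core, this fusion step is a masked replacement: positions in `input_ids` holding a placeholder ID are overwritten with the encoded trajectory token IDs. A self-contained sketch of that mechanic in plain torch (the placeholder ID and token values are made up for illustration; the real method derives them from the trajectory tokenizer and `special_token_ids`):

```python
import torch


def fuse_placeholders(input_ids: torch.Tensor,
                      traj_token_ids: torch.Tensor,
                      placeholder_id: int) -> torch.Tensor:
    """Replace every occurrence of placeholder_id with trajectory token IDs.

    input_ids:      [B, n_token] token IDs containing placeholders
    traj_token_ids: [B, n_traj_tokens] encoded trajectory tokens; each row
                    of input_ids must hold exactly n_traj_tokens placeholders
    """
    out = input_ids.clone()
    mask = out == placeholder_id                      # [B, n_token] boolean
    assert mask.sum(dim=1).eq(traj_token_ids.shape[1]).all(), \
        "each sequence needs one placeholder per trajectory token"
    out[mask] = traj_token_ids.reshape(-1)            # fill in row-major order
    return out


PLACEHOLDER = 999  # hypothetical placeholder token ID
ids = torch.tensor([[1, PLACEHOLDER, PLACEHOLDER, 2],
                    [3, PLACEHOLDER, PLACEHOLDER, 4]])
traj = torch.tensor([[100, 101], [102, 103]])
fused = fuse_placeholders(ids, traj, PLACEHOLDER)
# fused == [[1, 100, 101, 2], [3, 102, 103, 4]]
```

Cloning first keeps the caller's tensor untouched; the boolean-mask assignment fills placeholders left-to-right within each batch row.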

get_input_embeddings

def get_input_embeddings() -> torch.nn.Module
Get the input embeddings layer of the model.
Returns:

embeddings : torch.nn.Module
    The embedding layer from the VLM’s language model.

get_output_embeddings

def get_output_embeddings() -> torch.nn.Module
Get the output embeddings (LM head) of the model.
Returns:

embeddings : torch.nn.Module
    The output embedding layer from the VLM.

tie_weights

def tie_weights() -> None
Tie input and output embeddings if configured. Delegates to the VLM backbone’s tie_weights method.
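Weight tying means the input embedding matrix and the LM-head projection share one parameter tensor, so resizing or updating one is reflected in the other. A generic PyTorch sketch of the mechanism (not the library's code; real models do this via the VLM backbone's tie_weights):

```python
import torch.nn as nn

vocab_size, hidden = 1000, 64
embed = nn.Embedding(vocab_size, hidden)             # input embeddings [V, H]
lm_head = nn.Linear(hidden, vocab_size, bias=False)  # LM head, weight is [V, H]

# Tie: both modules now reference the same weight tensor.
lm_head.weight = embed.weight
assert lm_head.weight is embed.weight
```

Because nn.Linear stores its weight as [out_features, in_features] = [V, H], it has the same shape as the embedding table, which is what makes direct sharing possible; gradients through the head then also flow into the embedding table.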

Attributes

vlm : torch.nn.Module
    The vision-language model backbone (e.g., Qwen3VLForConditionalGeneration).

tokenizer : AutoTokenizer
    Tokenizer with trajectory tokens and special tokens added.

traj_tokenizer : torch.nn.Module | None
    Trajectory tokenizer for encoding future trajectories to discrete tokens.

hist_traj_tokenizer : torch.nn.Module | None
    Trajectory tokenizer for encoding history trajectories. Defaults to traj_tokenizer if not separately configured.

special_token_ids : dict[str, int]
    Mapping of special token names to their token IDs.

original_vocab_size : int
    Original vocabulary size before adding trajectory tokens.

Example Usage

import torch
from alpamayo_r1.models.base_model import ReasoningVLA, ReasoningVLAConfig

# Initialize configuration
config = ReasoningVLAConfig(
    vlm_name_or_path="Qwen/Qwen3-VL-8B-Instruct",
    traj_vocab_size=768,
    tokens_per_history_traj=16,
    tokens_per_future_traj=64,
)

# Load model with pretrained submodules
model = ReasoningVLA.from_pretrained_submodules(config)

# Prepare trajectory data
traj_data = {
    "ego_history_xyz": torch.randn(2, 1, 10, 3),
    "ego_history_rot": torch.randn(2, 1, 10, 4),
}

# Create input token IDs (random here for illustration; real inputs come from
# the tokenizer and contain the trajectory placeholder tokens)
input_ids = torch.randint(0, 1000, (2, 100))

# Fuse trajectory tokens
input_ids_with_traj = model.fuse_traj_tokens(input_ids, traj_data)

print(f"Input shape: {input_ids_with_traj.shape}")
print(f"Tokenizer vocab size: {len(model.tokenizer)}")

Special Tokens

The model adds the following special tokens to the tokenizer:
  • <|traj_history|>: History trajectory placeholder
  • <|traj_future|>: Future trajectory placeholder
  • <|traj_history_start|>: History trajectory start marker
  • <|traj_history_end|>: History trajectory end marker
  • <|traj_future_start|>: Future trajectory start marker
  • <|traj_future_end|>: Future trajectory end marker
Additional special tokens for chain-of-thought, meta-actions, and other structured outputs are available when add_special_tokens=True in the config.

Notes

  • The model automatically resizes the VLM’s token embeddings to accommodate trajectory tokens
  • Trajectory tokens are discrete tokens of the form <i0>, <i1>, …, <i{vocab_size-1}>
  • The TrajectoryFusionMixin provides the fuse_traj_tokens functionality
  • Currently supports Qwen3-VL as the VLM backend
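Since trajectory tokens are appended after the original vocabulary, their IDs start at original_vocab_size. A plain-Python sketch of that bookkeeping (tokenizer internals simplified to a dict; the real model goes through the tokenizer's token-adding API and resizes the VLM embeddings to match):

```python
def add_traj_tokens(vocab: dict[str, int], traj_vocab_size: int) -> dict[str, int]:
    """Append <i0>..<i{traj_vocab_size-1}> after the existing vocabulary."""
    original_vocab_size = len(vocab)
    for i in range(traj_vocab_size):
        vocab[f"<i{i}>"] = original_vocab_size + i
    return vocab


vocab = {"hello": 0, "world": 1}   # toy 2-token vocabulary
vocab = add_traj_tokens(vocab, traj_vocab_size=768)
assert len(vocab) == 2 + 768
assert vocab["<i0>"] == 2          # trajectory IDs start at the old vocab size
assert vocab["<i767>"] == 769
```

This is why original_vocab_size is tracked on the model: it marks the boundary between ordinary text tokens and the appended trajectory tokens.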