
AlpamayoR1Config

Configuration class for the AlpamayoR1 expert model.

Inherits from: ReasoningVLAConfig
Location: alpamayo_r1.config.AlpamayoR1Config

Constructor

AlpamayoR1Config(
    diffusion_cfg: dict[str, Any] | None = None,
    action_space_cfg: dict[str, Any] | None = None,
    action_in_proj_cfg: dict[str, Any] | None = None,
    action_out_proj_cfg: dict[str, Any] | None = None,
    expert_cfg: dict[str, Any] | None = None,
    keep_same_dtype: bool = True,
    expert_non_causal_attention: bool = True,
    **kwargs: Any,
)
Initializes configuration for the AlpamayoR1 model.
Parameters

diffusion_cfg : dict[str, Any] | None, default None
    Configuration dictionary for the diffusion model. Used to instantiate the diffusion sampling process via Hydra.

action_space_cfg : dict[str, Any] | None, default None
    Configuration dictionary for the action space. Defines how trajectories are encoded/decoded.

action_in_proj_cfg : dict[str, Any] | None, default None
    Configuration dictionary for the action input projection layer. Maps noisy actions to expert token embeddings.

action_out_proj_cfg : dict[str, Any] | None, default None
    Configuration dictionary for the action output projection layer. Maps expert hidden states to action predictions.

expert_cfg : dict[str, Any] | None, default None
    Configuration dictionary for the expert language model. Overrides default settings from the VLM's text config.

keep_same_dtype : bool, default True
    Whether to convert action-related modules (diffusion, projections) to the same dtype as the expert model.

expert_non_causal_attention : bool, default True
    Whether to use non-causal attention in the expert model during trajectory generation.

**kwargs : Any
    Additional keyword arguments passed to the parent ReasoningVLAConfig class. See ReasoningVLAConfig for inherited parameters.
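The *_cfg dictionaries above follow Hydra's "_target_" convention: the dotted path names the class to instantiate, and the remaining keys become its constructor arguments. The sketch below is a stdlib-only illustration of that resolution logic (a stand-in for hydra.utils.instantiate, shown with a standard-library target so it runs without the alpamayo_r1 package):

```python
from importlib import import_module
from typing import Any


def instantiate(cfg: dict[str, Any]) -> Any:
    """Resolve the dotted ``_target_`` path and call it with the remaining keys."""
    module_path, _, attr_name = cfg["_target_"].rpartition(".")
    target = getattr(import_module(module_path), attr_name)
    kwargs = {k: v for k, v in cfg.items() if k != "_target_"}
    return target(**kwargs)


# A diffusion_cfg such as {"_target_": "alpamayo_r1.diffusion.FlowMatching",
# "num_inference_steps": 10} resolves the same way; a standard-library
# class is used here so the sketch is self-contained.
delta = instantiate({"_target_": "datetime.timedelta", "days": 2})
```

Hydra's real instantiate adds recursion, partial instantiation, and interpolation on top of this core lookup-and-call step.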

Example Usage

from alpamayo_r1.config import AlpamayoR1Config

config = AlpamayoR1Config(
    # Base VLA configuration
    vlm_name_or_path="Qwen/Qwen3-VL-8B-Instruct",
    traj_vocab_size=768,
    tokens_per_history_traj=16,
    tokens_per_future_traj=64,
    model_dtype="bfloat16",
    
    # AlpamayoR1-specific configuration
    diffusion_cfg={
        "_target_": "alpamayo_r1.diffusion.FlowMatching",
        "num_inference_steps": 10,
    },
    action_space_cfg={
        "_target_": "alpamayo_r1.action_space.UnicycleAccelCurvatureActionSpace",
    },
    action_in_proj_cfg={
        "_target_": "alpamayo_r1.models.action_in_proj.PerWaypointActionInProjV2",
        "num_enc_layers": 4,
        "hidden_size": 1024,
    },
    action_out_proj_cfg={
        "_target_": "torch.nn.Linear",
    },
    expert_cfg={
        "num_hidden_layers": 32,
    },
    keep_same_dtype=True,
    expert_non_causal_attention=True,
)

ReasoningVLAConfig

Base configuration class for Reasoning VLA models.

Inherits from: PretrainedConfig
Location: alpamayo_r1.models.base_model.ReasoningVLAConfig

Constructor

ReasoningVLAConfig(
    vlm_name_or_path: str = "Qwen/Qwen3-VL-8B-Instruct",
    vlm_backend: str = "qwenvl3",
    traj_tokenizer_cfg: dict[str, Any] | None = None,
    hist_traj_tokenizer_cfg: dict[str, Any] | None = None,
    traj_vocab_size: int = 768,
    tokens_per_history_traj: int = 16,
    tokens_per_future_traj: int = 64,
    model_dtype: str = "bfloat16",
    attn_implementation: str = "flash_attention_2",
    min_pixels: int | None = None,
    max_pixels: int | None = None,
    add_special_tokens: bool = False,
    **kwargs: Any,
)
Initializes base configuration for ReasoningVLA models.
Parameters

vlm_name_or_path : str, default "Qwen/Qwen3-VL-8B-Instruct"
    HuggingFace model identifier or local path to the pretrained vision-language model.

vlm_backend : str, default "qwenvl3"
    VLM backend type. Currently supports "qwenvl3" for Qwen3-VL models.

traj_tokenizer_cfg : dict[str, Any] | None, default None
    Configuration dictionary for the trajectory tokenizer (for future trajectories). Used with Hydra instantiation.

hist_traj_tokenizer_cfg : dict[str, Any] | None, default None
    Configuration dictionary for the history trajectory tokenizer. Falls back to traj_tokenizer_cfg if not provided.

traj_vocab_size : int, default 768
    Vocabulary size for discrete trajectory tokens. Determines the number of trajectory tokens <i0> through <i{traj_vocab_size - 1}> to add.

tokens_per_history_traj : int, default 16
    Number of tokens used to encode each history trajectory.

tokens_per_future_traj : int, default 64
    Number of tokens used to encode each future trajectory.

model_dtype : str, default "bfloat16"
    Data type for model weights. Supported values: "float32", "float16", "bfloat16".

attn_implementation : str, default "flash_attention_2"
    Attention implementation to use. Options include "flash_attention_2", "sdpa", and "eager".

min_pixels : int | None, default None
    Minimum number of pixels for image processing. Passed to the VLM processor.

max_pixels : int | None, default None
    Maximum number of pixels for image processing. Passed to the VLM processor.

add_special_tokens : bool, default False
    Whether to add extended special tokens beyond the basic trajectory tokens, e.g. tokens for chain-of-thought and meta-actions.

**kwargs : Any
    Additional keyword arguments passed to the parent PretrainedConfig class.

Attributes

After initialization, the config object has these computed attributes:
vocab_size : int
    Total vocabulary size, including the original tokens and the added trajectory tokens.

traj_token_start_idx : int
    Starting token ID for trajectory tokens in the vocabulary.

traj_token_ids : dict[str, int]
    Mapping of trajectory token names to their token IDs:
      • "history": History trajectory placeholder token ID
      • "future": Future trajectory placeholder token ID
      • "history_start": History start marker token ID
      • "history_end": History end marker token ID
      • "future_start": Future start marker token ID
      • "future_end": Future end marker token ID
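Per the traj_vocab_size description, the discrete trajectory tokens are named <i0> through <i{traj_vocab_size - 1}>. A minimal sketch of how those names and traj_token_start_idx could relate (assuming, as is usual for added special tokens, that they are appended directly after the base vocabulary; the exact layout is an implementation detail of the config):

```python
def trajectory_token_names(traj_vocab_size: int) -> list[str]:
    # Discrete trajectory tokens <i0> ... <i{traj_vocab_size - 1}>.
    return [f"<i{k}>" for k in range(traj_vocab_size)]


def traj_token_start_idx(base_vocab_size: int) -> int:
    # Assumption: trajectory tokens are appended after the base VLM
    # vocabulary, so the first one takes the next free token ID.
    return base_vocab_size


names = trajectory_token_names(768)  # default traj_vocab_size
```

Under this layout, vocab_size would equal the base VLM vocabulary plus the trajectory tokens and the handful of placeholder/marker tokens listed in traj_token_ids.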

Example Usage

from alpamayo_r1.models.base_model import ReasoningVLAConfig

config = ReasoningVLAConfig(
    vlm_name_or_path="Qwen/Qwen3-VL-8B-Instruct",
    vlm_backend="qwenvl3",
    traj_vocab_size=768,
    tokens_per_history_traj=16,
    tokens_per_future_traj=64,
    model_dtype="bfloat16",
    attn_implementation="flash_attention_2",
    min_pixels=256 * 28 * 28,
    max_pixels=1280 * 28 * 28,
    add_special_tokens=True,
    traj_tokenizer_cfg={
        "_target_": "alpamayo_r1.trajectory_tokenizer.VQTrajectoryTokenizer",
        "load_weights": True,
    },
)

print(f"Total vocab size: {config.vocab_size}")
print(f"Trajectory token start index: {config.traj_token_start_idx}")
print(f"Trajectory token IDs: {config.traj_token_ids}")

Notes

  • Initializing the configuration loads the processor from the specified VLM path
  • Trajectory tokens are added to the tokenizer during config initialization
  • The vocab_size includes both the original VLM vocabulary and added trajectory tokens
  • Configuration can be saved and loaded using standard HuggingFace PretrainedConfig methods
