AlpamayoR1Config
Configuration class for the AlpamayoR1 expert model. Inherits from: ReasoningVLAConfig
Location: alpamayo_r1.config.AlpamayoR1Config
Constructor
- Configuration dictionary for the diffusion model. Used to instantiate the diffusion sampling process via Hydra.
- Configuration dictionary for the action space. Defines how trajectories are encoded/decoded.
- Configuration dictionary for the action input projection layer. Maps noisy actions to expert token embeddings.
- Configuration dictionary for the action output projection layer. Maps expert hidden states to action predictions.
- Configuration dictionary for the expert language model. Overrides default settings from the VLM's text config.
- Whether to convert action-related modules (diffusion, projections) to the same dtype as the expert model.
- Whether to use non-causal attention in the expert model during trajectory generation.
- Additional keyword arguments passed to the parent ReasoningVLAConfig class. See ReasoningVLAConfig for inherited parameters.

Example Usage
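No example survived here, so the sketch below is illustrative only. The keyword names (diffusion_cfg, action_space_cfg, action_in_proj_cfg, action_out_proj_cfg, expert_cfg) and every value are hypothetical placeholders inferred from the parameter descriptions above, not confirmed API:

```python
# Illustrative only: all keyword names, "_target_" paths, and values below are
# hypothetical placeholders; verify against the actual AlpamayoR1Config signature.
kwargs = {
    # Hydra-style sub-configs: "_target_" names the class Hydra instantiates
    "diffusion_cfg": {"_target_": "my_pkg.FlowMatchingSampler", "num_steps": 10},
    "action_space_cfg": {"_target_": "my_pkg.WaypointActionSpace"},
    # Input projection: noisy actions -> expert token embeddings
    "action_in_proj_cfg": {"_target_": "torch.nn.Linear", "in_features": 3, "out_features": 2048},
    # Output projection: expert hidden states -> action predictions
    "action_out_proj_cfg": {"_target_": "torch.nn.Linear", "in_features": 2048, "out_features": 3},
    # Overrides applied on top of the VLM's text config
    "expert_cfg": {"num_hidden_layers": 12, "hidden_size": 2048},
}
# config = AlpamayoR1Config(**kwargs)  # actual constructor call, once names are verified
```

The Hydra `_target_` convention lets each sub-config fully determine which class is built, so the sampler or projection layers can be swapped without code changes.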
ReasoningVLAConfig
Base configuration class for Reasoning VLA models. Inherits from: PretrainedConfig
Location: alpamayo_r1.models.base_model.ReasoningVLAConfig
Constructor
- HuggingFace model identifier or local path to the pretrained vision-language model.
- VLM backend type. Currently supports "qwenvl3" for Qwen3-VL models.
- Configuration dictionary for the trajectory tokenizer (for future trajectories). Used with Hydra instantiation.
- Configuration dictionary for the history trajectory tokenizer. If not provided, uses traj_tokenizer_cfg.
- Vocabulary size for discrete trajectory tokens. Determines the number of trajectory tokens <i0> through <i{vocab_size-1}> to add.
- Number of tokens used to encode each history trajectory.
- Number of tokens used to encode each future trajectory.
- Data type for model weights. Supported values: "float32", "float16", "bfloat16".
- Attention implementation to use. Options include "flash_attention_2", "sdpa", or "eager".
- Minimum number of pixels for image processing. Passed to the VLM processor.
- Maximum number of pixels for image processing. Passed to the VLM processor.
- Whether to add extended special tokens beyond basic trajectory tokens. Includes tokens for chain-of-thought, meta-actions, etc.
- Additional keyword arguments passed to the parent PretrainedConfig class.

Attributes
After initialization, the config object has these computed attributes:

- Total vocabulary size including original tokens and added trajectory tokens.
- Starting token ID for trajectory tokens in the vocabulary.
- Mapping of trajectory token names to their token IDs:
  - "history": History trajectory placeholder token ID
  - "future": Future trajectory placeholder token ID
  - "history_start": History start marker token ID
  - "history_end": History end marker token ID
  - "future_start": Future start marker token ID
  - "future_end": Future end marker token ID
Example Usage
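A sketch of how the added trajectory-token vocabulary is laid out, per the descriptions above. The names traj_vocab_size and traj_token_start_id and both values are hypothetical, chosen for illustration:

```python
# Sketch of the trajectory-token layout described above. traj_vocab_size and
# traj_token_start_id are hypothetical names/values for illustration only.
traj_vocab_size = 512
traj_token_start_id = 151936  # e.g. placed right after the original VLM vocabulary

# Discrete trajectory tokens <i0> .. <i{vocab_size-1}>
traj_tokens = [f"<i{k}>" for k in range(traj_vocab_size)]

# Contiguous token-name -> token-ID assignment starting at the start ID
token_ids = {name: traj_token_start_id + k for k, name in enumerate(traj_tokens)}
```

Because the added tokens occupy a contiguous ID range starting at the recorded start ID, decoding a trajectory token back to its index is a single subtraction.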
Notes
- The configuration automatically initializes by loading the processor from the specified VLM path
- Trajectory tokens are added to the tokenizer during config initialization
- The vocab_size includes both the original VLM vocabulary and added trajectory tokens
- Configuration can be saved and loaded using standard HuggingFace PretrainedConfig methods
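The save/load note relies on PretrainedConfig's JSON round-trip: save_pretrained writes a config.json that from_pretrained reads back. A minimal stdlib sketch of that pattern, with hypothetical field names and values:

```python
import json
import os
import tempfile

# Stdlib sketch of the config.json round-trip that PretrainedConfig.save_pretrained
# and from_pretrained perform; the field names and values here are hypothetical.
cfg = {"vlm_name_or_path": "Qwen/Qwen3-VL-8B-Instruct", "traj_vocab_size": 512}

save_dir = tempfile.mkdtemp()
with open(os.path.join(save_dir, "config.json"), "w") as f:
    json.dump(cfg, f, indent=2)

with open(os.path.join(save_dir, "config.json")) as f:
    loaded = json.load(f)

assert loaded == cfg  # lossless for JSON-serializable fields
```

Fields that are not JSON-serializable (e.g. a live processor object) are reconstructed at load time rather than stored, which matches the note above about the processor being loaded from the VLM path during initialization.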