How Alpamayo 1 generates natural language reasoning traces for driving decisions
Chain-of-Causation (CoC) is Alpamayo 1’s approach to generating interpretable, step-by-step reasoning traces that explain why the model predicts a particular trajectory. Unlike black-box motion prediction models, Alpamayo 1 produces human-readable explanations alongside its driving outputs.
There is a pedestrian crossing the street ahead on the right side. The traffic light is red, requiring the ego vehicle to stop. The vehicle in the left lane is slowing down, indicating traffic congestion. The ego vehicle should decelerate smoothly and come to a complete stop before the crosswalk to yield to the pedestrian and obey the traffic signal.
This reasoning trace makes explicit what the model observes in the scene and why those observations lead it to predict a stopping trajectory.
Hybrid auto-labeling with human in the loop for reasoning traces
✅ Included in the v1.0 release
CoC reasoning traces in the training data were created through:
Auto-labeling: Automated generation of reasoning candidates
Human-in-the-loop: Human reviewers validate and refine reasoning quality
Supervision: Model is trained to generate these traces via standard language modeling loss
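The three stages above can be sketched end to end. This is a toy illustration only: the names (`Scene`, `auto_label`, `human_review`) are hypothetical and not part of the release, and the "reviewer" here simply accepts the first candidate where a real pipeline would involve human judgment.

```python
from dataclasses import dataclass, field

@dataclass
class Scene:
    description: str
    traces: list = field(default_factory=list)

def auto_label(scene: Scene, n: int = 3) -> list[str]:
    """Stage 1 (auto-labeling): generate n candidate reasoning traces.

    A real system would prompt a model; here we fabricate placeholders.
    """
    return [f"candidate {i}: reasoning for '{scene.description}'" for i in range(n)]

def human_review(candidates: list[str]) -> str:
    """Stage 2 (human-in-the-loop): validate and refine a candidate.

    Stand-in for a reviewer: keep the first candidate and strip its prefix.
    """
    return candidates[0].replace("candidate 0: ", "")

scene = Scene("pedestrian crossing, red light")
candidates = auto_label(scene)
scene.traces.append(human_review(candidates))

# Stage 3 (supervision): the refined trace becomes a target sequence for a
# standard language-modeling (next-token cross-entropy) loss during training.
print(scene.traces[0])
```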
The current release focuses on supervised learning. The paper also describes RL post-training for improving reasoning quality, but this is not included in the v1.0 release.
```python
extra = {
    "cot": np.ndarray,  # Shape: [batch_size, num_traj_sets, num_traj_samples]
                        # Each element is a string containing the CoC reasoning trace
}
```
Multi-sample generation: When num_traj_samples > 1, each sample gets its own CoC trace due to stochastic sampling, providing diverse reasoning explanations.
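A minimal sketch of iterating over this layout, assuming the shape documented above. The `cot` array here is filled with placeholder strings; real traces would come from model inference.

```python
import numpy as np

# Build a placeholder "cot" array with the documented shape
# [batch_size, num_traj_sets, num_traj_samples]; each entry is a string.
batch_size, num_traj_sets, num_traj_samples = 1, 1, 3
cot = np.array(
    [[[f"trace for sample {s}" for s in range(num_traj_samples)]
      for _ in range(num_traj_sets)]
     for _ in range(batch_size)],
    dtype=object,
)
assert cot.shape == (batch_size, num_traj_sets, num_traj_samples)

# Each trajectory sample carries its own reasoning trace, so with
# num_traj_samples > 1 you can inspect the diverse explanations:
for b in range(batch_size):
    for t in range(num_traj_sets):
        for s in range(num_traj_samples):
            print(f"[b={b}, set={t}, sample={s}] {cot[b, t, s]}")
```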
RL post-training for reasoning quality improvement
Meta-actions or high-level behavior descriptions
General VQA (visual question answering) capabilities
Route conditioning or navigation-aware reasoning
From README FAQ (lines 117-120):
While the paper describes RL stages for improving reasoning quality and action consistency, this release focuses on the supervised learning components. RL post-trained models may be included in future releases.