This guide provides solutions to common problems you might encounter while using LeRobot.
Installation Issues
Python version incompatibility
Problem: Import errors or syntax errors after installation.
Solution: LeRobot requires Python ≥3.12. Check your version:
python --version
# Should show Python 3.12.x or higher
If you have an older version, create a new environment:
conda create -y -n lerobot python=3.12
conda activate lerobot
pip install lerobot
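The same check can be run from inside the interpreter, which helps when several Python installations are on the PATH. This is a minimal sketch; `python_is_supported` and `MIN_PYTHON` are illustrative names, and the minimum is simply the version stated above.

```python
import sys

MIN_PYTHON = (3, 12)  # minimum version stated in this guide


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) pair meets the minimum."""
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if python_is_supported() else "too old for LeRobot"
    print(f"Python {current}: {status}")
```

Run it with the interpreter of the environment you intend to use, so the result reflects that environment and not the system Python.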
ffmpeg not found or missing libsvtav1
Problem: Errors like ffmpeg not found or Encoder 'libsvtav1' not found.
Solution: Install ffmpeg with libsvtav1 support:
# With conda (recommended)
conda install ffmpeg=7.1.1 -c conda-forge
# Verify installation
ffmpeg -version
ffmpeg -encoders | grep svt
ffmpeg 8.X is not yet supported. Use version 7.X.
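If you want to run the encoder check from a script (for example in a setup test), the `ffmpeg -encoders` output can be parsed directly. The sketch below assumes the usual layout of that output, where each encoder row starts with a flags column followed by the encoder name; the function names are illustrative.

```python
import shutil
import subprocess


def encoder_listed(encoders_output: str, name: str = "libsvtav1") -> bool:
    """Return True if `name` appears as an encoder row in `ffmpeg -encoders` output."""
    for line in encoders_output.splitlines():
        fields = line.split()
        # Encoder rows look like: " V....D libsvtav1   SVT-AV1 ..."
        if len(fields) >= 2 and fields[1] == name:
            return True
    return False


def ffmpeg_has_encoder(name: str = "libsvtav1") -> bool:
    """Run the local ffmpeg, if there is one, and look for the encoder."""
    if shutil.which("ffmpeg") is None:
        return False  # ffmpeg is not on the PATH at all
    result = subprocess.run(
        ["ffmpeg", "-hide_banner", "-encoders"],
        capture_output=True,
        text=True,
        check=False,
    )
    return encoder_listed(result.stdout, name)
```

`ffmpeg_has_encoder()` returning False covers both failure modes described above: ffmpeg missing, or ffmpeg built without libsvtav1.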
WSL (Windows) installation issues
Problem: Errors related to evdev or input devices on Windows Subsystem for Linux.
Solution: Install evdev explicitly:
conda install evdev -c conda-forge
Problem: Permission errors when installing packages.
Solution: Don’t use sudo with pip. Instead:
# Use virtual environment (recommended)
conda create -n lerobot python=3.12
conda activate lerobot
pip install lerobot
# Or install for user only
pip install --user lerobot
GPU and CUDA Issues
CUDA out of memory errors
Problem: RuntimeError: CUDA out of memory during training or inference.
Solutions:
Reduce batch size:
lerobot-train \
--policy=act \
--dataset.repo_id=lerobot/pusht \
--training.batch_size=8 # Try smaller values
Enable gradient accumulation:
lerobot-train \
--policy=act \
--dataset.repo_id=lerobot/pusht \
--training.batch_size=4 \
--training.gradient_accumulation_steps=4
Use mixed precision (AMP):
policy.config.use_amp = True
Clear CUDA cache:
import torch
torch.cuda.empty_cache()
Use a smaller model variant or reduce sequence length
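Reducing the batch size and enabling gradient accumulation work together: the effective batch size is the per-step batch size multiplied by the number of accumulation steps. The sketch below shows one way to back off after an out-of-memory error while keeping that product constant; `halve_batch` is an illustrative helper, not part of LeRobot.

```python
def halve_batch(batch_size: int, accumulation_steps: int) -> tuple[int, int]:
    """Halve the per-step batch and double accumulation, keeping their product."""
    if batch_size < 2 or batch_size % 2:
        raise ValueError("batch_size must be an even number >= 2")
    return batch_size // 2, accumulation_steps * 2


# Start from a configuration that ran out of memory and back off three times.
batch_size, accumulation_steps = 32, 1
for _ in range(3):
    batch_size, accumulation_steps = halve_batch(batch_size, accumulation_steps)
    # The last column (effective batch size) stays at 32 on every line.
    print(batch_size, accumulation_steps, batch_size * accumulation_steps)
```

Keeping the effective batch size fixed means the learning rate usually does not need retuning, at the cost of more forward and backward passes per optimizer step.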
Problem: torch.cuda.is_available() returns False.
Solutions:
Check NVIDIA driver:
nvidia-smi
Reinstall PyTorch with CUDA:
# For CUDA 11.8
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
# For CUDA 12.1
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
Verify installation:
import torch
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"CUDA version: {torch.version.cuda}")
print(f"GPU: {torch.cuda.get_device_name(0)}")
Multi-GPU training issues
Problem: Errors when using multiple GPUs with DDP.
Solution: Use torchrun with correct configuration:
# For 4 GPUs
torchrun --nproc_per_node=4 -m lerobot.scripts.train \
--policy=act \
--dataset.repo_id=lerobot/pusht
# Ensure consistent batch size across GPUs
# Total batch size = batch_size * num_gpus
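To keep the total batch size the same when moving from one GPU to several, divide it by the number of processes. A minimal sketch of that arithmetic follows; `per_gpu_batch_size` is an illustrative helper.

```python
def per_gpu_batch_size(total_batch_size: int, num_gpus: int) -> int:
    """Return the batch size each GPU should use to reach the given total."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    if total_batch_size % num_gpus:
        raise ValueError(
            f"total batch size {total_batch_size} is not divisible by {num_gpus} GPUs"
        )
    return total_batch_size // num_gpus


# A single-GPU run with batch size 32 maps to batch size 8 on each of 4 GPUs.
print(per_gpu_batch_size(32, 4))
```

Rejecting totals that do not divide evenly avoids silently training with a different total batch size than the one you tuned.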
Dataset Issues
Problem: FileNotFoundError or DatasetNotFoundError when loading dataset.
Solutions:
Verify dataset exists:
from huggingface_hub import list_datasets
datasets = [d.id for d in list_datasets(task_categories="robotics", tags=["LeRobot"])]
print("Available datasets:", datasets)
Check authentication (for private datasets):
Use correct repo_id format:
# Correct
dataset = LeRobotDataset("lerobot/pusht")
# Incorrect
dataset = LeRobotDataset("pusht")  # Missing namespace
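A quick format check before loading can catch a missing namespace early. The pattern below is an assumption made for illustration (two non-empty parts separated by a single slash, built from letters, digits, `-`, `_` and `.`); it is not the Hub's official validation rule.

```python
import re

# Assumed shape: "<namespace>/<name>", one slash, simple ASCII characters only.
_REPO_ID = re.compile(r"[A-Za-z0-9][\w.-]*/[A-Za-z0-9][\w.-]*", re.ASCII)


def has_namespace(repo_id: str) -> bool:
    """Return True if repo_id looks like 'namespace/name'."""
    return _REPO_ID.fullmatch(repo_id) is not None


for candidate in ("lerobot/pusht", "pusht"):
    verdict = "ok" if has_namespace(candidate) else "missing namespace"
    print(f"{candidate}: {verdict}")
```

A repo_id that passes this check can still fail to load if the dataset does not exist or is private, so treat it as a first filter only.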
Problem: Errors when loading video frames from dataset.
Solutions:
Verify ffmpeg installation:
ffmpeg -version
ffmpeg -decoders | grep h264
Clear dataset cache and re-download:
rm -rf ~/.cache/huggingface/lerobot/<dataset-name>
Check disk space:
df -h ~/.cache/huggingface
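The disk space check can also be done from Python, for example at the start of a download script. This sketch uses only the standard library; `free_gib` is an illustrative helper, and the cache path is the one used in this guide.

```python
import shutil
from pathlib import Path


def free_gib(path) -> float:
    """Return free space, in GiB, on the filesystem that holds `path`."""
    path = Path(path).expanduser()
    # Walk up to the nearest existing directory so a missing cache dir still works.
    while not path.exists():
        path = path.parent
    return shutil.disk_usage(path).free / 2**30


print(f"Free space for the cache: {free_gib('~/.cache/huggingface'):.1f} GiB")
```

Video datasets can be large, so comparing this number with the dataset size before downloading avoids partially written caches.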
Problem: Dataset loading takes too long.
Solutions:
Use streaming for large datasets:
dataset = LeRobotDataset(
    "lerobot/aloha_mobile_cabinet",
    streaming=True,  # Don't download entire dataset
)
Increase number of workers:
from torch.utils.data import DataLoader
dataloader = DataLoader(
    dataset,
    batch_size=32,
    num_workers=8,  # Increase for faster loading
)
Cache dataset locally for repeated use
Problem: Inconsistent data or errors after dataset updates.
Solution: Clear the dataset cache:
# Remove specific dataset
rm -rf ~/.cache/huggingface/lerobot/<dataset-name>
# Or clear all cached datasets
rm -rf ~/.cache/huggingface/lerobot/*
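Because a stray space or an empty variable in an `rm -rf` command can delete far more than intended, a small guarded helper is a safer way to script cache cleanup. The sketch below assumes the cache location shown above; `clear_dataset_cache` is an illustrative function, not a LeRobot API.

```python
import shutil
from pathlib import Path

# Cache location as given in this guide.
DEFAULT_CACHE = Path("~/.cache/huggingface/lerobot").expanduser()


def clear_dataset_cache(dataset_name: str, cache_root: Path = DEFAULT_CACHE) -> bool:
    """Delete one cached dataset; return True if something was removed."""
    if not dataset_name.strip():
        raise ValueError("dataset_name must not be empty")
    root = Path(cache_root).resolve()
    target = (root / dataset_name).resolve()
    # Refuse anything that is not strictly inside the cache root.
    if target == root or root not in target.parents:
        raise ValueError(f"{target} is outside {root}")
    if not target.is_dir():
        return False  # nothing cached under that name
    shutil.rmtree(target)
    return True
```

For example, `clear_dataset_cache("lerobot/pusht")` removes only that dataset's directory, and a name such as `"../other"` is rejected instead of being followed.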
Training Issues
Training loss not decreasing
Problem: Loss plateaus or doesn’t decrease during training.
Solutions:
Check learning rate:
# Try different learning rates
config.training.lr = 1e-4 # Default
config.training.lr = 1e-3 # Higher for faster learning
config.training.lr = 1e-5 # Lower for stability
Verify data normalization:
# Check dataset statistics
print(dataset.meta.stats)
Increase training steps:
lerobot-train \
--policy=act \
--dataset.repo_id=lerobot/pusht \
--training.num_steps=200000 # More steps
Check for data issues (e.g., all actions similar)
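Before changing hyperparameters, it helps to confirm that the loss really has plateaued and is not just noisy. One simple heuristic, sketched below, compares the mean loss over the most recent window of steps with the window before it. The window size and threshold are arbitrary illustrative defaults, and `has_plateaued` is not part of LeRobot.

```python
from statistics import fmean


def has_plateaued(losses, window=100, min_rel_improvement=0.01):
    """Compare the mean of the last `window` losses with the window before it."""
    if len(losses) < 2 * window:
        return False  # not enough history to judge
    previous = fmean(losses[-2 * window:-window])
    recent = fmean(losses[-window:])
    if previous == 0:
        return True  # already at zero loss, no further improvement possible
    return (previous - recent) / abs(previous) < min_rel_improvement
```

Feed it the per-step losses you already log. A True result suggests trying the learning rate and data checks above, while False means more steps may be all that is needed.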
Problem: Loss becomes NaN or Inf during training.
Solutions:
Reduce learning rate:
config.training.lr = 1e-5 # Lower LR
Enable gradient clipping:
config.training.grad_clip_norm = 1.0
Check for numerical instability in custom code
Verify dataset doesn’t contain NaN values:
import torch
batch = next(iter(dataloader))
print("NaN in batch:", torch.isnan(batch['action']).any())
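Knowing the step at which the loss first became NaN or Inf makes it easier to inspect the batch and learning rate at that point. A minimal sketch over a list of logged loss values follows; `first_nonfinite_step` is an illustrative helper.

```python
import math


def first_nonfinite_step(losses):
    """Return the index of the first NaN or Inf loss, or None if all are finite."""
    for step, loss in enumerate(losses):
        if not math.isfinite(loss):
            return step
    return None


logged = [0.91, 0.52, float("nan"), 0.10]
print(first_nonfinite_step(logged))
```

If the first bad value appears within the first few steps, the data or initialization is the likelier cause; if it appears late in training, the learning rate and gradient clipping settings are worth checking first.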
Checkpoint loading errors
Problem: Cannot resume training from checkpoint.
Solutions:
Verify checkpoint path:
ls outputs/train/my_checkpoint/
# Should contain: config.yaml, checkpoint_*.pth
Check version compatibility:
# Model from old version may not be compatible
# Try loading with strict=False
policy.load_state_dict(checkpoint, strict=False)
Ensure config matches:
The checkpoint config must match your current training config
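To find out where two configs disagree, you can load both into dictionaries and compare them recursively. The sketch below works on plain nested dicts; how you load the saved config depends on your setup, and the keys in the example are hypothetical.

```python
MISSING = "<missing>"  # placeholder for a key present on one side only


def config_diff(saved: dict, current: dict, prefix: str = "") -> dict:
    """Return {dotted.key: (saved_value, current_value)} for every mismatch."""
    diff = {}
    for key in sorted(set(saved) | set(current), key=str):
        path = f"{prefix}{key}"
        a = saved.get(key, MISSING)
        b = current.get(key, MISSING)
        if isinstance(a, dict) and isinstance(b, dict):
            # Recurse into nested sections and collect their mismatches.
            diff.update(config_diff(a, b, prefix=f"{path}."))
        elif a != b:
            diff[path] = (a, b)
    return diff


# Hypothetical configs, for illustration only.
saved = {"policy": {"type": "act", "chunk_size": 100}, "batch_size": 8}
current = {"policy": {"type": "act", "chunk_size": 50}, "batch_size": 8, "seed": 1}
print(config_diff(saved, current))
```

An empty result means the two configs match; any entry it reports is a candidate cause of the resume failure.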
Robot Hardware Issues
Problem: Cannot connect to robot.
Solutions:
Check device permissions:
# For USB devices
sudo chmod 666 /dev/ttyUSB0 # Or your device
# Add user to dialout group (permanent)
sudo usermod -a -G dialout $USER
# Log out and back in for changes to take effect
Verify device path:
# List USB devices
ls /dev/tty*
# Use correct path in config
robot = Robot(port="/dev/ttyUSB0")
Check cable connections and power supply
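The device path and permission checks can be combined in a short script. This sketch assumes a Linux system where USB serial adapters appear as `/dev/ttyUSB*` or `/dev/ttyACM*`; other platforms use different names, and `find_serial_ports` is an illustrative helper.

```python
import glob
import os


def find_serial_ports(patterns=("/dev/ttyUSB*", "/dev/ttyACM*")):
    """Return (path, accessible) pairs for candidate serial devices."""
    ports = []
    for pattern in patterns:
        for path in sorted(glob.glob(pattern)):
            # accessible is True only if the current user can read and write.
            accessible = os.access(path, os.R_OK | os.W_OK)
            ports.append((path, accessible))
    return ports


for path, accessible in find_serial_ports():
    hint = "ok" if accessible else "no read/write permission"
    print(f"{path}: {hint}")
```

An empty listing points to a cable, power, or driver problem, while a listed device without permission points to the chmod or dialout steps above.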
Latency in real-time control
Problem: High latency causes jerky or delayed robot motion.
Solutions:
Use GPU inference:
policy = policy.to("cuda")
Enable async inference:
See examples/tutorial/async-inf/ for policy server/client pattern
Optimize observation processing:
Reduce image resolution
Use hardware video encoding
Minimize preprocessing steps
Use action chunking (ACT-style policies reduce inference frequency)
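To decide which of these options you need, measure how long one inference call takes and compare it with the time budget of your control loop. The sketch below is a rough check under simple assumptions: `step` is any zero-argument callable you supply that runs one inference, and a policy that returns a chunk of `chunk_size` actions is given `chunk_size` control periods to produce the next chunk.

```python
import time


def mean_latency_ms(step, repeats: int = 20) -> float:
    """Average wall-clock time of calling `step()`, in milliseconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        step()
    return (time.perf_counter() - start) * 1000 / repeats


def fits_control_rate(latency_ms: float, fps: float, chunk_size: int = 1) -> bool:
    """True if one inference fits in the time covered by the actions it produces."""
    budget_ms = 1000.0 / fps * chunk_size
    return latency_ms <= budget_ms


# At 30 fps the budget is about 33 ms per action, or about 333 ms per 10-action chunk.
print(fits_control_rate(50.0, fps=30))
print(fits_control_rate(50.0, fps=30, chunk_size=10))
```

If `step` runs on a GPU, CUDA work is launched asynchronously, so call `torch.cuda.synchronize()` inside `step` before it returns; otherwise the measured time understates the real latency.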
Problem: Robot movements are offset or incorrect.
Solutions:
Re-run calibration:
Follow your robot’s specific calibration procedure
Check for breaking changes:
See Backward Compatibility for migration guides
Verify joint limits in robot config
Test with known-good trajectory to isolate issue
Solutions:
Use GPU acceleration
Increase batch size (if memory allows)
Use more DataLoader workers:
dataloader = DataLoader(dataset, num_workers=8)
Enable AMP (automatic mixed precision):
Use multi-GPU training with DDP
Solutions:
Reduce batch size
Use gradient checkpointing:
config.use_gradient_checkpointing = True
Clear unused tensors:
del large_tensor
torch.cuda.empty_cache()
Use streaming datasets for large data
Error Messages Reference
'normalize_inputs' not found in state_dict
'Encoder libsvtav1 not found'
Cause: ffmpeg doesn’t have libsvtav1 encoder compiled.
Solution: Install correct ffmpeg version:
conda install ffmpeg=7.1.1 -c conda-forge
ffmpeg -encoders | grep svt # Verify
'ImportError: cannot import name X'
Cause: Version mismatch between installed LeRobot and code.
Solution:
# Reinstall LeRobot
pip uninstall lerobot
pip install lerobot --upgrade
# Or reinstall from source
cd lerobot
pip install -e . --force-reinstall
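Before reinstalling, it can help to confirm which version is actually installed in the active environment. The sketch below uses the standard library's `importlib.metadata`; it assumes the distribution is named `lerobot`, as in the pip commands above, and `installed_version` is an illustrative helper.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str = "lerobot"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version()
print(f"lerobot version: {found}" if found else "lerobot is not installed here")
```

If the reported version differs from the one your code or tutorial targets, the ImportError is most likely the version mismatch described above.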
Getting Help
If you can’t find a solution here:
Search GitHub Issues: check whether your issue has already been reported
Ask on Discord: get help from the community
Open an Issue: report a new bug with details
Discussions: ask questions and share ideas
Reporting Bugs
When reporting an issue, please include:
Environment Information
lerobot-info
python --version
nvidia-smi # If using GPU
Minimal Reproduction
Provide the smallest code snippet that reproduces the issue
Error Traceback
Include the full error message and stack trace
Expected vs Actual
Describe what you expected to happen and what actually happened
The more details you provide, the faster we can help you!