The donkey tubhist command creates histograms showing the distribution of recorded values (steering, throttle, etc.) in your tub data. This helps you identify data imbalances, outliers, and overall data quality problems.
Usage
Options
Path(s) to tub directories to analyze. Multiple tubs can be specified. When multiple tubs are provided, their data is combined for the histogram.
Name of the specific record field to create a histogram for. Examples:
- user/angle: Steering angles
- user/throttle: Throttle values
- user/mode: Operating modes
Path where to save the histogram image (must end with .png). If not specified, saves to a default location based on the tub name:
- With --record: <tub_name>_hist_<record_name>.png
- Without --record: <tub_name>_hist.png
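
The default naming rule above can be sketched in a few lines of Python. This is an illustration of the rule as documented here, not the command's actual implementation; the function name default_hist_filename is hypothetical, and replacing "/" in the record name with "_" is an assumption made so the result stays a single file name.

```python
from typing import Optional


def default_hist_filename(tub_name: str, record_name: Optional[str] = None) -> str:
    """Return the default PNG name for a tub and an optional record field."""
    if record_name is None:
        # Without --record: <tub_name>_hist.png
        return f"{tub_name}_hist.png"
    # Assumption: "/" is swapped for "_" so the name does not point into a subdirectory.
    safe_record = record_name.replace("/", "_")
    # With --record: <tub_name>_hist_<record_name>.png
    return f"{tub_name}_hist_{safe_record}.png"


print(default_hist_filename("tub_1"))
print(default_hist_filename("tub_1", "user/angle"))
```

If the file you find on disk is named differently, trust the command's own log output over this sketch.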
What Gets Created
The command generates:
- Interactive histogram window showing the distribution(s)
- PNG image file saved to the specified or default location
Histogram Features
- 50 bins by default for granular distribution view
- Separate subplots for each field when analyzing all records
- Combined data when multiple tubs are specified
- Automatic scaling for different value ranges
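
To make the binning concrete, here is a minimal pure-Python sketch of an equal-width histogram with 50 bins whose range is taken from the data. It illustrates the general technique only; the command itself relies on its plotting libraries, and the function name histogram_counts is hypothetical.

```python
from typing import List, Sequence


def histogram_counts(values: Sequence[float], bins: int = 50) -> List[int]:
    """Count values into `bins` equal-width bins spanning min(values)..max(values)."""
    counts = [0] * bins
    if not values:
        return counts
    lo, hi = min(values), max(values)
    if lo == hi:
        # All values identical: there is no range to split, so use the first bin.
        counts[0] = len(values)
        return counts
    width = (hi - lo) / bins
    for v in values:
        # The maximum value would land one past the end, so clamp it into the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts


angles = [-1.0, -0.2, 0.0, 0.0, 0.1, 0.3, 1.0]
counts = histogram_counts(angles)
print(len(counts), sum(counts))
```

Because the range comes from the data, a tub that never reaches full steering lock produces a narrower x-axis, which is itself a useful signal.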
Examples
Analyze all fields in a tub
Analyze specific field (steering)
Analyze specific field (throttle)
Save to custom location
Analyze multiple tubs combined
Analyze mode distribution
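
The invocations for the scenarios above can be assembled programmatically. The sketch below builds an argument list in Python; only --record is named on this page, so the --tub and --out flag names are assumptions that you should confirm with donkey tubhist --help, and the tub paths are hypothetical.

```python
import shlex
from typing import List, Optional, Sequence


def tubhist_command(tubs: Sequence[str],
                    record: Optional[str] = None,
                    out: Optional[str] = None) -> List[str]:
    """Build an argument list for donkey tubhist (flag names partly assumed)."""
    if not tubs:
        raise ValueError("at least one tub path is required")
    cmd = ["donkey", "tubhist", "--tub", *tubs]  # assumed flag: --tub
    if record is not None:
        cmd += ["--record", record]
    if out is not None:
        if not out.endswith(".png"):
            raise ValueError("output path must end with .png")
        cmd += ["--out", out]  # assumed flag: --out
    return cmd


# All fields in one tub, one field only, and two tubs combined with a custom output file.
print(shlex.join(tubhist_command(["./data/tub_1"])))
print(shlex.join(tubhist_command(["./data/tub_1"], record="user/angle")))
print(shlex.join(tubhist_command(["./data/tub_1", "./data/tub_2"],
                                 record="user/throttle", out="throttle.png")))
```

The returned list can be passed to subprocess.run if you want to script several histograms in one go.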
Output Example
Interpreting Histograms
Steering (user/angle) Histogram
Ideal Distribution
- Balanced: Roughly equal left and right turns
- Centered peak: Most values near 0 (straight driving)
- Smooth tails: Gradual decrease toward extremes
- Full range: Values span from -1.0 to 1.0
Problem Patterns
Left/Right Bias
- More values on one side than the other
- Indicates track turns more in one direction
- Solution: Record data driving in reverse direction

- Overwhelming number of straight driving samples
- Model may not learn turns well
- Solution: Include more turns in training data

- Few straight driving samples
- Model may struggle with straight sections
- Solution: Record more straight driving

- Missing ranges of steering values
- Model won't learn those steering angles
- Solution: Drive with full range of steering inputs

- Many samples at maximum left/right
- May indicate overcorrection or calibration issues
- Solution: Review driving technique or recalibrate
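
The steering patterns above can also be checked numerically. The following sketch summarizes a list of user/angle values; the function name steering_summary is hypothetical, the 0.1 straight band and 0.95 saturation threshold are illustrative choices, and treating negative angles as left turns is an assumption about the sign convention.

```python
from typing import Dict, Sequence


def steering_summary(angles: Sequence[float],
                     straight_band: float = 0.1,
                     saturation: float = 0.95) -> Dict[str, float]:
    """Summarize left/right balance, straight share, and saturation of steering values."""
    n = len(angles)
    if n == 0:
        raise ValueError("no steering samples")
    left = sum(1 for a in angles if a < -straight_band)    # assumed: negative = left
    right = sum(1 for a in angles if a > straight_band)
    saturated = sum(1 for a in angles if abs(a) >= saturation)
    return {
        "left_fraction": left / n,
        "right_fraction": right / n,
        "straight_fraction": (n - left - right) / n,
        "saturated_fraction": saturated / n,
        "min": min(angles),
        "max": max(angles),
    }


print(steering_summary([-1.0, -0.5, 0.0, 0.0, 0.05, 0.5, 0.6, 1.0]))
```

A large gap between left_fraction and right_fraction points to left/right bias, a straight_fraction close to 1 points to too much straight driving, and min or max well inside the -1.0 to 1.0 range points to unused steering range.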
Throttle (user/throttle) Histogram
Ideal Distribution
- Consistent forward values: Peak around cruise throttle
- Few stopped values: Minimal time at 0 throttle
- Minimal reverse: Unless intentionally training for reverse
Problem Patterns
Zero Spike
- Many samples with 0 throttle
- Model may learn to stop frequently
- Solution: Remove stopped segments or balance data

- Values scattered across range
- Inconsistent speed
- Solution: Drive more smoothly at consistent speed

- All throttle values are low
- Model may be too cautious
- Solution: Include faster driving data

- Unexpected negative throttle
- May be unintentional backup
- Solution: Review and clean data
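
A similar numeric check works for throttle. This sketch reports the share of stopped and reverse samples plus the mean and spread of forward throttle; the function name throttle_summary and the 0.01 dead band around zero are illustrative assumptions.

```python
import statistics
from typing import Dict, Sequence


def throttle_summary(throttles: Sequence[float], zero_band: float = 0.01) -> Dict[str, float]:
    """Summarize stopped, reverse, and forward throttle samples."""
    n = len(throttles)
    if n == 0:
        raise ValueError("no throttle samples")
    stopped = sum(1 for t in throttles if abs(t) <= zero_band)
    reverse = sum(1 for t in throttles if t < -zero_band)
    forward = [t for t in throttles if t > zero_band]
    return {
        "stopped_fraction": stopped / n,
        "reverse_fraction": reverse / n,
        # Mean approximates the cruise throttle; spread hints at inconsistent speed.
        "forward_mean": statistics.mean(forward) if forward else 0.0,
        "forward_stdev": statistics.pstdev(forward) if forward else 0.0,
    }


print(throttle_summary([0.0, 0.0, 0.5, 0.5, -0.25, 0.5, 0.5, 0.5]))
```

A high stopped_fraction corresponds to the zero spike, a high forward_stdev to scattered values, a low forward_mean to overly cautious driving, and a nonzero reverse_fraction to unexpected negative throttle.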
Use Cases
Data Quality Check
Before training, verify the data distribution.
Identify Data Imbalance
Check whether you need to collect more data for specific scenarios.
Compare Datasets
Analyze different tubs separately to compare them.
Validate Data Collection
Confirm you drove with the full range of inputs.
Debug Training Issues
If the model performs poorly, check the data distribution.
Data Balancing Strategies
For Imbalanced Steering
- Record reverse direction: Drive the track backward
- Augmentation: Use horizontal flip in training config
- Weighted sampling: Configure training to oversample underrepresented angles
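
To see why a horizontal flip helps with left/right bias, consider this minimal sketch: mirroring an image turns a left curve into a right curve, so the steering label must be negated along with it. It is a plain-Python illustration with the image as nested lists, not Donkeycar's augmentation code, and the function name flip_sample is hypothetical.

```python
from typing import List, Tuple


def flip_sample(image_rows: List[List[int]], angle: float) -> Tuple[List[List[int]], float]:
    """Mirror an image left-to-right and negate its steering angle."""
    flipped = [list(reversed(row)) for row in image_rows]
    return flipped, -angle


image = [[1, 2, 3],
         [4, 5, 6]]
print(flip_sample(image, 0.4))
```

Adding the flipped copy of every sample makes the steering histogram symmetric around 0 by construction.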
For Sparse Data Regions
- Targeted collection: Record specific scenarios (sharp turns, etc.)
- Multiple laps: Record more data of the same track
- Data synthesis: Use augmentation techniques
For Too Much Straight Driving
- Remove straight sections: Edit tubs to remove excess straight driving
- Focus on turns: Record more laps focusing on technical sections
- Use subset: Train only on data with abs(angle) > 0.1
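
The subset idea can be sketched as a filter over records. The example below keeps every turning sample and only a random fraction of straight ones; records are assumed to be dictionaries with a user/angle key, and the function name downsample_straight and its keep_fraction parameter are hypothetical.

```python
import random
from typing import Dict, List


def downsample_straight(records: List[Dict[str, float]],
                        keep_fraction: float = 0.3,
                        threshold: float = 0.1,
                        seed: int = 0) -> List[Dict[str, float]]:
    """Keep all turning records and a random share of straight-driving records."""
    rng = random.Random(seed)  # fixed seed so the selection is repeatable
    kept = []
    for rec in records:
        is_turn = abs(rec["user/angle"]) > threshold
        if is_turn or rng.random() < keep_fraction:
            kept.append(rec)
    return kept


records = [{"user/angle": a} for a in (0.0, 0.5, -0.3, 0.05, 0.02, -0.8)]
# keep_fraction=0.0 trains only on samples with abs(angle) > 0.1
print(downsample_straight(records, keep_fraction=0.0))
```

Dropping every straight sample can leave the model unsure on straights, so a small nonzero keep_fraction is usually the safer starting point.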
Analysis Workflow
- Collect initial data
- Check distribution
- Identify issues:
  - Note imbalances
  - Check for missing ranges
  - Verify full steering range used
- Collect targeted data:
  - Record specific scenarios that are underrepresented
  - Drive track in reverse if left/right imbalanced
- Verify improvement
- Proceed to training
Combining with Other Tools
Full Analysis Pipeline
Troubleshooting
Tub not found
- Verify tub path exists and is correct
- Check that tub contains valid data (manifest.json)
- Use absolute paths if relative paths fail
Empty or missing fields
- Verify record name is correct (use user/angle, not just angle)
- Check tub actually contains the specified field
- Look at a tub's manifest.json to see available fields
Plot display issues
- On headless servers, you may need to set the matplotlib backend
- Use SSH with X11 forwarding: ssh -X
- Check that the $DISPLAY environment variable is set
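
One way to handle the headless case is to select matplotlib's non-interactive Agg backend through the MPLBACKEND environment variable when no display is available. The sketch below prepares such an environment for a child process; it assumes a Linux/X11 setup where an empty DISPLAY means no screen, and the function name headless_env is hypothetical.

```python
import os
from typing import Dict, Mapping


def headless_env(env: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of env that selects the Agg backend when DISPLAY is unset."""
    new_env = dict(env)
    if not new_env.get("DISPLAY"):
        # Agg renders to files only; an explicit MPLBACKEND set by the user is kept.
        new_env.setdefault("MPLBACKEND", "Agg")
    return new_env


env = headless_env(os.environ)
print(env.get("MPLBACKEND"))
```

The resulting mapping can be passed as env= to subprocess.run when launching the command. With Agg no window opens, but the PNG file can still be written.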
Image not saving
- Verify output directory exists and is writable
- Check disk space
- Ensure path ends with .png
AttributeError or DataFrame errors
- Update pandas: pip install --upgrade pandas
- Update matplotlib: pip install --upgrade matplotlib
- Verify tub format is compatible (v2 format)
Common Record Fields
Typical fields in Donkeycar tubs:

| Field Name | Description | Typical Range |
|---|---|---|
| user/angle | Steering angle | -1.0 to 1.0 |
| user/throttle | Throttle value | -1.0 to 1.0 |
| user/mode | Operating mode | 'user', 'local_angle', 'local' |
| pilot/angle | Autopilot steering | -1.0 to 1.0 |
| pilot/throttle | Autopilot throttle | -1.0 to 1.0 |
| milliseconds | Timestamp | Integer |
To see which fields your tub actually contains, check its manifest.json file.
Tips
Before Training
- Always check histograms before training to avoid wasting time on bad data
- Look at both steering and throttle distributions
- Verify full range of values is present
- Check for outliers that might indicate recording errors
Data Collection Strategy
- Plan coverage: Aim for balanced distribution
- Record multiple sessions: Combine data from different times/conditions
- Monitor during collection: Check histograms after each session
- Quality over quantity: 1,000 well-balanced samples can be worth more than 10,000 imbalanced ones
Iterative Improvement
- Baseline histogram: Record initial data distribution
- Identify gaps: Note underrepresented values
- Targeted collection: Focus on missing ranges
- Verify improvement: Re-run histogram
- Train and evaluate: See if balanced data improves model
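
The "identify gaps" step can be made repeatable with a small helper that lists value ranges holding less than a chosen share of the samples. This is a sketch under stated assumptions: the function name sparse_bins, the 20-bin default, and the 1% threshold are all illustrative, and the fixed -1.0 to 1.0 range matches steering and throttle fields only.

```python
from typing import List, Sequence, Tuple


def sparse_bins(values: Sequence[float],
                bins: int = 20,
                lo: float = -1.0,
                hi: float = 1.0,
                min_share: float = 0.01) -> List[Tuple[float, float]]:
    """Return the (start, end) ranges whose share of samples is below min_share."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        clamped = min(max(v, lo), hi)  # out-of-range values count toward the edge bins
        idx = min(int((clamped - lo) / width), bins - 1)
        counts[idx] += 1
    total = len(values)
    gaps = []
    for i, count in enumerate(counts):
        if total == 0 or count / total < min_share:
            gaps.append((lo + i * width, lo + (i + 1) * width))
    return gaps


before = [-0.9] * 50 + [0.0] * 50
print(sparse_bins(before, bins=4))
```

Running it on the baseline data and again after targeted collection shows whether the list of sparse ranges is shrinking.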
Next Steps
After analyzing histograms:
- Balance your data: Collect more data for underrepresented scenarios
- Clean your data: Remove or trim problematic sections
- Train your model: Use donkey train
- Evaluate predictions: Use donkey tubplot
- Visualize results: Use donkey makemovie
