donkey tubhist - Donkeycar

The donkey tubhist command creates histograms showing the distribution of recorded values (steering, throttle, etc.) in your tub data. This helps identify data imbalances, outliers, and overall data quality.

Usage

donkey tubhist [options]

Options

--tub

string[]

required

Path(s) to tub directories to analyze. Multiple tubs can be specified:

--tub ./data/tub_1 ./data/tub_2

When multiple tubs are provided, their data is combined for the histogram.

--record

string

Name of specific record field to create histogram for. Examples:

user/angle: Steering angles
user/throttle: Throttle values
user/mode: Operating modes

If not specified, creates histograms for all numeric fields.

--out

string

Path where the histogram image is saved (must end with .png). If not specified, the image is saved to a default location based on the tub name:

With --record: <tub_name>_hist_<record_name>.png
Without --record: <tub_name>_hist.png
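The naming rule above can be sketched in a few lines of Python. This is an illustration only, not the command's actual code: the helper name `default_hist_name` is hypothetical, and the replacement of `/` with `_` in the record name is inferred from the sample output on this page (`tub_1_20-03-15_hist_user_angle.png`). The sketch covers a single tub path only.

```python
import os


def default_hist_name(tub_path, record=None):
    """Build the default image name for one tub (hypothetical helper)."""
    # Use the last path component as the tub name, ignoring a trailing slash.
    tub_name = os.path.basename(os.path.normpath(tub_path))
    if record is None:
        return f"{tub_name}_hist.png"
    # Assumption: slashes in the record name become underscores.
    return f"{tub_name}_hist_{record.replace('/', '_')}.png"


print(default_hist_name("./data/tub_1_20-03-15", "user/angle"))
print(default_hist_name("./data/tub_1_20-03-15"))
```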

What Gets Created

The command generates:

Interactive histogram window showing the distribution(s)
PNG image file saved to the specified or default location

Histogram Features

50 bins by default for granular distribution view
Separate subplots for each field when analyzing all records
Combined data when multiple tubs are specified
Automatic scaling for different value ranges
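To make the binning and the combined-tub behavior concrete, here is a minimal sketch in plain Python. It is not the command's implementation: the function name `bin_counts` and the sample values are made up for illustration, and the fixed range of -1.0 to 1.0 is an assumption taken from the sample output on this page.

```python
def bin_counts(values, bins=50, lo=-1.0, hi=1.0):
    """Count how many values fall into each of `bins` equal-width bins on [lo, hi]."""
    counts = [0] * bins
    for v in values:
        if v < lo or v > hi:
            continue  # this sketch simply ignores out-of-range values
        # Scale the value to a bin index; the top edge (v == hi) goes in the last bin.
        idx = min(int((v - lo) * bins / (hi - lo)), bins - 1)
        counts[idx] += 1
    return counts


# Hypothetical steering values from two tubs, concatenated before binning.
tub_1_angles = [-1.0, -0.5, 0.0, 0.0, 0.02]
tub_2_angles = [0.0, 0.5, 1.0]
combined = tub_1_angles + tub_2_angles

counts = bin_counts(combined)
print(len(counts), sum(counts))
```

With 50 bins over a range of 2.0, each bin is 0.04 wide, which is why small differences in steering show up as separate bars.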

Examples

Analyze all fields in a tub

donkey tubhist --tub ./data/tub_1_20-03-15

Creates histograms for all numeric fields (steering, throttle, etc.).

Analyze specific field (steering)

donkey tubhist --tub ./data/tub_1_20-03-15 --record user/angle

Shows only the distribution of steering angles.

Analyze specific field (throttle)

donkey tubhist --tub ./data/tub_1_20-03-15 --record user/throttle

Save to custom location

donkey tubhist --tub ./data/tub_1_20-03-15 --record user/angle \
  --out ~/analysis/steering_distribution.png

Analyze multiple tubs combined

donkey tubhist --tub ./data/tub_1 ./data/tub_2 ./data/tub_3 \
  --record user/angle

Combines data from all three tubs into a single histogram.

Analyze mode distribution

donkey tubhist --tub ./data/tub_1_20-03-15 --record user/mode

Shows distribution of operating modes (user, local_angle, local).

Output Example

Loading tubs from paths: ./data/tub_1_20-03-15
Tub 1: 2,487 records

Creating histogram for: user/angle
Bins: 50
Range: -1.0 to 1.0

Saving image to: tub_1_20-03-15_hist_user_angle.png

The histogram window displays and the image is saved.

Interpreting Histograms

Steering (user/angle) Histogram

Ideal Distribution

Balanced: Roughly equal left and right turns
Centered peak: Most values near 0 (straight driving)
Smooth tails: Gradual decrease toward extremes
Full range: Values span from -1.0 to 1.0

Problem Patterns

Left/Right Bias

More values on one side than the other
Indicates track turns more in one direction
Solution: Record data while driving the track in the opposite direction

Center Spike

Overwhelming number of straight driving samples
Model may not learn turns well
Solution: Include more turns in training data

Missing Center

Few straight driving samples
Model may struggle with straight sections
Solution: Record more straight driving

Gaps or Discontinuities

Missing ranges of steering values
Model won’t learn those steering angles
Solution: Drive with full range of steering inputs

Extreme Clusters

Many samples at maximum left/right
May indicate overcorrection or calibration issues
Solution: Review driving technique or recalibrate
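The patterns above can also be checked numerically. The following sketch assumes you already have the steering values as a list of floats, and that negative values mean left and positive values mean right. The function name `steering_summary` and the two thresholds are illustrative choices, not part of Donkeycar.

```python
def steering_summary(angles, center=0.05, extreme=0.95):
    """Return the fraction of samples that are left, right, straight, or at the limits."""
    n = len(angles)
    if n == 0:
        raise ValueError("no steering samples")
    left = sum(1 for a in angles if a < -center)
    right = sum(1 for a in angles if a > center)
    straight = n - left - right
    at_limit = sum(1 for a in angles if abs(a) >= extreme)
    return {
        "left": left / n,          # compare with "right" to spot left/right bias
        "right": right / n,
        "straight": straight / n,  # very high: center spike; very low: missing center
        "at_limit": at_limit / n,  # high: extreme clusters
    }


# Made-up sample values for illustration.
sample = [-1.0, -0.4, -0.2, 0.0, 0.0, 0.01, 0.3, 1.0]
summary = steering_summary(sample)
print(summary)
```

What counts as "too much" in any category depends on the track, so treat the numbers as a prompt to look at the histogram rather than as pass/fail limits.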

Throttle (user/throttle) Histogram

Ideal Distribution

Consistent forward values: Peak around cruise throttle
Few stopped values: Minimal time at 0 throttle
Minimal reverse: Unless intentionally training for reverse

Problem Patterns

Zero Spike

Many samples with 0 throttle
Model may learn to stop frequently
Solution: Remove stopped segments or balance data

High Variance

Values scattered across range
Inconsistent speed
Solution: Drive more smoothly at consistent speed

Low Values Only

All throttle values are low
Model may be too cautious
Solution: Include faster driving data

Reverse Values

Unexpected negative throttle
May be unintentional backup
Solution: Review and clean data
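A similar numeric check works for throttle. This sketch assumes the throttle values are available as a list of floats, with negative values meaning reverse. The name `throttle_summary` and the `stopped_eps` tolerance are hypothetical.

```python
from statistics import mean, pstdev


def throttle_summary(throttles, stopped_eps=0.01):
    """Summarize stopped, reverse, and forward throttle samples."""
    n = len(throttles)
    if n == 0:
        raise ValueError("no throttle samples")
    forward = [t for t in throttles if t > stopped_eps]
    stopped = sum(1 for t in throttles if abs(t) <= stopped_eps)
    reverse = sum(1 for t in throttles if t < -stopped_eps)
    return {
        "stopped": stopped / n,  # high: zero spike
        "reverse": reverse / n,  # non-zero: unexpected negative throttle
        "forward_mean": mean(forward) if forward else 0.0,      # low: low values only
        "forward_spread": pstdev(forward) if forward else 0.0,  # high: high variance
    }


# Made-up sample values for illustration.
sample = [0.0, 0.0, 0.5, 0.5, 0.5, 0.5, -0.25, 0.5]
summary = throttle_summary(sample)
print(summary)
```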

Use Cases

Data Quality Check

Before training, verify data distribution:

donkey tubhist --tub ./data/tub_new --record user/angle

Identify Data Imbalance

Check if you need to collect more data for specific scenarios:

donkey tubhist --tub ./data/all_training_data --record user/angle

Compare Datasets

Analyze different tubs separately to compare:

donkey tubhist --tub ./data/track1 --record user/angle --out track1_steering.png
donkey tubhist --tub ./data/track2 --record user/angle --out track2_steering.png
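The two saved images still have to be compared by eye. If you want a rough single-number comparison as well, one option is histogram intersection: normalize both histograms and sum the smaller value of each bin pair. The sketch below is not a feature of `donkey tubhist`. The function names and sample values are hypothetical, and it assumes you have extracted the steering values from each tub yourself.

```python
def normalized_hist(values, bins=50, lo=-1.0, hi=1.0):
    """Return per-bin fractions (summing to about 1.0) for values in [lo, hi]."""
    counts = [0] * bins
    for v in values:
        if lo <= v <= hi:
            counts[min(int((v - lo) * bins / (hi - lo)), bins - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * bins


def hist_overlap(a, b, bins=50):
    """Histogram intersection: 1.0 for identical distributions, 0.0 for disjoint ones."""
    ha = normalized_hist(a, bins=bins)
    hb = normalized_hist(b, bins=bins)
    return sum(min(x, y) for x, y in zip(ha, hb))


# Made-up steering values for two tracks.
track1 = [-0.6, -0.5, -0.1, 0.0, 0.0, 0.1]
track2 = [0.0, 0.0, 0.1, 0.5, 0.6, 0.7]
print(hist_overlap(track1, track2))
```

A low overlap suggests the two tubs cover different steering ranges, which is often what you want when combining them.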

Validate Data Collection

Confirm you drove with full range of inputs:

donkey tubhist --tub ./data/latest_session

Debug Training Issues

If model performs poorly, check data distribution:

donkey tubhist --tub ./data/training_set --record user/angle
donkey tubhist --tub ./data/training_set --record user/throttle

Data Balancing Strategies

For Imbalanced Steering

Record the opposite direction: Drive the track the other way around (not in reverse gear)
Augmentation: Use horizontal flip in training config
Weighted sampling: Configure training to oversample underrepresented angles

# In myconfig.py
AUG_FLIP_HORIZONTAL = True  # Helps balance left/right
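The reason a horizontal flip helps with left/right balance is that every mirrored left turn becomes a right turn, provided the steering label is negated along with the image. The following sketch shows that idea on a tiny image stored as nested lists. It illustrates the concept only and makes no claim about how Donkeycar implements the setting above; `flip_sample` is a hypothetical helper.

```python
def flip_sample(image_rows, angle):
    """Mirror an image left-to-right and negate the steering angle to match."""
    flipped = [list(reversed(row)) for row in image_rows]
    return flipped, -angle


# A 2x3 "image" with a steering angle of 0.4 (made-up values).
image = [[1, 2, 3],
         [4, 5, 6]]
flipped_image, flipped_angle = flip_sample(image, 0.4)
print(flipped_image, flipped_angle)
```

Flipping the image without negating the angle would teach the model the wrong steering direction, so the two steps always go together.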

For Sparse Data Regions

Targeted collection: Record specific scenarios (sharp turns, etc.)
Multiple laps: Record more data of the same track
Data synthesis: Use augmentation techniques
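Before collecting targeted data, it helps to know which value ranges are actually sparse. This sketch lists the bins whose count falls below a threshold. The name `sparse_ranges`, the bin count, and the threshold are illustrative assumptions, and the values are expected as a plain list of floats.

```python
def sparse_ranges(values, bins=20, lo=-1.0, hi=1.0, min_count=1):
    """Return (start, end) ranges of bins holding fewer than `min_count` values."""
    counts = [0] * bins
    for v in values:
        if lo <= v <= hi:
            counts[min(int((v - lo) * bins / (hi - lo)), bins - 1)] += 1
    width = (hi - lo) / bins
    return [
        (lo + i * width, lo + (i + 1) * width)
        for i, count in enumerate(counts)
        if count < min_count
    ]


# Made-up steering values with nothing recorded above 0.5.
angles = [-0.9, -0.1, 0.1]
print(sparse_ranges(angles, bins=4))
```

Each returned range is a candidate for targeted collection, for example driving sharper right turns if the upper ranges are empty.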

For Too Much Straight Driving

Remove straight sections: Edit tubs to remove excess straight driving
Focus on turns: Record more laps focusing on technical sections
Use subset: Train only on data with abs(angle) > 0.1
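A softer variant of the subset approach keeps every turning sample and only a fraction of the straight ones, so the model still sees some straight driving. The sketch below assumes records are dictionaries with a `user/angle` key. The function name, the `keep_fraction` value, and the fixed seed are hypothetical choices, and this is not a built-in Donkeycar feature.

```python
import random


def downsample_straight(records, threshold=0.1, keep_fraction=0.25, seed=0):
    """Keep all turning records and a random fraction of near-straight ones."""
    rng = random.Random(seed)  # fixed seed so the selection is repeatable
    kept = []
    for rec in records:
        is_turn = abs(rec["user/angle"]) > threshold
        if is_turn or rng.random() < keep_fraction:
            kept.append(rec)
    return kept


# Made-up records: mostly straight driving with a few turns.
records = [{"user/angle": a} for a in [0.0] * 8 + [0.5, -0.7]]
balanced = downsample_straight(records)
print(len(records), len(balanced))
```

Setting `keep_fraction=0.0` reproduces the strict rule of training only on data with `abs(angle) > 0.1`.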

Analysis Workflow

Collect initial data:
```
python manage.py drive
```
Check distribution:
```
donkey tubhist --tub ./data/tub_1
```
Identify issues:
- Note imbalances
- Check for missing ranges
- Verify full steering range used
Collect targeted data:
- Record specific scenarios that are underrepresented
- Drive the track in the opposite direction if left/right is imbalanced

Verify improvement:

donkey tubhist --tub ./data/tub_1 ./data/tub_2

Proceed to training:

donkey train --tub ./data/tub_1 ./data/tub_2

Combining with Other Tools

Full Analysis Pipeline

# 1. Check data distribution
donkey tubhist --tub ./data/tub_1 --record user/angle

# 2. Train model
donkey train --tub ./data/tub_1 --model ./models/pilot.h5

# 3. Check prediction quality
donkey tubplot --tub ./data/validation --model ./models/pilot.h5

# 4. Create visualization video
donkey makemovie --tub ./data/validation --model ./models/pilot.h5

Troubleshooting

Tub not found

Verify tub path exists and is correct
Check that tub contains valid data (manifest.json)
Use absolute paths if relative paths fail

Empty or missing fields

Verify record name is correct (use user/angle, not just angle)
Check tub actually contains the specified field
Look at a tub’s manifest.json to see available fields

Plot display issues

On headless servers, you may need to set a non-interactive matplotlib backend (for example, by setting the MPLBACKEND environment variable to Agg)
Use SSH with X11 forwarding: ssh -X
Check that $DISPLAY environment variable is set
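If you script the command on a machine without a display, one way to select a non-interactive backend is matplotlib's `MPLBACKEND` environment variable. The sketch below only builds the command and environment. It assumes `donkey tubhist` does not select a backend itself and still writes the image when no window can be shown, neither of which is confirmed here. `headless_command` is a hypothetical helper.

```python
import os


def headless_command(tub_path, record, out_path):
    """Build a tubhist command line and an environment that requests the Agg backend."""
    # Copy the current environment and ask matplotlib for a non-interactive backend.
    env = dict(os.environ, MPLBACKEND="Agg")
    cmd = [
        "donkey", "tubhist",
        "--tub", tub_path,
        "--record", record,
        "--out", out_path,
    ]
    return cmd, env


cmd, env = headless_command("./data/tub_1", "user/angle", "steering.png")
print(" ".join(cmd))
# To actually run it: subprocess.run(cmd, env=env, check=True)
```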

Image not saving

Verify output directory exists and is writable
Check disk space
Ensure path ends with .png

AttributeError or DataFrame errors

Update pandas: pip install --upgrade pandas
Update matplotlib: pip install --upgrade matplotlib
Verify tub format is compatible (v2 format)

Common Record Fields

Typical fields in Donkeycar tubs:

Field Name	Description	Typical Range
`user/angle`	Steering angle	-1.0 to 1.0
`user/throttle`	Throttle value	-1.0 to 1.0
`user/mode`	Operating mode	'user', 'local_angle', 'local'
`pilot/angle`	Autopilot steering	-1.0 to 1.0
`pilot/throttle`	Autopilot throttle	-1.0 to 1.0
`milliseconds`	Timestamp	Integer

To see all available fields in a tub, look at its manifest.json file.
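Reading the field names can also be scripted. The sketch below rests on an assumption about the file layout: that the first line of `manifest.json` is a JSON array of field names. Check your own tub before relying on it. `tub_field_names` is a hypothetical helper, not a Donkeycar function.

```python
import json
import os


def tub_field_names(tub_path):
    """Return the field names listed on the first line of a tub's manifest.json.

    Assumption: line 1 of manifest.json is a JSON array of field names.
    """
    manifest_path = os.path.join(tub_path, "manifest.json")
    with open(manifest_path) as f:
        first_line = f.readline()
    fields = json.loads(first_line)
    if not isinstance(fields, list):
        raise ValueError("unexpected manifest layout: first line is not a list")
    return fields
```

For example, `tub_field_names("./data/tub_1")` would return a list such as `["cam/image_array", "user/angle", ...]` if the manifest follows the assumed layout.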

Tips

Before Training

Always check histograms before training to avoid wasting time on bad data
Look at both steering and throttle distributions
Verify full range of values is present
Check for outliers that might indicate recording errors

Data Collection Strategy

Plan coverage: Aim for balanced distribution
Record multiple sessions: Combine data from different times/conditions
Monitor during collection: Check histograms after each session
Quality over quantity: 1,000 good, balanced samples can be worth more than 10,000 imbalanced ones

Iterative Improvement

Baseline histogram: Record initial data distribution
Identify gaps: Note underrepresented values
Targeted collection: Focus on missing ranges
Verify improvement: Re-run histogram
Train and evaluate: See if balanced data improves model

Next Steps

After analyzing histograms:

Balance your data: Collect more data for underrepresented scenarios
Clean your data: Remove or trim problematic sections
Train your model: Use donkey train
Evaluate predictions: Use donkey tubplot
Visualize results: Use donkey makemovie
