Overview
This guide will walk you through running the CAN reverse engineering pipeline on sample data. The pipeline processes CAN log files and automatically identifies signals, correlates time series data, and generates visualizations.

Prerequisites
- Python 3.6+ installed
- Required packages installed (see Installation)
- Navigate to the Pipeline/ directory
Quick Start: Using Example Data
Run with Default Example Data
The simplest way to run the pipeline is with the default example file. This will process loggerProgram0.log using default settings.

The first run will take longer as it processes raw data; subsequent runs use cached pickle files for faster execution.
Running with Your Own Data
Original Format
For CAN data in the original format (tab-separated values), pass the path to your log file as an argument.

CAN-Utils Format
For data captured with Linux can-utils (candump), add the --can-utils flag, which converts the can-utils log format to the internal TSV format before processing.

Understanding the Pipeline
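As a rough sketch of what that conversion involves (this helper is illustrative, not the pipeline's actual code), a candump log line such as `(1600000000.123456) can0 123#DEADBEEF` maps onto the internal TSV columns like this:

```python
def candump_to_tsv(line: str) -> str:
    """Convert one can-utils candump log line into the pipeline's TSV
    layout: time, id, dlc, b0..b7 (illustrative sketch only)."""
    timestamp, _interface, frame = line.split()
    time = timestamp.strip("()")
    arb_id, payload = frame.split("#")
    data = [payload[i:i + 2] for i in range(0, len(payload), 2)]
    dlc = len(data)
    data += ["00"] * (8 - dlc)  # pad unused data bytes
    return "\t".join([time, arb_id, str(dlc)] + data)

print(candump_to_tsv("(1600000000.123456) can0 123#DEADBEEF"))
# 1600000000.123456	123	4	DE	AD	BE	EF	00	00	00	00
```

Frames shorter than eight bytes are padded with 00 here; special cases such as remote frames are ignored for brevity.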
The pipeline executes three main phases:

1. Pre-Processing
- Imports CAN log file into Pandas DataFrame
- Groups messages by Arbitration ID
- Identifies J1979 (OBD-II) data
- Analyzes transmission frequencies
- Creates ArbID objects for each unique ID
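In stdlib terms, the grouping and frequency analysis can be sketched like this (the rows and numbers are invented for illustration; the pipeline builds ArbID objects rather than plain dicts):

```python
from collections import defaultdict

# Rows as parsed from the TSV log: (timestamp, arbitration ID, payload).
# Values are invented for illustration.
rows = [
    (0.00, "0x123", "DE AD"),
    (0.01, "0x456", "01"),
    (0.02, "0x123", "DE AE"),
]

# Group messages by Arbitration ID.
by_arb_id = defaultdict(list)
for ts, arb_id, payload in rows:
    by_arb_id[arb_id].append((ts, payload))

# Estimate each ID's transmission frequency from its timestamps.
for arb_id, msgs in sorted(by_arb_id.items()):
    times = [t for t, _ in msgs]
    span = times[-1] - times[0]
    freq = (len(msgs) - 1) / span if span > 0 else 0.0
    print(f"{arb_id}: {len(msgs)} msgs, ~{freq:.0f} Hz")
```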
2. Lexical Analysis
- Tokenizes binary payloads to detect signal boundaries
- Extracts individual time series signals
- Normalizes signal values
- Creates Signal objects for each detected time series
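A simplified picture of boundary detection: bit positions that flip at similar rates likely belong to the same signal, so a jump in flip rate between adjacent bits suggests a boundary. The sketch below is an illustrative stand-in, not the pipeline's actual tokenizer, and its max_distance threshold only loosely mirrors the tokenization_bit_distance parameter:

```python
def bit_flip_counts(payloads, width=16):
    """Count how often each bit position flips between consecutive payloads."""
    counts = [0] * width
    for prev, cur in zip(payloads, payloads[1:]):
        diff = prev ^ cur
        for bit in range(width):
            if diff >> bit & 1:
                counts[bit] += 1
    return counts

def token_boundaries(counts, max_distance):
    """Start a new token wherever adjacent bit positions' flip counts
    differ by more than max_distance."""
    return [i for i in range(1, len(counts))
            if abs(counts[i] - counts[i - 1]) > max_distance]

# A 16-bit payload whose low byte toggles every frame while the high
# byte stays constant: the flip-count profile jumps at bit 8, so the
# payload splits into two candidate signals there.
payloads = [0xA000 | (0xFF if i % 2 else 0x00) for i in range(10)]
print(token_boundaries(bit_flip_counts(payloads), max_distance=4))  # [8]
```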
3. Semantic Analysis
- Computes correlation matrix between all signals
- Performs hierarchical clustering
- Labels signals by correlation with J1979 data
- Generates cluster visualizations and dendrograms
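The correlation step boils down to pairwise Pearson coefficients between signal time series; strongly correlated signals end up in the same cluster. A minimal stdlib sketch (sample values invented for illustration):

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Two signals that rise together versus one that stays flat.
speed = [0, 10, 20, 30, 40]
rpm   = [800, 1500, 2200, 2900, 3600]
temp  = [90, 90, 91, 90, 90]
print(pearson(speed, rpm))   # ~1.0: strong candidates for one cluster
print(pearson(speed, temp))  # ~0.0: unrelated
```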
Configuration Options
You can customize the pipeline behavior by modifying variables in Main.py:
Output Control
Analysis Parameters
Normalization Strategy
Examining Output Files
Expected Output
When the pipeline completes successfully, you'll see the generated output files.

Example: Processing CAN-Utils Data
Here's a complete example of processing data from Linux can-utils.

Troubleshooting
FileNotFoundError: loggerProgram0.log
Make sure you're in the Pipeline/ directory and the example log file exists. If it is missing, provide your own CAN log file as an argument.

ValueError: could not convert string to float
This may indicate improperly formatted input data. Verify your log file format matches the expected structure:
- Tab-separated values
- Columns: time, id, dlc, b0, b1, b2, b3, b4, b5, b6, b7
- Hexadecimal values for ID and bytes
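A quick way to sanity-check a log line against that structure (an illustrative helper, not part of the pipeline):

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")

def validate_row(line: str) -> bool:
    """Check one log line: 11 tab-separated fields (time, id, dlc, b0-b7),
    a numeric time, and hexadecimal id/byte values."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 11:
        return False
    time, arb_id, dlc, *data = fields
    try:
        float(time)
        int(arb_id, 16)
        int(dlc)
    except ValueError:
        return False
    return all(b and set(b) <= HEX_DIGITS for b in data)

print(validate_row("0.000\t7E8\t8\t41\t0D\t32\t00\t00\t00\t00\t00"))  # True
print(validate_row("not a log line"))                                 # False
```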
No output files generated
Check that dump_to_pickle is set to True in Main.py (line 62).

Pipeline runs but no signals detected
This can happen with:
- Short capture duration (not enough data)
- Static CAN traffic (no changing signals)
- Incorrect tokenization parameters

Try adjusting tokenization_bit_distance in Main.py (line 71).

Next Steps
- Advanced Usage: Process multiple CAN log files simultaneously
- EDM Analysis: Perform causal analysis with Empirical Dynamic Modeling
- Pipeline Details: Detailed pipeline stages and algorithms
- API Reference: Complete API documentation for classes and modules
Getting Help
For questions and community support:
- Join the Open Garages Google Group
- Review the dissertation for theoretical background
- Examine example output files included with the project