Overview
The ArbID class represents a CAN bus Arbitration ID and stores all associated data, analysis results, and metadata throughout the reverse engineering pipeline. Each instance encapsulates the raw data, binary representations, transmission frequency characteristics, and lexical tokenization for a single arbitration ID.
Constructor
The CAN Arbitration ID value (e.g., 0x123, 0x7E8)
Instance Attributes
Basic Identification
The CAN Arbitration ID value assigned during initialization
Data Length Code: the number of bytes in the CAN payload (0-8). Set by PreProcessor.generate_arb_id_dictionary().
Pandas DataFrame containing the time-indexed raw hexadecimal payload data. Columns represent byte positions (b0-b7); rows represent messages indexed by timestamp. Set by PreProcessor.generate_arb_id_dictionary().
Binary Matrix and Tang Representation
These attributes are populated by the generate_binary_matrix_and_tang() method.
Binary representation of payload data with shape (num_messages, dlc*8). Each row represents one CAN message, with bits expanded across columns. Data type is uint8 with values 0 or 1.
Normalized transition activity for each bit position. Calculated using XOR between consecutive messages to detect bit transitions, then normalized using the provided strategy (typically min-max scaling). Shape is (dlc*8,) with dtype float64.
Indicates whether the Arb ID is static, meaning its payload contains no dynamic data. Set to False if any bit position shows transitions (max tang > 0).
Transmission Frequency Analysis
These attributes are populated by the analyze_transmission_frequency() method.
The z-score used for confidence interval calculation (e.g., 1.645 for 90% CI)
Mean transmission interval in milliseconds (or specified time units)
Standard deviation of transmission intervals in milliseconds
Confidence interval tuple (lower_bound, upper_bound) for transmission frequency, assuming a Gaussian distribution.
Ratio of confidence interval range to mean frequency: (2 * mean_offset) / freq_mean. Used as a heuristic to classify synchronous transmission patterns.
Set to True if mean_to_ci_ratio <= synchronous_threshold, indicating the Arb ID transmits at a consistent, engineered frequency.
Lexical Analysis
These attributes are populated by LexicalAnalysis.get_composition().
List of tuples representing lexical tokens (bit position ranges) identified in the payload structure
List of bit positions identified as static padding bytes
Methods
generate_binary_matrix_and_tang
Timer object for performance profiling
Normalization function (e.g., sklearn.preprocessing.minmax_scale) applied to the tang array. Must accept parameters: (array, axis, copy)
- Creates boolean_matrix with shape (num_messages, dlc*8) filled with zeros
- Iterates through each message in original_data
- Converts each non-zero byte to an 8-bit binary string
- Populates the corresponding bit positions in the matrix
- Calculates transition matrix using XOR between consecutive rows
- Sums transitions per bit position to create the tang vector
- Normalizes tang using the provided strategy
- Sets static to False if any transitions detected
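The steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the class's actual implementation: the function name `binary_matrix_and_tang`, the `min_max` stand-in normalizer, and the assumption that `original_data` holds integer byte values (0-255) are all hypothetical, and the timer argument is omitted.

```python
import numpy as np
import pandas as pd


def min_max(array, axis=0, copy=True):
    """Hypothetical 1-D min-max scaler accepting (array, axis, copy)."""
    values = (np.array(array, dtype=np.float64) if copy
              else np.asarray(array, dtype=np.float64))
    span = values.max(axis=axis) - values.min(axis=axis)
    if span == 0:
        return np.zeros_like(values)
    return (values - values.min(axis=axis)) / span


def binary_matrix_and_tang(original_data, dlc, normalize=min_max):
    """Return (boolean_matrix, tang, static) for one arbitration ID.

    Assumes original_data has one integer column per payload byte.
    """
    n_messages = len(original_data)
    boolean_matrix = np.zeros((n_messages, dlc * 8), dtype=np.uint8)

    for row, payload in enumerate(original_data.itertuples(index=False)):
        for byte_idx, value in enumerate(payload[:dlc]):
            if value == 0:
                continue  # zero bytes leave their eight bits cleared
            bits = format(int(value), "08b")  # e.g. 5 -> "00000101"
            for bit_idx, bit in enumerate(bits):
                boolean_matrix[row, byte_idx * 8 + bit_idx] = int(bit)

    if n_messages > 1:
        # XOR of consecutive rows marks every bit that flipped.
        transitions = np.bitwise_xor(boolean_matrix[:-1], boolean_matrix[1:])
        tang = transitions.sum(axis=0, dtype=np.float64)
    else:
        tang = np.zeros(dlc * 8, dtype=np.float64)

    static = True
    if tang.size and tang.max() > 0:
        tang = normalize(tang, axis=0, copy=False)
        static = False
    return boolean_matrix, tang, static


# Two-byte payload: only the last bit of b0 ever changes.
frames = pd.DataFrame(
    {"b0": [0x00, 0x01, 0x00], "b1": [0xFF, 0xFF, 0xFF]},
    index=[0.0, 0.1, 0.2],
)
matrix, tang, static = binary_matrix_and_tang(frames, dlc=2)
print(matrix.shape, tang, static)
```

In this example only bit position 7 (the least significant bit of b0) toggles, so it is the only non-zero entry in the tang vector and the ID is flagged as not static.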
analyze_transmission_frequency
Conversion factor to apply to time intervals (e.g., 1000 converts seconds to milliseconds)
Z-score for confidence interval calculation:
- 1.28 for 80% CI
- 1.645 for 90% CI
- 1.96 for 95% CI
- 2.33 for 98% CI
- 2.58 for 99% CI
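These z-scores are the two-sided quantiles of the standard normal distribution, so values for other confidence levels can be derived rather than looked up. The sketch below uses Python's standard-library `statistics.NormalDist`; the helper name `z_score_for_confidence` is hypothetical and not part of the ArbID API.

```python
from statistics import NormalDist


def z_score_for_confidence(level):
    """Two-sided z-score for a confidence level given as a fraction in (0, 1)."""
    if not 0.0 < level < 1.0:
        raise ValueError("level must be strictly between 0 and 1")
    # A two-sided interval leaves (1 - level) / 2 probability in each tail.
    return NormalDist().inv_cdf(0.5 + level / 2.0)


for level in (0.80, 0.90, 0.95, 0.98, 0.99):
    print(f"{level:.0%} CI -> z = {z_score_for_confidence(level):.3f}")
```

Rounded to two or three decimals, the computed values agree with the list above.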
Maximum mean_to_ci_ratio value to classify as synchronous. Values ≤ 0.1 indicate transmission frequency is consistent enough to be considered engineered/synchronous
- Skips analysis if fewer than 4 data points exist
- Calculates transmission intervals from DataFrame index timestamps
- Computes mean and standard deviation of intervals
- Calculates confidence interval assuming Gaussian distribution
- Computes mean_to_ci_ratio as a consistency heuristic
- Sets synchronous flag based on threshold comparison
mean_to_ci_ratio provides a scale-independent measure of transmission consistency. For example:
- An Arb ID with 1000ms mean frequency and 50ms CI range has ratio = 0.05 → likely synchronous
- An Arb ID with 40ms mean frequency and 50ms CI range has ratio = 1.25 → likely asynchronous/high-frequency
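As a rough sketch of the calculation described above, the following function derives the interval statistics from a timestamp index. The function name `analyze_intervals`, the parameter names `time_conversion` and `ci_sensitivity`, the use of the sample standard deviation, and the definition of `mean_offset` as z * std / sqrt(n) are assumptions made for illustration. The source only states that `mean_to_ci_ratio` is `(2 * mean_offset) / freq_mean`.

```python
import math

import numpy as np
import pandas as pd


def analyze_intervals(index, time_conversion=1000, ci_sensitivity=1.645,
                      synchronous_threshold=0.1):
    """Summarize transmission intervals for one arbitration ID.

    index: message timestamps (assumed to be seconds, in ascending order).
    Returns None when there are fewer than 4 data points.
    """
    if len(index) < 4:
        return None

    timestamps = np.asarray(index, dtype=np.float64)
    intervals = np.diff(timestamps) * time_conversion  # e.g. s -> ms

    freq_mean = float(intervals.mean())
    freq_std = float(intervals.std(ddof=1))  # sample std (assumption)
    if freq_mean == 0:
        return None  # identical timestamps; the ratio is undefined

    # Assumed half-width of the Gaussian confidence interval of the mean.
    mean_offset = ci_sensitivity * freq_std / math.sqrt(len(intervals))
    freq_ci = (freq_mean - mean_offset, freq_mean + mean_offset)
    mean_to_ci_ratio = 2 * mean_offset / freq_mean

    return {
        "freq_mean": freq_mean,
        "freq_std": freq_std,
        "freq_ci": freq_ci,
        "mean_to_ci_ratio": mean_to_ci_ratio,
        "synchronous": mean_to_ci_ratio <= synchronous_threshold,
    }


steady = analyze_intervals(pd.Index([0.0, 0.1, 0.2, 0.3, 0.4]))
bursty = analyze_intervals(pd.Index([0.0, 0.01, 0.5, 0.52, 1.4]))
print(steady["synchronous"], bursty["synchronous"])
```

The evenly spaced index yields a ratio near zero and is classified as synchronous, while the irregular index yields a ratio well above the 0.1 threshold.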
Usage Example
Pipeline Integration
The ArbID class is used throughout the CAN reverse engineering pipeline:
- Pre-Processing (PreProcessor.generate_arb_id_dictionary()):
  - Creates ArbID instances for each unique arbitration ID
  - Sets dlc and original_data
  - Calls generate_binary_matrix_and_tang()
  - Calls analyze_transmission_frequency()
- Lexical Analysis (LexicalAnalysis.tokenize_dictionary()):
  - Populates tokenization and padding attributes
  - Uses tang values to identify signal boundaries
- Signal Generation (LexicalAnalysis.generate_signals()):
  - Extracts Signal objects from identified tokens
  - Uses boolean_matrix to create time series data
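To make the pre-processing step more concrete, here is a hedged sketch of how a flat CAN log might be split into one time-indexed payload table per arbitration ID. The column names (`time`, `id`, `dlc`, `b0`-`b7`) and the helper `split_by_arb_id` are assumptions for illustration. The real PreProcessor input format is not described here, and plain dictionaries stand in for ArbID instances.

```python
import pandas as pd


def split_by_arb_id(log):
    """Group a flat CAN log into per-ID entries holding dlc and original_data.

    log: DataFrame with hypothetical columns 'time', 'id', 'dlc', 'b0'..'b7'.
    """
    per_id = {}
    for arb_id, frames in log.groupby("id"):
        # Assume the DLC is constant for a given arbitration ID.
        dlc = int(frames["dlc"].iloc[0])
        byte_columns = [f"b{i}" for i in range(dlc)]
        per_id[int(arb_id)] = {
            "dlc": dlc,
            # Rows indexed by timestamp, one column per payload byte.
            "original_data": frames.set_index("time")[byte_columns],
        }
    return per_id


log = pd.DataFrame(
    {
        "time": [0.00, 0.01, 0.10, 0.11],
        "id": [0x123, 0x7E8, 0x123, 0x7E8],
        "dlc": [2, 1, 2, 1],
        "b0": [0x10, 0xAA, 0x11, 0xAB],
        "b1": [0xFF, 0x00, 0xFF, 0x00],
    }
)
arb_ids = split_by_arb_id(log)
print(sorted(hex(key) for key in arb_ids))
```

Each entry then carries the two fields that the pre-processing step is described as setting before the binary matrix and frequency analysis are run.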