ArbID

Overview

The ArbID class represents a CAN bus Arbitration ID and stores all associated data, analysis results, and metadata throughout the reverse engineering pipeline. Each instance encapsulates the raw data, binary representations, transmission frequency characteristics, and lexical tokenization for a single arbitration ID.

Constructor

ArbID(arb_id: int)

Initializes a new ArbID instance with the specified arbitration ID value.

arb_id

int

required

The CAN Arbitration ID value (e.g., 0x123, 0x7E8)

Instance Attributes

Basic Identification

id

int

The CAN Arbitration ID value assigned during initialization

dlc

int

default:"0"

Data Length Code - the number of bytes in the CAN payload (0-8). Set by PreProcessor.generate_arb_id_dictionary().

original_data

DataFrame

default:"None"

Pandas DataFrame containing the time-indexed raw hexadecimal payload data. Columns represent byte positions (b0-b7); rows represent messages indexed by timestamp. Set by PreProcessor.generate_arb_id_dictionary().

Binary Matrix and Tang Representation

These attributes are populated by the generate_binary_matrix_and_tang() method.

boolean_matrix

ndarray

default:"None"

Binary representation of payload data with shape (num_messages, dlc*8). Each row represents one CAN message, with bits expanded across columns. Data type is uint8 with values 0 or 1

tang

ndarray

default:"None"

Normalized transition activity for each bit position. Calculated using XOR between consecutive messages to detect bit transitions. Normalized using the provided strategy (typically min-max scaling). Shape is (dlc*8,) with dtype float64

static

bool

default:"True"

True when the Arb ID carries no dynamic data, i.e. its payload never changes. Set to False if any bit position shows transitions (max tang > 0).

Transmission Frequency Analysis

These attributes are populated by the analyze_transmission_frequency() method.

ci_sensitivity

float

default:"0.0"

The z-score used for confidence interval calculation (e.g., 1.645 for 90% CI)

freq_mean

float

default:"0.0"

Mean transmission interval in milliseconds (or specified time units)

freq_std

float

default:"0.0"

Standard deviation of transmission intervals in milliseconds

freq_ci

tuple

default:"None"

Confidence interval tuple (lower_bound, upper_bound) for the mean transmission interval, assuming a Gaussian (normal) distribution

mean_to_ci_ratio

float

default:"0.0"

Ratio of the confidence interval width to the mean transmission interval: (2 * mean_offset) / freq_mean, where mean_offset is the half-width of the confidence interval. Used as a heuristic to classify synchronous transmission patterns

synchronous

bool

default:"False"

Set to True if mean_to_ci_ratio <= synchronous_threshold, indicating the Arb ID transmits at a consistent, engineered frequency

Lexical Analysis

These attributes are populated by LexicalAnalysis.get_composition().

tokenization

List[tuple]

default:"[]"

List of tuples representing lexical tokens (bit position ranges) identified in the payload structure

padding

List[int]

default:"[]"

List of bit positions identified as static padding

Methods

generate_binary_matrix_and_tang

generate_binary_matrix_and_tang(
    a_timer: PipelineTimer,
    normalize_strategy: Callable
) -> None

Converts hexadecimal payload data to a binary matrix and calculates normalized transition activity (tang) for each bit position.

a_timer

PipelineTimer

required

Timer object for performance profiling

normalize_strategy

Callable

required

Normalization function (e.g., sklearn.preprocessing.minmax_scale) applied to the tang array. Must accept parameters: (array, axis, copy)

Process:

Creates boolean_matrix with shape (num_messages, dlc*8) filled with zeros
Iterates through each message in original_data
Converts each non-zero byte to an 8-bit binary string
Populates the corresponding bit positions in the matrix
Calculates transition matrix using XOR between consecutive rows
Sums transitions per bit position to create the tang vector
Normalizes tang using the provided strategy
Sets static to False if any transitions detected
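
The steps above can be sketched with NumPy. This is a minimal illustration, not the class's actual implementation: the helper name binary_matrix_and_tang is hypothetical, the payload is assumed to be available as an integer array of bytes (0-255), bits are assumed to be ordered most significant first, and a hand-written min-max scaling stands in for the normalize_strategy callable.

```python
import numpy as np

def binary_matrix_and_tang(payload_bytes):
    """payload_bytes: integer array of shape (num_messages, dlc), values 0-255."""
    data = np.asarray(payload_bytes).astype(np.uint8)
    # Expand every byte into 8 bits, most significant bit first:
    # the result has shape (num_messages, dlc * 8) and dtype uint8.
    boolean_matrix = np.unpackbits(data, axis=1)
    # XOR of consecutive rows marks the bits that flipped between messages.
    transitions = np.bitwise_xor(boolean_matrix[1:], boolean_matrix[:-1])
    # Count flips per bit position to get the raw tang vector.
    tang = transitions.sum(axis=0, dtype=np.float64)
    # The payload is static when no bit ever flipped.
    static = bool(tang.max() == 0)
    # Min-max scale to [0, 1]; skipped when all counts are equal.
    span = tang.max() - tang.min()
    if span > 0:
        tang = (tang - tang.min()) / span
    return boolean_matrix, tang, static

# Toy payload (assumed data): two bytes per message, only the last two bits change.
example = np.array([[0x00, 0x01], [0x00, 0x02], [0x00, 0x01]], dtype=np.uint8)
matrix, tang, static = binary_matrix_and_tang(example)
print(matrix.shape, tang, static)
```

In this toy payload only the two least significant bits of the second byte ever flip, so they are the only positions with non-zero tang.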

Example Usage:

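from Pipeline.ArbID import ArbID
from sklearn.preprocessing import minmax_scale

# Assumes `df` (time-indexed payload DataFrame) and `timer` (a PipelineTimer) already exist
arb_id = ArbID(0x123)
arb_id.dlc = 8
arb_id.original_data = df  # Pre-loaded DataFrame

arb_id.generate_binary_matrix_and_tang(timer, minmax_scale)
# Now arb_id.boolean_matrix and arb_id.tang are populated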

analyze_transmission_frequency

analyze_transmission_frequency(
    time_convert: int = 1000,
    ci_accuracy: float = 1.645,
    synchronous_threshold: float = 0.1
) -> None

Analyzes transmission timing characteristics and classifies the Arb ID as synchronous or asynchronous.

time_convert

int

default:"1000"

Conversion factor to apply to time intervals (e.g., 1000 converts seconds to milliseconds)

ci_accuracy

float

default:"1.645"

Z-score for confidence interval calculation:

1.28 for 80% CI
1.645 for 90% CI
1.96 for 95% CI
2.33 for 98% CI
2.58 for 99% CI
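
The listed values are two-sided z-scores of the standard normal distribution, rounded. For a confidence level not in the list, the corresponding value can be derived with the Python standard library; the helper name z_score below is illustrative and not part of ArbID.

```python
from statistics import NormalDist

def z_score(confidence: float) -> float:
    """Two-sided z-score for a confidence level given as a fraction (e.g. 0.90)."""
    # A two-sided interval leaves (1 - confidence) / 2 in each tail,
    # so the upper bound sits at the 0.5 + confidence / 2 quantile.
    return NormalDist().inv_cdf(0.5 + confidence / 2)

for level in (0.80, 0.90, 0.95, 0.98, 0.99):
    print(f"{level:.0%} CI -> z = {z_score(level):.3f}")
```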

synchronous_threshold

float

default:"0.1"

Maximum mean_to_ci_ratio value to classify as synchronous. Values ≤ 0.1 indicate transmission frequency is consistent enough to be considered engineered/synchronous

Process:

Skips analysis if fewer than 4 data points exist
Calculates transmission intervals from DataFrame index timestamps
Computes mean and standard deviation of intervals
Calculates confidence interval assuming Gaussian distribution
Computes mean_to_ci_ratio as a consistency heuristic
Sets synchronous flag based on threshold comparison
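
A minimal sketch of these steps follows, operating on a plain list of timestamps rather than a DataFrame index. The function name analyze_intervals, the returned dictionary, the use of the sample standard deviation, and the standard-error form of the interval half-width (ci_accuracy * std / sqrt(n)) are assumptions of this sketch; the description above only states that the interval assumes a Gaussian distribution.

```python
import math
import numpy as np

def analyze_intervals(timestamps, time_convert=1000, ci_accuracy=1.645,
                      synchronous_threshold=0.1):
    """timestamps: sorted message arrival times, e.g. in seconds."""
    if len(timestamps) < 4:
        return None  # too few data points, mirroring the skip described above
    # Time between consecutive messages, converted to the target unit.
    intervals = np.diff(np.asarray(timestamps, dtype=np.float64)) * time_convert
    freq_mean = float(intervals.mean())
    freq_std = float(intervals.std(ddof=1))  # sample std: an assumption
    # Assumed half-width of the confidence interval around the mean.
    mean_offset = ci_accuracy * freq_std / math.sqrt(len(intervals))
    freq_ci = (freq_mean - mean_offset, freq_mean + mean_offset)
    mean_to_ci_ratio = 2 * mean_offset / freq_mean
    return {
        "freq_mean": freq_mean,
        "freq_std": freq_std,
        "freq_ci": freq_ci,
        "mean_to_ci_ratio": mean_to_ci_ratio,
        "synchronous": mean_to_ci_ratio <= synchronous_threshold,
    }

# Made-up timestamps: a steady 10 ms sender and an irregular one.
steady = analyze_intervals([0.00, 0.01, 0.02, 0.03, 0.04])
bursty = analyze_intervals([0.00, 0.01, 0.50, 0.52, 2.00])
print(steady["synchronous"], bursty["synchronous"])
```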

Synchronous Classification Logic: The mean_to_ci_ratio provides a scale-independent measure of transmission consistency. For example:

An Arb ID with a 1000 ms mean interval and a 50 ms CI range has ratio = 50 / 1000 = 0.05 → likely synchronous
An Arb ID with a 40 ms mean interval and a 50 ms CI range has ratio = 50 / 40 = 1.25 → likely asynchronous/high-frequency

This assumes the OEM designed the bus properly without excessive arbitration losses.

Example Usage:

arb_id.analyze_transmission_frequency(
    time_convert=1000,      # Convert to milliseconds
    ci_accuracy=1.645,      # 90% confidence
    synchronous_threshold=0.1
)

print(f"Mean frequency: {arb_id.freq_mean:.2f} ms")
print(f"Synchronous: {arb_id.synchronous}")

Usage Example

from Pipeline.ArbID import ArbID
from sklearn.preprocessing import minmax_scale
import pandas as pd

# Create ArbID instance
arb_id = ArbID(0x123)

# Set attributes (normally done by PreProcessing)
arb_id.dlc = 8
arb_id.original_data = pd.DataFrame(...)  # Time-indexed payload data

# Generate binary representation and transition analysis
# (`timer` is a PipelineTimer instance created elsewhere)
arb_id.generate_binary_matrix_and_tang(timer, minmax_scale)

# Analyze transmission frequency
arb_id.analyze_transmission_frequency(
    time_convert=1000,
    ci_accuracy=1.645,
    synchronous_threshold=0.1
)

# Check results
if not arb_id.static:
    print(f"Arb ID 0x{arb_id.id:03X} contains dynamic data")
    print(f"Transmission: {arb_id.freq_mean:.2f}ms ± {arb_id.freq_std:.2f}ms")
    print(f"Synchronous: {arb_id.synchronous}")

Pipeline Integration

The ArbID class is used throughout the CAN reverse engineering pipeline:

Pre-Processing (PreProcessor.generate_arb_id_dictionary()):
- Creates ArbID instances for each unique arbitration ID
- Sets dlc and original_data
- Calls generate_binary_matrix_and_tang()
- Calls analyze_transmission_frequency()
Lexical Analysis (LexicalAnalysis.tokenize_dictionary()):
- Populates tokenization and padding attributes
- Uses tang values to identify signal boundaries
Signal Generation (LexicalAnalysis.generate_signals()):
- Extracts Signal objects from identified tokens
- Uses boolean_matrix to create time series data
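
As a rough illustration of how a token and boolean_matrix could combine into a time series, the sketch below reads a bit range from every message as an unsigned integer. It is not the pipeline's Signal code: the helper name token_to_series is hypothetical, and the token layout (an inclusive (start_bit, end_bit) pair) and the most-significant-bit-first ordering are assumptions.

```python
import numpy as np

def token_to_series(boolean_matrix, token):
    """Return one unsigned integer per message for the given bit range."""
    start, end = token  # assumed inclusive bit indices
    bits = np.asarray(boolean_matrix).astype(np.uint64)[:, start:end + 1]
    width = bits.shape[1]
    # Weight of each column: the leftmost bit is the most significant.
    weights = np.left_shift(np.uint64(1),
                            np.arange(width - 1, -1, -1, dtype=np.uint64))
    return (bits * weights).sum(axis=1)

# Toy matrix (assumed data) built from three 2-byte payloads.
payloads = np.array([[0x00, 0x01], [0x00, 0x02], [0x01, 0x00]], dtype=np.uint8)
toy_matrix = np.unpackbits(payloads, axis=1)
low_byte = token_to_series(toy_matrix, (8, 15))
full_word = token_to_series(toy_matrix, (0, 15))
print(low_byte, full_word)
```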
