Overview
The R scripts provide advanced time-series analysis capabilities using Empirical Dynamic Modeling (EDM) to identify causal relationships between CAN signals. This goes beyond simple correlation to detect directional causality.
EDM analysis is optional but powerful. Use it when you need to understand why signals correlate, not just that they correlate.
When to Use EDM vs Built-in Correlation
Method Use When Provides Built-in Correlation Initial exploration, clustering signals Symmetric correlation strength EDM / CCM Understanding causal direction, validating hypotheses Directional causality, nonlinearity detection
Installation and Setup
R Package Installation
The analysis requires the rEDM package developed by the Sugihara Lab at UC San Diego:
install.packages ( "rEDM" )
library (rEDM)
Set Working Directory
Point R to your data location:
setwd ( "C:/path/to/your/CAN/data" )
EDM requires time-series data in CSV format with named columns:
speed, brake, rpm
45.2, 0.0, 1850
45.5, 0.0, 1860
45.8, 12.3, 1840
44.1, 34.7, 1820
...
Ensure your CSV has:
Header row with signal names
No missing values (or handle them appropriately)
Consistent sampling rate (or interpolate)
Sufficient length (thousands of samples recommended)
EDM Analysis Workflow
The commands_list.txt provides a complete analysis pipeline. Here’s the step-by-step workflow:
Step 1: Load Data
speed_brake_rpm <- read.csv ( "speed_brake_rpm.csv" )
# Extract individual time series
ts_speed <- speed_brake_rpm $ speed
ts_rpm <- speed_brake_rpm $ rpm
ts_brake <- speed_brake_rpm $ brake
Step 2: Define Train-Test Split
Split data into library (training) and prediction (testing) sets:
# Use first 50% for training, second 50% for testing
lib_speed <- c ( 1 , 0.5 * length (ts_speed))
pred_speed <- c ( 0.5 * length (ts_speed) + 1 , length (ts_speed))
lib_rpm <- c ( 1 , 0.5 * length (ts_rpm))
pred_rpm <- c ( 0.5 * length (ts_rpm) + 1 , length (ts_rpm))
lib_brake <- c ( 1 , 0.5 * length (ts_brake))
pred_brake <- c ( 0.5 * length (ts_brake) + 1 , length (ts_brake))
Step 3: Simplex Projection (Find Optimal Embedding Dimension)
Determine the best embedding dimension E for each signal:
simplex_output_speed <- simplex(ts_speed, lib_speed, pred_speed, silent = TRUE )
simplex_output_rpm <- simplex(ts_rpm, lib_rpm, pred_rpm, silent = TRUE )
simplex_output_brake <- simplex(ts_brake, lib_brake, pred_brake, silent = TRUE )
# Save results
save (simplex_output_speed, file = "simplex_output_speed.Rda" )
save (simplex_output_rpm, file = "simplex_output_rpm.Rda" )
save (simplex_output_brake, file = "simplex_output_brake.Rda" )
Plot embedding results:
par ( mfrow = c ( 1 , 3 ), cex = 1.5 )
plot (simplex_output_speed $ E, simplex_output_speed $ rho, type = "l" ,
xlab = "Embedding Dimension (E)" ,
ylab = expression ( paste ( "Forecast Skill (" , rho, ")" )),
main = "Vehicle Speed" )
plot (simplex_output_rpm $ E, simplex_output_rpm $ rho, type = "l" ,
xlab = "Embedding Dimension (E)" , main = "Engine RPM" )
plot (simplex_output_brake $ E, simplex_output_brake $ rho, type = "l" ,
xlab = "Embedding Dimension (E)" , main = "Brake Pressure" )
The optimal E is where forecast skill (ρ) peaks. Typical values for vehicle CAN data are E=7-10.
Step 4: S-Map (Detect Nonlinearity)
S-maps distinguish between red noise and nonlinear deterministic behavior:
smap_output_speed <- s_map(ts_speed, lib_speed, pred_speed, E = 9 , silent = TRUE )
smap_output_rpm <- s_map(ts_rpm, lib_rpm, pred_rpm, E = 9 , silent = TRUE )
smap_output_brake <- s_map(ts_brake, lib_brake, pred_brake, E = 9 , silent = TRUE )
Interpret S-map results:
If forecast skill increases with θ (theta): nonlinear dynamics present
If flat across θ values: linear or random dynamics
Step 5: Convergent Cross Mapping (CCM)
CCM reveals causal direction between signals:
# Test if Speed causes RPM (and vice versa)
speed_xmap_rpm <- ccm(speed_brake_rpm,
lib_column = "speed" ,
target_column = "rpm" ,
E = 9 ,
lib_sizes = seq ( 0 , 15000 , by = 3000 ),
num_samples = 10 ,
random_libs = TRUE ,
replace = TRUE ,
num_neighbors = "E+1" ,
tp = 0 ,
silent = TRUE )
rpm_xmap_speed <- ccm(speed_brake_rpm,
lib_column = "rpm" ,
target_column = "speed" ,
E = 9 ,
lib_sizes = seq ( 0 , 15000 , by = 3000 ),
num_samples = 10 ,
random_libs = TRUE ,
replace = TRUE ,
num_neighbors = "E+1" ,
tp = 0 ,
silent = TRUE )
# Save results
save (speed_xmap_rpm, file = "speed_xmap_rpm.Rda" )
save (rpm_xmap_speed, file = "rpm_xmap_speed.Rda" )
Calculate mean cross-map skill:
speed_xmap_rpm_mean <- ccm_means(speed_xmap_rpm)
rpm_xmap_speed_mean <- ccm_means(rpm_xmap_speed)
Interpreting CCM Results
CCM reveals causal direction through convergence:
plot (speed_xmap_rpm_mean $ lib_size, pmax ( 0 , speed_xmap_rpm_mean $ rho),
type = "l" , col = "red" ,
xlab = "Library Size" ,
ylab = expression ( paste ( "Cross Map Skill (" , rho, ")" )),
ylim = c ( 0.86 , 0.95 ))
lines (rpm_xmap_speed_mean $ lib_size, pmax ( 0 , rpm_xmap_speed_mean $ rho),
col = "blue" )
legend ( x = "topleft" ,
legend = c ( "Speed xmap RPM" , "RPM xmap Speed" ),
col = c ( "red" , "blue" ), lwd = 1 )
Reading CCM Plots
X xmap Y means: “Can we predict Y using X’s attractor?”
If X xmap Y converges (skill increases with library size): Y causes X
If Y xmap X converges : X causes Y
If both converge : Bidirectional coupling
If neither converges : No causal relationship
Example Interpretation
For Speed and RPM:
Speed xmap RPM : Moderate convergence (ρ ≈ 0.92)
RPM xmap Speed : Strong convergence (ρ ≈ 0.94)
Conclusion : Bidirectional coupling with RPM having slightly stronger causal influence on Speed .
Real Example Analyses from commands_list.txt
The repository includes example analyses for three signal pairs:
1. Speed & RPM
speed_xmap_rpm <- ccm(speed_brake_rpm, lib_column = "speed" ,
target_column = "rpm" , E = 9 ,
lib_sizes = seq ( 0 , 15000 , by = 3000 ),
num_samples = 10 , random_libs = TRUE ,
replace = TRUE , silent = TRUE )
Expected Result : Strong bidirectional coupling (both converge above ρ=0.86)
2. Brake Pressure & RPM
brake_xmap_rpm <- ccm(speed_brake_rpm, lib_column = "brake" ,
target_column = "rpm" , E = 9 ,
lib_sizes = seq ( 0 , 15000 , by = 3000 ),
num_samples = 10 , random_libs = TRUE ,
replace = TRUE , silent = TRUE )
Expected Result : Weaker coupling (ρ ≈ 0.35-0.49), asymmetric causality
3. Speed & Brake Pressure
speed_xmap_brake <- ccm(speed_brake_rpm, lib_column = "speed" ,
target_column = "brake" , E = 9 ,
lib_sizes = seq ( 0 , 15000 , by = 3000 ),
num_samples = 10 , random_libs = TRUE ,
replace = TRUE , silent = TRUE )
Expected Result : Moderate coupling (ρ ≈ 0.6-0.9), driver behavior mediated
Understanding .Rda Output Files
.Rda files store R data frames for later reuse:
# Save current analysis
save (speed_xmap_rpm, file = "speed_xmap_rpm.Rda" )
# Load previous analysis
load ( file = "speed_xmap_rpm.Rda" )
Benefits:
Avoid re-running expensive CCM calculations
Share results without sharing raw data
Reproducible analysis workflow
Cross-Correlation Analysis
For simpler correlation without causality:
R/cross-correlation_speed_rpm.R
for (ccm_from in vars) {
for (ccm_to in vars[vars != ccm_from]) {
cf_temp <- ccf (speed_brake_rpm[, ccm_from],
speed_brake_rpm[, ccm_to],
type = "correlation" ,
lag.max = 500 ,
plot = FALSE ) $ acf
corr_matrix[ccm_from, ccm_to] <- max ( abs (cf_temp))
}
}
This finds maximum absolute correlation across all time lags up to 500 samples.
Visualizing Causal Relationships
Create publication-ready plots:
pdf ( file = "ccm_one_row.pdf" , width = 5.5 , height = 2.0 , family = "Helvetica" )
op <- par ( mfrow = c ( 1 , 3 ), cex.axis = 1 , cex.lab = 1 ,
oma = c ( 3 , 3 , 1 , 0 ) + 0.1 , mar = c ( 0 , 1 , 1 , 1 ) + 0.1 )
plot (speed_xmap_rpm_mean $ lib_size, pmax ( 0 , speed_xmap_rpm_mean $ rho),
type = "l" , col = "red" , main = "Speed & RPM" ,
xlim = c ( 5000 , 15000 ), ylim = c ( 0.86 , 0.95 ))
lines (rpm_xmap_speed_mean $ lib_size, pmax ( 0 , rpm_xmap_speed_mean $ rho),
col = "blue" )
legend ( x = "bottomright" , legend = c ( "S xmap R" , "R xmap S" ),
col = c ( "red" , "blue" ), lwd = 1 )
title ( xlab = "Library Size" , ylab = "Cross Map Skill (rho)" , outer = TRUE )
par (op)
dev.off ()
Best Practices
Sufficient data length : Use at least 5000+ samples for reliable CCM
Proper train-test split : 50-50 or 70-30 split recommended
Multiple library sizes : Test convergence with varying library sizes
Save intermediate results : EDM computations are expensive
Validate with domain knowledge : CCM results should match physical expectations
For vehicle CAN data, expect:
Strong : Speed ↔ RPM, Speed ↔ Wheel speeds
Moderate : Brake ↔ Speed, Throttle ↔ RPM
Weak : Brake ↔ Steering, RPM ↔ Temperature
Further Reading
Next Steps
Validation Quantify pipeline consistency with train-test validation
Multi-File Processing Process multiple CAN samples in batch