The Pipeline_multi-file implementation extends the basic Pipeline to automatically process multiple CAN data samples in batch. This is essential for analyzing data from multiple vehicles, driving sessions, or conditions in research settings.
This is the most complete and robust implementation of the reverse engineering pipeline. It includes bug fixes that are NOT present in the basic Pipeline folder.
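The FileBoi class expects a specific directory structure relative to the current working directory. Launch the pipeline from inside Pipeline_multi-file: FileBoi moves two levels up from there and looks for a Captures folder.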
```
.
+-- Captures
|   +-- Make x_0
|   |   +-- Model y_0
|   |   |   +-- ModelYear z_0
|   |   |   |   +-- Samples
|   |   |   |   |   +-- loggerProgram0.log
|   |   |   |   |   +-- loggerProgram1.log
|   +-- Make x_1
|   |   +-- Model y_1
...
+-- Some folder
|   +-- Pipeline_multi-file
|   |   +-- Main.py
|   |   +-- FileBoi.py
```
The hierarchy can be simplified. You need at least one parent directory level above the Samples folder. FileBoi will adapt based on the number of directory levels present.
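One way to picture "adapting to the number of levels" is to map each folder path under Captures to a (make, model, year) triple, padding missing levels. The helper below, `labels_from_path`, is hypothetical and only a sketch under that assumption; it is not taken from FileBoi.

```python
from pathlib import PurePosixPath


def labels_from_path(relative_dir: str) -> tuple:
    """Map a directory path (relative to Captures/) to (make, model, year).

    Hypothetical helper: the trailing "Samples" folder is dropped, and any
    hierarchy levels that are missing are filled with empty strings.
    """
    parts = list(PurePosixPath(relative_dir).parts)
    if parts and parts[-1] == "Samples":
        parts = parts[:-1]
    # Pad (or truncate) to exactly three labels
    return tuple((parts + ["", "", ""])[:3])


print(labels_from_path("Make x_0/Model y_0/ModelYear z_0/Samples"))
print(labels_from_path("Make x_1/Samples"))
```

With the full hierarchy all three labels are filled; with only a make folder, model and year come back as empty strings.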
FileBoi handles automated discovery and organization of CAN data samples:
Pipeline_multi-file/FileBoi.py
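```python
import re
import sys
from collections import defaultdict
from os import chdir, getcwd, path, walk

from Sample import Sample  # assumed module name; Sample is defined elsewhere in the pipeline


class FileBoi:
    @staticmethod
    def go_fetch(kfold_n: int = 5):
        # Remember where we started so the working directory can be restored afterwards
        script_dir: str = getcwd()
        # Captures/ is expected two levels above the current working directory
        chdir("../../")
        if not path.exists("Captures"):
            print("Error finding Captures folder.")
            sys.exit(1)
        chdir("Captures")
        root_dir = getcwd()
        # Map (make, model, year) -> list of Sample objects; defaultdict avoids a KeyError on the first append
        sample_dict = defaultdict(list)
        # Walk the directory tree looking for loggerProgramX.log files
        for dirName, subdirList, fileList in walk(root_dir, topdown=True):
            for file in fileList:
                # Check if this file matches the expected CAN data file name (dot escaped, whole name matched)
                m = re.fullmatch(r'loggerProgram\d+\.log', file)
                if m:
                    # Assumption: make/model/year come from the folders between Captures/ and Samples/;
                    # missing levels (a simplified hierarchy) are left as empty strings
                    parts = path.relpath(dirName, root_dir).split(path.sep)
                    if parts and parts[-1] == "Samples":
                        parts = parts[:-1]
                    make, model, year = (parts + ["", "", ""])[:3]
                    key = (make, model, year)
                    # Assumption: the sample index is the file's position within its (make, model, year) group
                    this_sample_index = len(sample_dict[key])
                    # Create Sample object with metadata
                    this_sample = Sample(make=make, model=model, year=year,
                                         sample_index=this_sample_index,
                                         sample_path=path.join(dirName, m.group(0)),
                                         kfold_n=kfold_n)
                    sample_dict[key].append(this_sample)
        # Restore the original working directory before returning
        chdir(script_dir)
        return sample_dict
```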
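To try the discovery step in isolation, the sketch below walks a Captures-style tree and groups matching log files by their folder labels. The function `discover_logs` and the `LOG_NAME` pattern are hypothetical stand-ins: they return file paths instead of Sample objects and never change the working directory, so this is an illustration of the walk-and-match idea rather than FileBoi itself.

```python
import os
import re
import tempfile
from collections import defaultdict

# Matches loggerProgram0.log, loggerProgram12.log, ... but not e.g. loggerProgramA.log
LOG_NAME = re.compile(r"loggerProgram\d+\.log")


def discover_logs(captures_dir: str) -> dict:
    """Group every loggerProgramN.log under captures_dir by its (make, model, year) folders.

    Hypothetical stand-alone helper: it returns file paths instead of Sample objects
    and never changes the working directory.
    """
    groups = defaultdict(list)
    for dir_name, _subdirs, files in os.walk(captures_dir):
        for file in sorted(files):
            if LOG_NAME.fullmatch(file):
                parts = os.path.relpath(dir_name, captures_dir).split(os.sep)
                if parts and parts[-1] == "Samples":
                    parts = parts[:-1]
                key = tuple((parts + ["", "", ""])[:3])
                groups[key].append(os.path.join(dir_name, file))
    return dict(groups)


if __name__ == "__main__":
    # Build a throwaway Captures-like tree and list what gets discovered
    with tempfile.TemporaryDirectory() as root:
        samples = os.path.join(root, "Make x_0", "Model y_0", "ModelYear z_0", "Samples")
        os.makedirs(samples)
        for name in ("loggerProgram0.log", "loggerProgram1.log", "notes.txt"):
            open(os.path.join(samples, name), "w").close()
        for key, paths in discover_logs(root).items():
            print(key, [os.path.basename(p) for p in paths])
```

Passing the Captures path explicitly, instead of relying on `chdir`, keeps the function independent of where it is launched from.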
Some bugs were fixed in Pipeline_multi-file but NOT backported to the basic Pipeline folder. Always use the multi-file version for production analysis.
According to the README:
“This folder includes the same classes from Pipeline. However, SOME BUGS WERE FIXED HERE but NOT in the classes saved in Pipeline.”
The specific bug fixes are not enumerated in the source, so the safest choice is to use the multi-file version, which contains every fix made so far.
These thresholds can be tuned per vehicle using the Validator’s k-fold threshold selection, although the code marks this feature as “NOT WORKING?”.
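For readers unfamiliar with the idea, here is a generic sketch of k-fold threshold selection: on each split a threshold is chosen from a candidate list using the training folds, then scored on the held-out fold. The functions `kfold_indices`, `accuracy`, and `kfold_threshold`, the boolean labels, and the scalar scores are all assumptions for illustration; the Validator's actual logic is not shown here and may differ.

```python
def kfold_indices(n: int, k: int):
    """Yield (train, test) index lists for k contiguous folds of nearly equal size."""
    if not 1 < k <= n:
        raise ValueError("need 1 < k <= n")
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


def accuracy(threshold, scores, labels):
    """Fraction of samples where (score >= threshold) matches the boolean label."""
    hits = sum(1 for s, y in zip(scores, labels) if (s >= threshold) == y)
    return hits / len(scores)


def kfold_threshold(scores, labels, candidates, k=5):
    """Tune a threshold on each training split and score it on the held-out fold.

    Returns (mean of the per-fold thresholds, mean held-out accuracy).
    """
    chosen, held_out = [], []
    for train, test in kfold_indices(len(scores), k):
        tr_s = [scores[i] for i in train]
        tr_y = [labels[i] for i in train]
        # Best candidate on the training folds only
        best = max(candidates, key=lambda t: accuracy(t, tr_s, tr_y))
        chosen.append(best)
        held_out.append(accuracy(best, [scores[i] for i in test], [labels[i] for i in test]))
    return sum(chosen) / k, sum(held_out) / k
```

Averaging the per-fold thresholds is one simple way to produce a single value; taking the median, or re-fitting on all data, are common alternatives.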