## Overview

The feature extractor computes statistical and entropy features from FlowRecords for machine learning analysis. It extracts timing features (IAT statistics, burstiness), flow features (throughput), size features (payload distributions), and entropy measures.

Source: `telemetry/feature_extractor.py`
## Feature Categories

Extracted features are organized into four categories:

### Timing Features
| Feature | Description | Use Case |
|---|---|---|
| `mean_iat` | Mean inter-arrival time (seconds) | Detect regular beaconing |
| `std_iat` | Standard deviation of IAT | Measure timing jitter |
| `min_iat` | Minimum IAT | Detect burst patterns |
| `max_iat` | Maximum IAT | Identify gaps |
| `burstiness` | Coefficient of variation (std/mean) | Distinguish bursty vs regular traffic |
| `iat_autocorr` | Lag-1 autocorrelation of IAT series | Detect periodic patterns |
### Flow Features

| Feature | Description | Use Case |
|---|---|---|
| `flow_duration_s` | Total flow duration (seconds) | Session length analysis |
| `total_bytes` | Total bytes transferred | Volume analysis |
| `total_packets` | Total packet count | Activity level |
| `bytes_per_second` | Throughput (bytes/sec) | Bandwidth usage |
| `packets_per_second` | Packet rate (packets/sec) | Activity intensity |
### Size Features

| Feature | Description | Use Case |
|---|---|---|
| `payload_len_mean` | Mean payload size (bytes) | Detect padding |
| `payload_len_std` | Std dev of payload size | Size variance |
| `payload_len_min` | Minimum payload size | Detect empty packets |
| `payload_len_max` | Maximum payload size | MTU analysis |
### Entropy Features

| Feature | Description | Use Case |
|---|---|---|
| `shannon_entropy` | Shannon entropy of payload sizes | Detect encryption/randomization |
## Core Functions

### extract_features

Extract all features from a single FlowRecord.

### extract_all
Load a `.flows` file and extract features for all flows.

Parameters:
- `flows_file` (str): Path to a `.flows` JSON Lines file

Returns:
- `list[dict]`: One feature dictionary per flow
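Since the input is JSON Lines (one flow object per line), the loading step can be illustrated with a minimal stand-alone sketch. The function name `load_flows` is hypothetical; `extract_all` handles this reading internally:

```python
import json

def load_flows(flows_file: str) -> list[dict]:
    """Read a JSON Lines .flows file: one JSON object per non-empty line."""
    records = []
    with open(flows_file) as fh:
        for line in fh:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records
```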
### save_features

Write features to both CSV and JSON formats.

Parameters:
- `features` (list[dict]): Feature dictionaries from `extract_all()`
- `output_file` (str): Base output path (`.csv` suffix optional)
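The dual-format write can be sketched as follows. This is illustrative only: `save_feature_table` is a hypothetical name, and the real `save_features` may differ in formatting details:

```python
import csv
import json

def save_feature_table(features: list[dict], output_file: str) -> None:
    """Write feature dicts to <base>.csv and <base>.json (hypothetical sketch)."""
    base = output_file[:-4] if output_file.endswith(".csv") else output_file
    with open(base + ".csv", "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(features[0].keys()))
        writer.writeheader()
        writer.writerows(features)
    with open(base + ".json", "w") as fh:
        json.dump(features, fh, indent=2)
```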
## Command-Line Usage

Run as a standalone module with the following arguments:

- `--input` (required): Input `.flows` file from `flow_parser`
- `--output` (required): Output CSV file (JSON also written automatically)
## Feature Computation Details

### Burstiness

Coefficient of variation of inter-arrival times:

- Low values (< 0.5): Regular, periodic traffic (e.g., unmodified beacons)
- High values (> 1.0): Bursty, irregular traffic (e.g., human browsing)
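As a minimal sketch (not the project's exact code), burstiness can be computed from an IAT list like this:

```python
import statistics

def burstiness(iats: list[float]) -> float:
    """Coefficient of variation (std / mean) of inter-arrival times."""
    if len(iats) < 2:
        return 0.0
    mean = statistics.mean(iats)
    if mean == 0:
        return 0.0
    return statistics.stdev(iats) / mean

regular = [1.0] * 5                    # perfectly periodic beacon
bursty = [0.01, 0.02, 5.0, 0.03, 4.0]  # irregular gaps
print(burstiness(regular))  # 0.0
print(burstiness(bursty))   # > 1.0, i.e. bursty
```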
### IAT Autocorrelation

Lag-1 autocorrelation measures the correlation between consecutive IATs (`telemetry/feature_extractor.py:42-51`):

- Positive values: Consecutive IATs are similar (periodic patterns)
- Near zero: IATs are independent (random)
- Negative values: Alternating fast/slow patterns
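One common formulation of lag-1 autocorrelation, sketched in plain Python (the project's implementation at `telemetry/feature_extractor.py:42-51` may differ in normalization):

```python
def lag1_autocorr(iats: list[float]) -> float:
    """Lag-1 autocorrelation: covariance of consecutive IATs over total variance."""
    n = len(iats)
    if n < 3:
        return 0.0
    mean = sum(iats) / n
    var = sum((x - mean) ** 2 for x in iats)
    if var == 0:
        return 0.0  # constant IATs: correlation undefined, report 0
    cov = sum((iats[i] - mean) * (iats[i + 1] - mean) for i in range(n - 1))
    return cov / var

alternating = [1.0, 1.1] * 20  # fast/slow/fast/slow -> strongly negative
print(lag1_autocorr(alternating))
```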
### Shannon Entropy

Measures the randomness of the payload size distribution:

- Low entropy (< 2.0): Uniform sizes (e.g., fixed-size packets)
- High entropy (> 5.0): Varied sizes (e.g., random padding)
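A sketch of Shannon entropy (in bits) over the payload-size histogram:

```python
import math
from collections import Counter

def payload_entropy(sizes: list[int]) -> float:
    """Shannon entropy (bits) of the payload-size distribution."""
    if not sizes:
        return 0.0
    n = len(sizes)
    ent = -sum((c / n) * math.log2(c / n) for c in Counter(sizes).values())
    return ent + 0.0  # normalize -0.0 to 0.0

print(payload_entropy([512] * 100))       # 0.0 -- a single fixed size
print(payload_entropy(list(range(128))))  # 7.0 -- 128 equally likely sizes
```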
## Integration Example

Complete pipeline from PCAP to features:

## Analysis Examples
### Compare Baseline vs Evasion Profiles
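A hedged pandas sketch for comparing one feature between two runs. The `compare_profiles` helper and file paths are hypothetical; the CSVs are assumed to be `save_features` output:

```python
import pandas as pd

def compare_profiles(baseline_csv: str, evasion_csv: str,
                     feature: str = "burstiness") -> pd.DataFrame:
    """Side-by-side summary statistics for one feature across two capture runs."""
    return pd.DataFrame({
        "baseline": pd.read_csv(baseline_csv)[feature].describe(),
        "evasion": pd.read_csv(evasion_csv)[feature].describe(),
    })
```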
### Filter by Flow Characteristics
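For example, to keep only long-lived flows using the documented `flow_duration_s` column (the helper name and threshold are illustrative):

```python
import pandas as pd

def long_lived_flows(features_csv: str, min_duration_s: float = 10.0) -> pd.DataFrame:
    """Keep only flows whose duration exceeds the threshold."""
    df = pd.read_csv(features_csv)
    return df[df["flow_duration_s"] > min_duration_s]
```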
## Zero-Division Handling

Safe divisors prevent division by zero (`telemetry/feature_extractor.py:12`):
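One common safe-divisor pattern, sketched here for illustration only; the exact epsilon and behavior in `telemetry/feature_extractor.py` may differ:

```python
SAFE_EPS = 1e-9  # hypothetical epsilon; the real constant lives at telemetry/feature_extractor.py:12

def safe_div(numerator: float, denominator: float) -> float:
    """Divide, substituting a tiny epsilon for a zero denominator."""
    return numerator / (denominator if denominator else SAFE_EPS)

print(safe_div(1000.0, 0.0))  # large but finite -- no ZeroDivisionError
```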
## Performance

Processing speed:
- ~50,000 flows/second on typical hardware
- Feature extraction from a 10K-flow file ≈ 0.2 seconds

Memory usage:
- Loads the entire `.flows` file into memory
- Typical flow: ~200 bytes in memory
- 100K flows ≈ 20 MB RAM
## Output File Organization
## Logging

Feature extraction is logged at three points:

- `features extracted`: Logged after processing (includes count)
- `no features extracted`: Warning if the flows file is empty
- `features saved`: Logged after writing CSV/JSON (includes paths)
## Troubleshooting

**FileNotFoundError:**
- Ensure the `.flows` file exists (run `flow_parser` first)
- Use absolute paths or run from the project root

**No features extracted:**
- Check that the `.flows` file contains valid JSON lines
- Verify flows were successfully parsed from the PCAP

**ZeroDivisionError:**
- Should not occur due to safe divisors
- Report as a bug if encountered
## ML Integration

Features are ready for scikit-learn, TensorFlow, or PyTorch.

## Next Steps

- Run experiments: See Experiments for automated pipelines
- Visualize features: Use Jupyter notebooks to plot distributions
- Train models: Feed features into ML classifiers for C2 detection
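As a hedged sketch of the ML hand-off described under ML Integration (the helper name, column names, and labels shown are hypothetical), the feature CSV can be fed straight into scikit-learn:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

def train_detector(features_csv: str, labels: list[int]) -> RandomForestClassifier:
    """Fit a classifier on the numeric feature columns; labels supplied separately."""
    X = pd.read_csv(features_csv).select_dtypes("number")
    clf = RandomForestClassifier(n_estimators=50, random_state=0)
    clf.fit(X, labels)
    return clf
```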
## See Also
- Traffic Capture - Capture network traffic
- Flow Analysis - Parse PCAPs into flows
- Experiments - Automated capture and analysis