Overview
The flow parser analyzes PCAP files and extracts per-flow statistics keyed by 5-tuple (src_ip, dst_ip, src_port, dst_port, protocol). It computes inter-arrival times, payload sizes, and beacon intervals across multiple TCP connections.
Source: telemetry/flow_parser.py
FlowRecord Structure
Each network flow is represented as a FlowRecord dataclass.
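The dataclass listing itself is not reproduced on this page. As a rough sketch only, a record holding the data described here might look like the following. The 5-tuple fields, start_time, and beacon_iats are named elsewhere on this page; iats and payload_sizes are assumed names, and the real class may differ in fields, types, and defaults.

```python
from dataclasses import dataclass, field


@dataclass
class FlowRecord:
    # 5-tuple identifying the flow (names as used in the overview)
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str
    # Timestamp of the first packet seen for this flow
    start_time: float = 0.0
    # Assumed names for the per-packet statistics
    iats: list[float] = field(default_factory=list)
    payload_sizes: list[int] = field(default_factory=list)
    # Gap to the next connection start toward the same destination
    beacon_iats: list[float] = field(default_factory=list)
```

Using default_factory gives each record its own lists, so appending to one flow's statistics never touches another flow.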
Key Concepts
Inter-Arrival Times (IAT)
Time deltas between consecutive packets within a single flow.
Beacon Inter-Arrival Times (Beacon IATs)
Time gaps between consecutive TCP connection starts to the same destination. The computation (telemetry/flow_parser.py:34-52):
- Group flows by (dst_ip, dst_port)
- Filter client-initiated flows: src_port > 1024 and dst_port <= 1024
- Sort by start_time
- Compute gaps between consecutive connection starts
- Assign each gap to the earlier flow of the pair
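The two kinds of interval use the same arithmetic on different inputs: packet timestamps inside one flow versus start times of separate connections. The sketch below illustrates the difference with made-up numbers; the helper name inter_arrival_times is hypothetical and not taken from flow_parser.py.

```python
def inter_arrival_times(timestamps):
    """Return the deltas between consecutive values of a sorted list."""
    return [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]


# Packet timestamps (seconds) inside ONE flow -> per-flow IATs
packet_times = [0.0, 0.5, 1.5, 1.75]
flow_iats = inter_arrival_times(packet_times)

# Start times of THREE connections to the same (dst_ip, dst_port) -> beacon gaps
connection_starts = [100.0, 160.0, 220.0]
beacon_gaps = inter_arrival_times(connection_starts)
```

Three connections produce two gaps. Because each gap is assigned to the earlier flow, the last connection in a group has no gap of its own.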
Core Functions
parse_pcap
Parse a PCAP file into FlowRecords.
Parameters:
- pcap_file (str): Path to PCAP file
Returns:
- list[FlowRecord]: One record per unique 5-tuple
Implementation Details:
- Uses Scapy’s PcapReader for memory-efficient streaming
- Supports TCP, UDP, ICMP, and raw IP packets
- Maps protocol numbers to names: {1: 'ICMP', 6: 'TCP', 17: 'UDP'}
- Non-TCP/UDP flows use ports (0, 0)
- Automatically calls compute_beacon_iats() to populate beacon intervals
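The details above can be sketched roughly as follows. This is not the code in flow_parser.py: the function names iter_ip_packets and group_by_five_tuple are hypothetical, and falling back to the protocol number as a string for unmapped protocols is an assumption. Scapy is imported lazily so that the grouping step can be tried without it installed.

```python
from collections import defaultdict

PROTO_NAMES = {1: "ICMP", 6: "TCP", 17: "UDP"}


def iter_ip_packets(pcap_file):
    """Yield (ts, src_ip, dst_ip, src_port, dst_port, proto_number, payload_len)."""
    from scapy.all import IP, TCP, UDP, PcapReader  # lazy import

    with PcapReader(pcap_file) as reader:  # streams one packet at a time
        for pkt in reader:
            if IP not in pkt:
                continue  # skip frames without an IP layer
            ip = pkt[IP]
            if TCP in pkt:
                l4 = pkt[TCP]
            elif UDP in pkt:
                l4 = pkt[UDP]
            else:
                l4 = None
            # Non-TCP/UDP packets get ports (0, 0)
            sport, dport = (l4.sport, l4.dport) if l4 is not None else (0, 0)
            payload_len = len(l4.payload) if l4 is not None else len(ip.payload)
            yield float(pkt.time), ip.src, ip.dst, sport, dport, ip.proto, payload_len


def group_by_five_tuple(packets):
    """Collect (timestamp, payload_len) pairs per 5-tuple."""
    flows = defaultdict(list)
    for ts, src, dst, sport, dport, proto, size in packets:
        name = PROTO_NAMES.get(proto, str(proto))  # assumed fallback
        flows[(src, dst, sport, dport, name)].append((ts, size))
    return dict(flows)
```

With Scapy installed, group_by_five_tuple(iter_ip_packets("capture.pcap")) would build the per-flow packet lists from which start times, IATs, and payload sizes can be derived.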
save_flows
Write FlowRecords to a JSON Lines file.
Parameters:
- flows (list[FlowRecord]): Flow records to save
- output_file (str): Output path (parent directories created automatically)
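A minimal sketch of what such a writer could look like, assuming the records are dataclasses that serialize cleanly with dataclasses.asdict. The names write_flows_jsonl and DemoFlow are hypothetical stand-ins, not the module's own save_flows or FlowRecord.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class DemoFlow:  # stand-in for FlowRecord, reduced to a few fields
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str
    start_time: float


def write_flows_jsonl(flows, output_file):
    """Write one JSON object per line, creating parent directories first."""
    path = Path(output_file)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as fh:
        for flow in flows:
            fh.write(json.dumps(asdict(flow)) + "\n")
    return len(flows)
```

Writing one object per line keeps the file appendable and lets readers process it record by record.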
compute_beacon_iats
Compute inter-flow beacon intervals (called automatically by parse_pcap):
- Group flows by (dst_ip, dst_port)
- Filter client-initiated: src_port > 1024 and dst_port <= 1024
- Sort each group by start_time
- For each flow i, compute beacon_iats[i] = [start_time[i+1] - start_time[i]]
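The four steps above can be sketched as follows. The Flow class is a reduced stand-in for FlowRecord and assign_beacon_gaps is a hypothetical name; returning the group count is an assumption made for convenience. The sketch also assumes the last flow in each group is left without a gap, since it has no successor.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Flow:  # minimal stand-in for FlowRecord
    src_port: int
    dst_ip: str
    dst_port: int
    start_time: float
    beacon_iats: list = field(default_factory=list)


def assign_beacon_gaps(flows):
    """Fill beacon_iats in place and return the number of destination groups."""
    groups = defaultdict(list)
    for flow in flows:
        # Keep client-initiated flows only: ephemeral source, well-known destination
        if flow.src_port > 1024 and flow.dst_port <= 1024:
            groups[(flow.dst_ip, flow.dst_port)].append(flow)

    for group in groups.values():
        group.sort(key=lambda f: f.start_time)
        # Pair each flow with its successor; the gap belongs to the earlier flow
        for earlier, later in zip(group, group[1:]):
            earlier.beacon_iats = [later.start_time - earlier.start_time]
    return len(groups)
```

A regular gap, such as a steady 60 seconds between connection starts, is the kind of pattern these intervals are meant to expose.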
Command-Line Usage
Run as a standalone module.
Arguments:
- --input (required): Input PCAP file path
- --output (required): Output .flows file path
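The exact command line is not reproduced on this page. Given the source path, the invocation is presumably something like python -m telemetry.flow_parser --input capture.pcap --output capture.flows, but that is an inference. The two required flags could be declared with argparse as in this sketch; build_arg_parser is a hypothetical name and the module may define its interface differently.

```python
import argparse


def build_arg_parser():
    """Declare the two required flags described above."""
    parser = argparse.ArgumentParser(description="Parse a PCAP file into flow records")
    parser.add_argument("--input", required=True, help="Input PCAP file path")
    parser.add_argument("--output", required=True, help="Output .flows file path")
    return parser


# Parsing an explicit argument list makes the behavior easy to try out
args = build_arg_parser().parse_args(
    ["--input", "capture.pcap", "--output", "out/capture.flows"]
)
```

Omitting either flag makes argparse print a usage message and exit with a non-zero status.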
Integration Example
Typical usage in analysis pipelines is to call parse_pcap on a capture and pass the resulting records to save_flows.
File Format Examples
Input: PCAP File
Standard tcpdump/Wireshark format (.pcap or .pcapng)
Output: JSON Lines (.flows)
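A .flows file holds one JSON object per line. The sample listing is not reproduced here, so the keys in the example below are assumptions based on the 5-tuple described in the overview; real files likely carry more fields. The helper iter_flow_dicts is a hypothetical name for a minimal reader.

```python
import json


def iter_flow_dicts(lines):
    """Yield one dict per non-empty JSON line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


# Two illustrative lines with assumed keys, plus a blank line to skip
sample_lines = [
    '{"src_ip": "10.0.0.5", "dst_ip": "10.0.0.9", "src_port": 50000, "dst_port": 443, "protocol": "TCP"}',
    "",
    '{"src_ip": "10.0.0.5", "dst_ip": "10.0.0.2", "src_port": 53124, "dst_port": 53, "protocol": "UDP"}',
]
records = list(iter_flow_dicts(sample_lines))
```

Because an open file iterates line by line, the same helper works on a file handle: iter_flow_dicts(open("capture.flows")) reads records lazily without loading the whole file.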
Flow Statistics
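Access flow metrics programmatically from the FlowRecord objects returned by parse_pcap.
Protocol Mapping
Protocol numbers are mapped to names (telemetry/flow_parser.py:14): {1: 'ICMP', 6: 'TCP', 17: 'UDP'}. Any other protocol appears to be recorded as its number in string form (e.g., "50" for ESP).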
Performance Considerations
Memory Efficiency:
- Uses Scapy’s PcapReader to stream packets (doesn’t load entire PCAP into RAM)
- Suitable for analyzing multi-GB capture files
Processing Speed:
- ~10,000 packets/second on typical hardware
- 1 GB PCAP with 1M packets ≈ 100 seconds
Limitations:
- Stores all flows in memory (one FlowRecord per 5-tuple)
- Very large captures with millions of unique flows may require streaming processing
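The 100-second figure follows directly from the quoted rate: 1,000,000 packets divided by 10,000 packets per second. A small helper makes the same estimate for other capture sizes; estimate_parse_seconds is a hypothetical name, and the default rate is simply the approximate number quoted above, which will vary with hardware.

```python
def estimate_parse_seconds(packet_count, packets_per_second=10_000):
    """Rough parse time, assuming a constant packet rate."""
    if packets_per_second <= 0:
        raise ValueError("packets_per_second must be positive")
    return packet_count / packets_per_second


one_million = estimate_parse_seconds(1_000_000)   # the example quoted above
fifty_million = estimate_parse_seconds(50_000_000)
```

The estimate scales with packet count rather than file size, so captures with many small packets take longer per gigabyte.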
Logging
Flow parsing operations are logged:
- parsing pcap: Logged at start (includes filename)
- parse complete: Logged on success (includes flow count and packet count)
- beacon iats computed: Logged after beacon IAT calculation (includes group count)
- no flows detected: Warning if PCAP contains no IP traffic
- flows saved: Logged after writing .flows file (includes path and count)
Troubleshooting
No flows detected:
- Ensure PCAP contains IP packets (not just Layer 2 frames)
- Check BPF filter used during capture
- Verify PCAP is not corrupted: tcpdump -r capture.pcap -c 10
Empty beacon IATs:
- Beacon IATs are only computed for client-initiated flows (src_port > 1024, dst_port <= 1024)
- Single-connection captures won’t have beacon intervals
- Check that multiple connections to same destination were captured
File not found:
- Ensure PCAP path is correct
- Use absolute paths or run from project root
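As a complement to the tcpdump check listed under "No flows detected", a file's first four bytes can confirm that it is a capture file at all. The sketch below is not part of flow_parser.py, and sniff_capture_format is a hypothetical name. It checks the classic pcap magic numbers (microsecond and nanosecond variants, in either byte order) and the pcapng Section Header Block type.

```python
import struct

# Classic pcap magics as they appear when the first 4 bytes are read big-endian
PCAP_MAGICS = {0xA1B2C3D4, 0xD4C3B2A1, 0xA1B23C4D, 0x4D3CB2A1}
# pcapng files start with a Section Header Block of this type
PCAPNG_BLOCK_TYPE = 0x0A0D0D0A


def sniff_capture_format(header: bytes) -> str:
    """Return 'pcap', 'pcapng', or 'unknown' from the first bytes of a file."""
    if len(header) < 4:
        return "unknown"
    (magic,) = struct.unpack(">I", header[:4])
    if magic == PCAPNG_BLOCK_TYPE:
        return "pcapng"
    if magic in PCAP_MAGICS:
        return "pcap"
    return "unknown"
```

To use it on a file, read the header in binary mode, for example sniff_capture_format(open(path, "rb").read(4)). A result of "unknown" points to a wrong path or a truncated or non-capture file rather than to missing IP traffic.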
Next Steps
After parsing flows:
- Extract features: Use Feature Extractor to compute ML features
- Visualize flows: Load .flows files into Jupyter notebooks for plotting
- Run experiments: See Experiments for automated analysis
See Also
- Traffic Capture - Capture PCAPs with tcpdump
- Feature Extraction - Compute ML features from flows
- Experiments - End-to-end capture and analysis pipelines