This guide covers common issues, diagnostic procedures, and recovery strategies for ML Defender components. Each entry under Common Issues lists symptoms, diagnostic commands, and solutions.

Quick Diagnostics

Health Check Script

Run the built-in diagnostics:
# Full system diagnostics
cd /vagrant
bash scripts/debug.sh

# Network diagnostics
bash scripts/network_diagnostics.sh
Output includes:
  • File existence checks
  • Docker information
  • Running containers/processes
  • Network interfaces and routing
  • eBPF support
  • Recent logs

Component Status

# Check all components
pgrep -af firewall-acl-agent  # Should show PID and command (-f needed: name exceeds pgrep's 15-character limit)
pgrep -a ml-detector          # Should show PID and command
pgrep -a sniffer              # Should show PID and command
pgrep -a 'etcd[-_]server'     # Should show PID and command (binary is built as etcd_server)

# Or use alias (Vagrant)
status-lab
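
The individual checks above can be folded into one pass over the process table. This is a minimal sketch, not part of the repository: `component_status` is a hypothetical helper that reads `ps -eo args=` output on stdin and compares the basename of each command against the requested names. Matching on the command basename sidesteps the 15-character process-name limit that affects `pgrep` without `-f`.

```shell
#!/usr/bin/env bash
# component_status NAME...  (hypothetical helper, not part of the repo)
# Reads `ps -eo args=` output on stdin and prints RUNNING or STOPPED for
# each requested component, matching on the basename of the command.
component_status() {
  awk -v names="$*" '
    BEGIN { n = split(names, want, " ") }
    {
      cmd = $1
      sub(/.*\//, "", cmd)      # strip any leading directory
      seen[cmd] = 1
    }
    END {
      for (i = 1; i <= n; i++)
        printf "%-20s %s\n", want[i], (want[i] in seen) ? "RUNNING" : "STOPPED"
    }'
}

# Example:
# ps -eo args= | component_status firewall-acl-agent ml-detector sniffer etcd_server
```

Because only the first word of each command line is compared, an editor that merely has a config file open (for example `vim firewall.json`) is not reported as a running component.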

Log Quick Check

# Check for errors in all logs
grep -i "error" /vagrant/logs/lab/*.log | tail -20

# Check for warnings
grep -i "warning" /vagrant/logs/lab/*.log | tail -20

# Check for crashes
grep -i "segfault\|abort\|fatal" /vagrant/logs/lab/*.log
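
When several log files are involved, a per-file count is often easier to scan than raw matches. The sketch below assumes nothing about the log format beyond the keywords already used in the greps above; `log_summary` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# log_summary FILE...  (hypothetical helper, not part of the repo)
# Prints per-file counts of error, warning, and crash-like lines, using the
# same case-insensitive keywords as the quick-check greps.
log_summary() {
  local f
  for f in "$@"; do
    [ -r "$f" ] || continue
    printf '%s errors=%d warnings=%d crashes=%d\n' "$f" \
      "$(grep -ci 'error' "$f")" \
      "$(grep -ci 'warning' "$f")" \
      "$(grep -ciE 'segfault|abort|fatal' "$f")"
  done
}

# Example: log_summary /vagrant/logs/lab/*.log
```

A line such as "fatal error" is counted under both errors and crashes, since each keyword group is matched independently.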

Common Issues

Component Fails to Start

Symptoms

  • Process exits immediately after launch
  • “Address already in use” errors
  • “Permission denied” errors

Diagnostics

# Check if port is already bound
ss -tlnp | grep -E "(5571|5572|2379)"

# Check for previous processes
pgrep -a sniffer
pgrep -a ml-detector
pgrep -af firewall-acl-agent  # -f needed: name exceeds pgrep's 15-character limit

# Check permissions
ls -l /vagrant/sniffer/build/sniffer
ls -l /vagrant/ml-detector/build/ml-detector
ls -l /vagrant/firewall-acl-agent/build/firewall-acl-agent

# Check capabilities (for sniffer/firewall)
getcap /vagrant/sniffer/build/sniffer
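
To see at a glance which of the expected ports are already taken, the `ss` output can be filtered per port. This is a sketch under the assumption that `ss -tln` prints the local address in the fourth column, as current iproute2 versions do; `listening_ports` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# listening_ports PORT...  (hypothetical helper, not part of the repo)
# Reads `ss -tln` output on stdin and prints each requested port that has
# a LISTEN socket, so a stale process can be spotted before a restart.
listening_ports() {
  awk -v ports="$*" '
    BEGIN { n = split(ports, p, " "); for (i = 1; i <= n; i++) want[p[i]] = 1 }
    $1 == "LISTEN" {
      port = $4
      sub(/.*:/, "", port)        # keep the text after the last colon
      if (port in want) busy[port] = 1
    }
    END { for (i = 1; i <= n; i++) if (p[i] in busy) print p[i] }'
}

# Example: ss -tln | listening_ports 5571 5572 2379
```

An empty result means none of the listed ports is bound, so "Address already in use" is unlikely to be the cause.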

Solutions

Port Already in Use:
# Kill existing processes
sudo pkill -9 sniffer
pkill -9 ml-detector
sudo pkill -9 -f firewall-acl-agent  # -f needed: name exceeds the 15-character process-name limit

# Or use alias
kill-lab

# Wait a few seconds
sleep 3

# Restart
run-lab
Permission Denied (Sniffer/Firewall):
# These components require root for eBPF/IPSet
sudo ./sniffer -c config/sniffer.json
sudo ./firewall-acl-agent -c config/firewall.json

# Or add capabilities (not recommended for development)
sudo setcap cap_net_raw,cap_net_admin,cap_bpf+eip ./sniffer
Binary Not Found:
# Rebuild component
cd /vagrant/sniffer
make clean && make

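# Chain on cd so rm -rf cannot run in the wrong directory if build/ is missing
mkdir -p /vagrant/ml-detector/build
cd /vagrant/ml-detector/build && rm -rf ./* && cmake .. && make -j4

mkdir -p /vagrant/firewall-acl-agent/build
cd /vagrant/firewall-acl-agent/build && rm -rf ./* && cmake .. && make -j4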
Config File Missing:
# Check config exists
ls -l /vagrant/sniffer/config/sniffer.json
ls -l /vagrant/ml-detector/config/ml_detector_config.json
ls -l /vagrant/firewall-acl-agent/config/firewall.json

# Validate JSON syntax
jq . /vagrant/firewall-acl-agent/config/firewall.json

Sniffer Captures No Packets

Symptoms

  • Sniffer shows “Packets processed: 0”
  • Detector receives no data
  • No traffic in logs

Diagnostics

# Check interface is up
ip link show eth1
ip link show eth3

# Check promiscuous mode
ip link show eth1 | grep PROMISC
ip link show eth3 | grep PROMISC

# Test packet capture manually
sudo tcpdump -i eth1 -c 10

# Check eBPF program is loaded
sudo bpftool prog list | grep sniffer

# Check sniffer config
grep capture_interface /vagrant/sniffer/config/sniffer.json

Solutions

Interface Not in Promiscuous Mode:
# Enable promiscuous mode
sudo ip link set eth1 promisc on
sudo ip link set eth3 promisc on

# Verify
ip link show eth1 | grep PROMISC
Wrong Interface Configured:
# List available interfaces
ip -4 addr show | grep -E "^[0-9]+:|inet "

# Edit sniffer config
vim /vagrant/sniffer/config/sniffer.json
# Update "capture_interface": "eth1" (or correct interface)

# Restart sniffer
sudo pkill -9 sniffer
sudo ./sniffer -c config/sniffer.json
No Traffic on Interface:
# Generate test traffic
ping 8.8.8.8 -c 10

# Or from another terminal
curl -I https://example.com

# Check sniffer captures it
grep -E "Paquetes procesados|Packets processed" /vagrant/logs/lab/sniffer.log | tail -5
eBPF Program Not Loaded:
# Check kernel version (need 5.10+)
uname -r

# Check eBPF support
grep CONFIG_BPF /boot/config-$(uname -r)

# Rebuild eBPF program
cd /vagrant/sniffer
make clean && make

# Compilation errors appear in the make output above;
# after restarting the sniffer, check its log for eBPF load errors
tail -50 /vagrant/logs/lab/sniffer.log

ZeroMQ Communication Failures

Symptoms

  • Sniffer captures packets but detector shows no input
  • Detector detects threats but firewall receives nothing
  • “Connection refused” or “timeout” errors

Diagnostics

# Check ZMQ ports
ss -tlnp | grep 5571  # Detector listening
ss -tlnp | grep 5572  # Firewall listening

# Check connections
ss -tnp | grep 5571 | grep ESTAB
ss -tnp | grep 5572 | grep ESTAB

# Check whether host iptables rules filter the ZMQ ports
sudo iptables -L -n | grep -E "(5571|5572)"

# Check logs for ZMQ errors
grep -i "zmq\|socket\|connect" /vagrant/logs/lab/*.log | tail -20

Solutions

Wrong Startup Order:
# Components must start in order:
# 1. Firewall (SUB - binds :5572)
# 2. Detector (PUB - connects to :5572, binds :5571)
# 3. Sniffer (PUSH - connects to :5571)

# Restart in correct order
kill-lab
sleep 3

# Start firewall first
cd /vagrant/firewall-acl-agent/build
sudo ./firewall-acl-agent -c ../config/firewall.json &
sleep 3

# Then detector
cd /vagrant/ml-detector/build
./ml-detector -c ../config/ml_detector_config.json &
sleep 2

# Finally sniffer
cd /vagrant/sniffer/build
sudo ./sniffer -c ../config/sniffer.json &
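
The fixed `sleep` calls above assume each component binds its port within a few seconds. A polling helper makes that explicit. This is a sketch only: `wait_for_port` is a hypothetical name, it relies on bash's built-in `/dev/tcp` redirection, and it assumes that a probe connection which opens and immediately closes is harmless to the listening socket.

```shell
#!/usr/bin/env bash
# wait_for_port HOST PORT [TIMEOUT_SECONDS]  (hypothetical helper)
# Returns 0 once a TCP connection to HOST:PORT succeeds, 1 on timeout.
# Relies on bash's built-in /dev/tcp redirection.
wait_for_port() {
  local host=$1 port=$2 timeout=${3:-10} waited=0
  until (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; do
    waited=$((waited + 1))
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
  done
  return 0
}

# Example: start the detector only after the firewall has bound :5572
# wait_for_port 127.0.0.1 5572 15 || echo "firewall did not bind :5572"
```

The probe runs in a subshell, so the test connection is closed as soon as the check finishes.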
Port Already Bound:
# Find process using port
sudo lsof -i :5571
sudo lsof -i :5572

# Kill it
sudo kill -9 <PID>

# Restart components
run-lab
Endpoint Mismatch:
# Check detector config (should bind :5571)
grep zmq_endpoint /vagrant/ml-detector/config/ml_detector_config.json
# Should be: "tcp://127.0.0.1:5571" or "tcp://0.0.0.0:5571"

# Check sniffer config (should connect to :5571)
grep zmq_endpoint /vagrant/sniffer/config/sniffer.json
# Should be: "tcp://127.0.0.1:5571"

# Check firewall config (should bind :5572)
grep endpoint /vagrant/firewall-acl-agent/config/firewall.json
# Should be: "tcp://127.0.0.1:5572" or "tcp://0.0.0.0:5572"

# Check detector config (should connect to :5572)
grep output_zmq /vagrant/ml-detector/config/ml_detector_config.json
# Should be: "tcp://localhost:5572" or "tcp://127.0.0.1:5572" (0.0.0.0 is a bind address, not a connect target)
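
Once the endpoint strings have been read from the configs, comparing their ports is a quick way to confirm that each connecting side targets the port its peer binds. A minimal sketch with hypothetical helper names; the endpoint values are passed in as arguments rather than parsed from the JSON files.

```shell
#!/usr/bin/env bash
# endpoint_port ENDPOINT  (hypothetical helper)
# Prints the port of a ZeroMQ TCP endpoint such as tcp://127.0.0.1:5571.
endpoint_port() {
  printf '%s\n' "${1##*:}"
}

# endpoints_match BIND_ENDPOINT CONNECT_ENDPOINT  (hypothetical helper)
# Succeeds when both endpoints use the same port. Hosts are not compared,
# because a socket bound to 0.0.0.0 is reachable through 127.0.0.1.
endpoints_match() {
  [ "$(endpoint_port "$1")" = "$(endpoint_port "$2")" ]
}

# Example:
# endpoints_match "tcp://0.0.0.0:5571" "tcp://127.0.0.1:5571" || echo "port mismatch"
```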

IPSet Errors

Symptoms

  • “IPSet not found” errors
  • “IPSet add failed” errors
  • Capacity warnings
  • IPs not being blocked

Diagnostics

# Check IPSet exists
sudo ipset list -n | grep ml_defender

# Check IPSet details
sudo ipset list ml_defender_blacklist_test

# Check capacity
ENTRIES=$(sudo ipset list ml_defender_blacklist_test | grep -c "^[0-9]")
echo "Entries: $ENTRIES"

# Check iptables rule
sudo iptables -L ML_DEFENDER_TEST -n -v
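
The entry count is most useful next to the set's `maxelem` limit. The sketch below reads `ipset list` output on stdin, takes `maxelem` from the `Header:` line, and counts the lines after `Members:`. `ipset_usage` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# ipset_usage  (hypothetical helper, not part of the repo)
# Reads `ipset list SETNAME` output on stdin and prints
# "entries/maxelem (pct%)".
ipset_usage() {
  awk '
    /^Header:/ {
      for (i = 1; i < NF; i++) if ($i == "maxelem") max = $(i + 1)
    }
    /^Members:/ { in_members = 1; next }
    in_members && NF { n++ }
    END {
      if (max > 0) printf "%d/%d (%d%%)\n", n, max, n * 100 / max
      else print "maxelem not found"
    }'
}

# Example: sudo ipset list ml_defender_blacklist_test | ipset_usage
```

A value close to 100% suggests that "IPSet add failed" errors come from the capacity limit rather than from permissions.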

Solutions

IPSet Doesn’t Exist:
# Create manually
sudo ipset create ml_defender_blacklist_test hash:ip \
  family inet hashsize 1024 maxelem 1000 timeout 3600

# Or let firewall create it (set create_if_missing: true)
vim /vagrant/firewall-acl-agent/config/firewall.json
# "create_if_missing": true

# Restart firewall
sudo pkill -9 -f firewall-acl-agent
cd /vagrant/firewall-acl-agent/build
sudo ./firewall-acl-agent -c ../config/firewall.json
IPSet Full (Capacity Limit):
# Check capacity
sudo ipset list ml_defender_blacklist_test | grep maxelem
# maxelem 1000 means max 1000 IPs

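# Option 1: Increase capacity (requires recreate). ipset destroy fails while an
# iptables rule references the set, so remove the rule first and re-add it after
sudo iptables -D ML_DEFENDER_TEST -m set --match-set ml_defender_blacklist_test src -j DROP
sudo ipset destroy ml_defender_blacklist_test
sudo ipset create ml_defender_blacklist_test hash:ip \
  family inet hashsize 4096 maxelem 10000 timeout 3600
sudo iptables -A ML_DEFENDER_TEST -m set --match-set ml_defender_blacklist_test src -j DROP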

# Option 2: Flush existing entries
sudo ipset flush ml_defender_blacklist_test

# Option 3: Update config (restart required)
vim /vagrant/firewall-acl-agent/config/firewall.json
# "max_elements": 10000,
# "hash_size": 4096
IPs Not Being Blocked:
# Check IP is in IPSet
sudo ipset test ml_defender_blacklist_test 192.168.1.100

# Check iptables rule exists
sudo iptables -L ML_DEFENDER_TEST -n -v | grep ml_defender_blacklist_test

# If rule missing, add it
sudo iptables -A ML_DEFENDER_TEST -m set --match-set ml_defender_blacklist_test src -j DROP

# Check rule is in INPUT chain
sudo iptables -L INPUT -n -v | grep ML_DEFENDER_TEST

# If not, insert it
sudo iptables -I INPUT -j ML_DEFENDER_TEST
Permission Denied:
# IPSet requires root
sudo ipset list

# Firewall must run as root
sudo ./firewall-acl-agent -c config/firewall.json

# Check sudoers file
sudo cat /etc/sudoers.d/ml-defender
# Should allow vagrant user to run ipset/iptables

Encryption and Compression Errors

Symptoms

  • “Decryption failed” errors
  • “Decompression failed” errors
  • “crypto_errors” > 0 in metrics
  • Firewall receives garbled data

Diagnostics

# Check crypto metrics
jq '.crypto, .compression' /vagrant/logs/lab/firewall-metrics.json

# Check for crypto errors in logs
grep -i "decrypt\|encrypt\|crypto\|compression" /vagrant/logs/lab/*.log | grep -i error

# Check etcd connection
curl -s http://localhost:2379/version

# Check crypto tokens in etcd
etcdctl get /crypto/firewall/tokens --prefix

Solutions

etcd Not Running:
# Check etcd status (the binary is built as etcd_server)
pgrep -a 'etcd[-_]server'

# Start etcd
cd /vagrant/etcd-server/build
./etcd_server &

# Or use Docker
docker-compose up -d etcd
Crypto Tokens Not Shared:
# Check ml-detector registered token
etcdctl get /crypto/detector/tokens --prefix

# Check firewall can read token
etcdctl get /crypto/firewall/tokens --prefix

# If missing, restart ml-detector (it publishes token)
pkill -9 ml-detector
cd /vagrant/ml-detector/build
./ml-detector -c ../config/ml_detector_config.json

# Wait for token publication (check logs)
grep "Published crypto token" /vagrant/logs/lab/detector.log
Crypto Disabled in Config:
# Check detector config
grep -A 5 '"encryption"' /vagrant/ml-detector/config/ml_detector_config.json
# "enabled": true

# Check firewall config
grep -A 5 '"encryption"' /vagrant/firewall-acl-agent/config/firewall.json
# "enabled": true

# If disabled, enable and restart
Key Mismatch:
# Delete all crypto tokens and restart
etcdctl del /crypto --prefix

# Restart detector (publishes new token)
pkill -9 ml-detector
cd /vagrant/ml-detector/build
./ml-detector -c ../config/ml_detector_config.json &

# Restart firewall (reads new token)
sudo pkill -9 -f firewall-acl-agent
cd /vagrant/firewall-acl-agent/build
sudo ./firewall-acl-agent -c ../config/firewall.json &

High CPU or Memory Usage

Symptoms

  • Component using >80% CPU
  • Memory growing continuously
  • System becomes unresponsive
  • Out of memory errors

Diagnostics

# Monitor CPU and memory
top -b -n 1 | grep -E "(sniffer|ml-detector|firewall)"

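# Detailed process stats (PID, %CPU, %MEM, VSZ, RSS in MB, command)
# The [x] brackets keep grep from matching its own command line
ps aux | grep -E "[s]niffer|[m]l-detector|[f]irewall" | \
  awk '{print $2, $3, $4, $5, $6/1024 "MB", $11}'

# Check for memory leaks
# Run for several hours, plot RSS over time
while true; do
  # ps -C selects by command name, so no grep process pollutes the samples
  ps -o rss= -C ml-detector | awk '{print $1/1024}' >> mem.txt
  sleep 300
done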

# Check queue depths
grep "queue_depth" /vagrant/logs/lab/*.log | tail -20
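
Before plotting, a first-versus-last comparison of the collected samples gives a rough growth figure. The sketch below assumes the `mem.txt` format produced by the loop above (one RSS value in MB per line); `rss_growth` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# rss_growth  (hypothetical helper, not part of the repo)
# Reads one RSS sample in MB per line and prints the first sample, the
# last sample, and the difference between them.
rss_growth() {
  awk '
    NF {
      if (!seen) { first = $1; seen = 1 }
      last = $1
    }
    END {
      if (seen) printf "first=%.1f last=%.1f delta=%.1f\n", first, last, last - first
      else print "no samples"
    }'
}

# Example: rss_growth < mem.txt
```

A positive delta alone does not prove a leak, since caches and model loading also raise RSS; growth that continues across many samples under steady traffic is the more telling sign.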

Solutions

High CPU - Sniffer:
# Reduce batch frequency
vim /vagrant/sniffer/config/sniffer.json
# Increase "batch_timeout_ms": 200 (from 100)
# Increase "batch_size": 20 (from 10)

# Disable unused feature groups
# "extract_traffic_features": false

# Reduce compression level
# "compression_level": 1 (fastest)
High CPU - Detector:
# Disable unused models
vim /vagrant/ml-detector/config/ml_detector_config.json
# Set "enabled": false for unused models

# Increase batch size (reduces inference calls)
# "batch_size": 200 (from 100)

# Increase thresholds (fewer detections)
# "ddos_threshold": 0.90 (from 0.85)
High CPU - Firewall:
# Enable batching
vim /vagrant/firewall-acl-agent/config/firewall.json
# "enable_batching": true
# "batch_size_threshold": 20
# "batch_time_threshold_ms": 2000
Memory Leak:
# Check for memory growth
grep -i "memory\|leak" /vagrant/logs/lab/*.log

# Restart component periodically (workaround)
# Add to cron:
0 */4 * * * /vagrant/scripts/restart_components.sh

# Report issue with memory profile
# Use valgrind (slow, for development only)
valgrind --leak-check=full --log-file=valgrind.log \
  ./ml-detector -c config/ml_detector_config.json
Out of Memory:
# Check available memory
free -h

# Kill memory-hungry processes
kill-lab

# Restart with memory limits
systemd-run --scope -p MemoryMax=512M sudo ./sniffer -c config/sniffer.json &
systemd-run --scope -p MemoryMax=1G ./ml-detector -c config/ml_detector_config.json &
systemd-run --scope -p MemoryMax=256M sudo ./firewall-acl-agent -c config/firewall.json &

Gateway Mode: Client Traffic Not Forwarded

Symptoms

  • Client VM cannot reach internet through defender
  • Packets not being forwarded
  • “Network unreachable” errors on client

Diagnostics

# On defender VM:

# Check IP forwarding
sysctl net.ipv4.ip_forward
# Should be: net.ipv4.ip_forward = 1

# Check routing
ip route show
# Should have default route via eth1

# Check NAT rules
sudo iptables -t nat -L POSTROUTING -n -v
# Should have MASQUERADE rule for eth1

# Check interfaces
ip addr show eth1
ip addr show eth3
# eth1: 192.168.56.20
# eth3: 192.168.100.1

# On client VM:

# Check default route
ip route show
# Should be: default via 192.168.100.1

# Test connectivity to gateway
ping -c 3 192.168.100.1

# Test internet (through gateway)
ping -c 3 8.8.8.8
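
The client-side route check can be scripted by extracting the gateway from the `ip route show` output. A minimal sketch; `default_gateway` is a hypothetical helper name, and the expected address 192.168.100.1 is the defender's eth3 address listed above.

```shell
#!/usr/bin/env bash
# default_gateway  (hypothetical helper, not part of the repo)
# Reads `ip route show` output on stdin and prints the gateway of the
# first default route, or nothing if no default route exists.
default_gateway() {
  awk '$1 == "default" {
    for (i = 2; i < NF; i++) if ($i == "via") { print $(i + 1); exit }
  }'
}

# Example (on the client VM):
# [ "$(ip route show | default_gateway)" = "192.168.100.1" ] || echo "wrong default route"
```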

Solutions

IP Forwarding Disabled:
# Enable IP forwarding
sudo sysctl -w net.ipv4.ip_forward=1
sudo sysctl -w net.ipv6.conf.all.forwarding=1

# Make permanent
echo "net.ipv4.ip_forward=1" | sudo tee -a /etc/sysctl.conf

# Verify
sysctl net.ipv4.ip_forward
NAT Not Configured:
# Add NAT rule
sudo iptables -t nat -A POSTROUTING -o eth1 -j MASQUERADE

# Add FORWARD rules
sudo iptables -A FORWARD -i eth3 -o eth1 -j ACCEPT
sudo iptables -A FORWARD -i eth1 -o eth3 -m state --state RELATED,ESTABLISHED -j ACCEPT

# Verify
sudo iptables -t nat -L POSTROUTING -n -v
rp_filter Blocking Traffic:
# Disable reverse path filtering, which can drop forwarded packets when routing is asymmetric
sudo sysctl -w net.ipv4.conf.all.rp_filter=0
sudo sysctl -w net.ipv4.conf.eth1.rp_filter=0
sudo sysctl -w net.ipv4.conf.eth3.rp_filter=0

# Make permanent
sudo tee -a /etc/sysctl.conf <<EOF
net.ipv4.conf.all.rp_filter=0
net.ipv4.conf.eth1.rp_filter=0
net.ipv4.conf.eth3.rp_filter=0
EOF
Client Route Incorrect:
# On client VM:

# Delete existing default route
sudo ip route del default

# Add correct default route
sudo ip route add default via 192.168.100.1 dev eth1

# Verify
ip route show
ping 192.168.100.1
ping 8.8.8.8

Debug Scripts

debug.sh

Comprehensive system diagnostics:
source/scripts/debug.sh
#!/bin/bash
# Debug script for troubleshooting the ZeroMQ + Protobuf project

echo "🔍 ZeroMQ + Protobuf Debug Information"
echo "====================================="

# Check required files
echo "📁 Checking required files..."
files_to_check=(
    "protobuf/network_security.proto"
    "docker-compose.yml"
    "service1/main.cpp"
    "service2/main.cpp"
)

for file in "${files_to_check[@]}"; do
    if [[ -f "$file" ]]; then
        echo "   ✅ $file"
    else
        echo "   ❌ $file (MISSING)"
    fi
done

echo ""
echo "🐳 Docker information..."
docker --version
docker-compose --version

echo ""
echo "🏃 Running containers:"
docker ps --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"

echo ""
echo "🔧 Recent Docker logs:"
docker-compose logs --tail=20 service1
docker-compose logs --tail=20 service2

# ... (see full script in source)

network_diagnostics.sh

Network-specific diagnostics:
source/scripts/network_diagnostics.sh
#!/bin/bash
set -e

echo "=== DIAGNÓSTICO DE RED - ZeroMQ Lab ==="
echo ""

echo "1. INTERFACES DE RED"
ip -4 addr show | grep -E "^[0-9]+:|inet "

echo ""
echo "2. TABLA DE RUTAS"
ip route

echo ""
echo "3. IPs CONFIGURADAS"
echo "  NAT (eth0):             $(ip -4 addr show eth0 | grep inet | awk '{print $2}' | cut -d'/' -f1)"
echo "  Private Network (eth1): $(ip -4 addr show eth1 | grep inet | awk '{print $2}' | cut -d'/' -f1)"

echo ""
echo "4. CONECTIVIDAD"
ping -c 1 -W 2 8.8.8.8 >/dev/null 2>&1 && echo "  Internet: ✓ OK" || echo "  Internet: ✗ FALLO"

echo ""
echo "5. KERNEL Y EBPF"
echo "  Kernel: $(uname -r)"
grep -q CONFIG_BPF=y /boot/config-$(uname -r) 2>/dev/null && echo "  eBPF: ✓ Soportado" || echo "  eBPF: ? Desconocido"

Component Failure Recovery

Automatic Restart

For production, use systemd with restart policies:
[Unit]
StartLimitIntervalSec=300
StartLimitBurst=5

[Service]
Restart=on-failure
RestartSec=5s

Manual Recovery

# Stop all components
kill-lab

# Clean up any stale resources
sudo ipset flush ml_defender_blacklist_test
sudo iptables -F ML_DEFENDER_TEST

# Restart in correct order
run-lab

# Monitor for issues
logs-lab

State Recovery

# Export current IPSet before restart
sudo ipset save ml_defender_blacklist_test > ipset_backup.txt

# After restart, restore (-exist tolerates a set or entries that already exist)
sudo ipset restore -exist < ipset_backup.txt

Getting Help

Log Collection

When reporting issues, collect logs:
# Collect all logs
cd /vagrant
tar -czf ml-defender-logs-$(date +%Y%m%d_%H%M%S).tar.gz logs/

# Include component versions
echo "Sniffer: $(/vagrant/sniffer/build/sniffer --version)" > versions.txt
echo "Detector: $(/vagrant/ml-detector/build/ml-detector --version)" >> versions.txt
echo "Firewall: $(/vagrant/firewall-acl-agent/build/firewall-acl-agent --version)" >> versions.txt

# Include system info
uname -a >> versions.txt
cat /etc/os-release >> versions.txt

Debug Mode

Enable verbose logging in each component's config/*.json:
{
  "logging": {
    "level": "debug",
    "console": true
  },
  "debug": {
    "log_raw_protobuf": true,
    "log_zmq_connection_events": true,
    "log_crypto_operations": true
  }
}

Next Steps

Monitoring

Set up proactive monitoring

Performance Tuning

Optimize for better performance

Configuration

Review configuration options

Architecture

Understand component interactions
