
Overview

The RAG System (Retrieval-Augmented Generation) provides natural language intelligence over ML Defender's security events. Using TinyLlama for language understanding and FAISS for vector search, it enables forensic queries like "Show me all ransomware detections from 10.0.0.50 last week" without SQL or log parsing.

Components

  • RAG Ingester: Log parsing + vector embeddings
  • RAG Server: TinyLlama + FAISS query engine
  • 4 FAISS Indices: Temporal, semantic, benign, malicious
  • etcd Integration: Service discovery

Capabilities

  • Natural Language Queries: Ask questions in English
  • Temporal Analysis: β€œLast week”, β€œYesterday morning”
  • Pattern Recognition: β€œSimilar to this attack”
  • ML Retraining Data: Export feature vectors

Architecture

The RAG System consists of two cooperating services:

RAG Ingester

Multi-Index Strategy

The Ingester maintains 4 specialized FAISS indices for different query patterns:
Chronos (Temporal) Index
  • Dimensions: 128
  • Purpose: Time-series queries
  • Embedding Model: Chronos temporal encoder (ONNX)
  • PCA: 512d → 128d reduction for efficient storage
  • Optimized for:
      "Show me attacks from last week"
      "What happened on Monday between 2-4 PM?"
      "Hourly attack trends"
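The 512d → 128d PCA reduction can be illustrated in plain numpy. This is only a sketch of the transform with stand-in data; the Ingester itself uses FAISS's trained PCA models listed in the configuration below:

```python
import numpy as np

def fit_pca(train_vecs: np.ndarray, out_dim: int):
    """Fit a PCA projection: center the data, keep the top principal directions."""
    mean = train_vecs.mean(axis=0)
    # SVD of the centered data; rows of vt are principal directions
    _, _, vt = np.linalg.svd(train_vecs - mean, full_matrices=False)
    return mean, vt[:out_dim]

def apply_pca(vecs: np.ndarray, mean: np.ndarray, components: np.ndarray):
    """Project vectors into the reduced space."""
    return (vecs - mean) @ components.T

rng = np.random.default_rng(0)
train = rng.normal(size=(1000, 512)).astype("float32")  # stand-in embeddings
mean, comps = fit_pca(train, 128)
reduced = apply_pca(train, mean, comps)
print(reduced.shape)  # (1000, 128)
```

The reduction trades a small amount of recall for a 4x smaller index, which matters when all four indices live in memory.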

Eventual Consistency

The Ingester uses best-effort commits for high availability:
// MultiIndexManager commits indices independently
void MultiIndexManager::commit_all() {
    std::vector<std::future<bool>> futures;
    
    // Parallel commits (non-blocking)
    futures.push_back(std::async(std::launch::async, 
        [this] { return chronos_index_->commit(); }));
    futures.push_back(std::async(std::launch::async, 
        [this] { return sbert_index_->commit(); }));
    futures.push_back(std::async(std::launch::async, 
        [this] { return benign_index_->commit(); }));
    futures.push_back(std::async(std::launch::async, 
        [this] { return malicious_index_->commit(); }));
    
    // Track failures but don't block
    int successes = 0;
    for (auto& future : futures) {
        if (future.get()) successes++;
    }
    
    // Availability > Consistency: Better 3/4 than 0/4
    if (successes == 4) {
        logger_->info("Commit successful: 4/4 indices");
    } else {
        logger_->warn("Partial commit: {}/4 indices", successes);
    }
}
Design Philosophy: Availability over Consistency. Better to have 3/4 indices working than to block and have 0/4.

Configuration

{
  "service": {
    "id": "rag-ingester-default",
    "version": "0.1.0",
    "etcd": {
      "endpoints": ["127.0.0.1:2379"],
      "heartbeat_interval_sec": 10,
      "partner_detector": "ml-detector-default"
    }
  },
  
  "ingester": {
    "input": {
      "source": "file_watch",
      "directory": "/vagrant/logs/rag/synthetic/artifacts",
      "pattern": "*.pb.enc",
      "encrypted": true,
      "compressed": true,
      "delete_after_process": false
    },
    
    "threading": {
      "mode": "single",
      "embedding_workers": 1,
      "indexing_workers": 1
    },
    
    "embedders": {
      "chronos": {
        "enabled": true,
        "onnx_path": "/vagrant/rag-ingester/models/onnx/chronos.onnx",
        "input_dim": 83,
        "output_dim": 512
      },
      "sbert": {
        "enabled": true,
        "onnx_path": "/vagrant/rag-ingester/models/onnx/sbert.onnx",
        "input_dim": 83,
        "output_dim": 384
      },
      "attack": {
        "enabled": true,
        "onnx_path": "/vagrant/rag-ingester/models/onnx/attack.onnx",
        "input_dim": 83,
        "output_dim": 256,
        "benign_sample_rate": 0.1
      }
    },
    
    "pca": {
      "enabled": true,
      "chronos_model": "/vagrant/rag-ingester/models/pca/chronos_512_128.faiss",
      "sbert_model": "/vagrant/rag-ingester/models/pca/sbert_384_96.faiss",
      "attack_model": "/vagrant/rag-ingester/models/pca/attack_256_64.faiss"
    },
    
    "faiss": {
      "index_type": "Flat",
      "metric": "L2",
      "persist_path": "/shared/faiss_indexes",
      "checkpoint_interval_events": 1000
    },
    
    "health": {
      "cv_warning_threshold": 0.20,
      "cv_critical_threshold": 0.15,
      "report_to_etcd": true
    }
  }
}

Threading Modes

{
  "threading": {
    "mode": "single",
    "embedding_workers": 1,
    "indexing_workers": 1
  }
}
Memory: ~310MB
Use Case: Resource-constrained environments

RAG Server (TinyLlama)

Natural Language Query Processing

The RAG Server uses TinyLlama (1.1B parameters) for query understanding:
Step 1: Query Understanding

User Query: "Show me all ransomware detections from 10.0.0.50 last week"
TinyLlama Extracts:
{
  "intent": "search",
  "attack_type": "ransomware",
  "source_ip": "10.0.0.50",
  "time_range": {
    "start": "2025-11-01T00:00:00Z",
    "end": "2025-11-08T00:00:00Z"
  },
  "index_strategy": ["entity_malicious", "chronos"]
}
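The extraction itself is done by TinyLlama, but the shape of the output can be mimicked with a rule-based stand-in. Everything below is hypothetical illustration, not the server's actual parser:

```python
import re
from datetime import datetime, timedelta, timezone

ATTACK_TYPES = {"ransomware", "ddos", "portscan"}  # illustrative subset

def parse_query(query: str, now: datetime) -> dict:
    """Rule-based stand-in for TinyLlama's structured intent extraction."""
    q = query.lower()
    ip = re.search(r"\b(?:\d{1,3}\.){3}\d{1,3}\b", query)
    attack = next((a for a in ATTACK_TYPES if a in q), None)
    intent = {"intent": "search"}
    if attack:
        intent["attack_type"] = attack
    if ip:
        intent["source_ip"] = ip.group()
    if "last week" in q:
        start = now - timedelta(days=7)
        intent["time_range"] = {"start": start.isoformat(), "end": now.isoformat()}
    return intent

now = datetime(2025, 11, 8, tzinfo=timezone.utc)
result = parse_query("Show me all ransomware detections from 10.0.0.50 last week", now)
print(result["attack_type"], result["source_ip"])  # ransomware 10.0.0.50
```

The advantage of the LLM over rules like these is robustness to paraphrase ("everything that crypto-locked boxes did last week") without enumerating patterns.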
Step 2: Vector Search

FAISS Queries (parallel):
  • Entity Malicious Index: Find all events from 10.0.0.50
  • Chronos Index: Filter by time range (last week)
Results: 47 matching events
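Combining the parallel index results amounts to an intersection plus re-ranking. A minimal sketch with hypothetical event IDs and distances (not the server's actual merge logic):

```python
def merge_results(per_index: dict) -> list:
    """Keep only event IDs returned by every index, rank by summed L2 distance."""
    common = set.intersection(*(set(hits) for hits in per_index.values()))
    return sorted(common, key=lambda eid: sum(hits[eid] for hits in per_index.values()))

# Hypothetical hits: {event_id: L2 distance}
entity_hits = {101: 0.2, 102: 0.9, 103: 0.4}   # entity_malicious index
chronos_hits = {101: 0.1, 103: 0.8, 104: 0.3}  # chronos (time-filtered) index

ranked = merge_results({"entity": entity_hits, "chronos": chronos_hits})
print(ranked)  # [101, 103]
```

Only events matched by both the entity filter and the time filter survive, which is why the two indices can be searched in parallel without coordination.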
Step 3: Result Aggregation

TinyLlama Summarizes:
Found 47 ransomware detections from 10.0.0.50 last week:

- Nov 1, 14:23: Initial C&C callback (15 external IPs contacted)
- Nov 1, 14:25: SMB lateral movement (8 hosts infected)
- Nov 1, 14:30: Encryption started (payload entropy 7.9)
- Nov 2-7: Daily C&C check-ins (total 39 events)

Blocked: Yes (added to IPSet on Nov 1, 14:23)
Recidivism: 39 attempts after block

Example Queries

"What attacks happened yesterday?"
"Show me DDoS events from last Monday"
"Hourly attack trends for the past week"
"Traffic patterns between 2-4 AM"

ML Retraining Data Export

The RAG System can export feature vectors for ML model retraining:
# Query via RAG API
query = """
Export all ransomware detections from the past 30 days
with ground truth labels (blocked = positive, 
                          false_positive = negative)
"""

response = rag_client.query(query)

# Returns Parquet file with:
# - 83 features per event
# - Ground truth labels
# - Metadata (timestamp, IP, attack_type)
df = pd.read_parquet(response.export_path)

print(df.shape)  # (12847, 86)
# 83 features + ground_truth + timestamp + source_ip
Use Cases:
  • Model drift detection: Compare new data distribution vs training data
  • Incremental training: Retrain RandomForest on recent attacks
  • False positive analysis: Identify mislabeled events
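The model-drift use case can be sketched with a simple per-feature mean-shift check. This is a stdlib-only illustration with hypothetical feature names and thresholds; a real pipeline would use a proper statistical test on the exported Parquet data:

```python
from statistics import mean, stdev

def drifted_features(train: dict, recent: dict, z_threshold: float = 3.0) -> list:
    """Flag features whose recent mean moved more than z_threshold
    training standard deviations away from the training mean."""
    flagged = []
    for name, train_vals in train.items():
        mu, sigma = mean(train_vals), stdev(train_vals)
        if sigma == 0:
            continue
        z = abs(mean(recent[name]) - mu) / sigma
        if z > z_threshold:
            flagged.append(name)
    return flagged

# Hypothetical per-feature samples: training distribution vs last 24h export
train = {"pkt_rate": [10, 12, 11, 9, 10], "entropy": [4.0, 4.2, 3.9, 4.1, 4.0]}
recent = {"pkt_rate": [48, 52, 50], "entropy": [4.0, 4.1, 4.2]}
print(drifted_features(train, recent))  # ['pkt_rate']
```

A flagged feature is a signal to retrain, or at least to inspect whether the traffic mix changed.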

Deployment

Prerequisites

sudo apt-get install -y \
    build-essential cmake \
    libzmq3-dev libprotobuf-dev \
    liblz4-dev nlohmann-json3-dev libspdlog-dev

# FAISS (compile from source)
git clone https://github.com/facebookresearch/faiss.git
cd faiss
cmake -B build -DFAISS_ENABLE_GPU=OFF .
make -C build -j$(nproc)
sudo make -C build install

# ONNX Runtime
wget https://github.com/microsoft/onnxruntime/releases/download/v1.16.0/onnxruntime-linux-x64-1.16.0.tgz
tar -xzf onnxruntime-linux-x64-1.16.0.tgz
sudo cp -r onnxruntime-linux-x64-1.16.0/lib/* /usr/local/lib/

Build RAG Ingester

Step 1: Navigate

cd /vagrant/rag-ingester
mkdir -p build && cd build
Step 2: Configure

cmake .. -DCMAKE_BUILD_TYPE=Release
Step 3: Compile

make -j$(nproc)

Run RAG Ingester

./rag-ingester /vagrant/rag-ingester/config/rag-ingester.json
Real-time Output:
[RAG-INGESTER] 🚀 Starting RAG Ingester v0.1.0
[RAG-INGESTER] 📁 Watching directory: /vagrant/logs/rag/synthetic/artifacts
[RAG-INGESTER] 🧠 Loaded 3 ONNX models:
  - Chronos: 83 → 512 dimensions
  - SBERT: 83 → 384 dimensions
  - Attack: 83 → 256 dimensions
[RAG-INGESTER] 📊 PCA enabled: 512→128, 384→96, 256→64
[RAG-INGESTER] 📂 Created 4 FAISS indices (Flat, L2 metric)
[RAG-INGESTER] ✅ Ready for ingestion

[INGESTION] File: ml_detector_2025-11-01_14-23-15.pb.enc
[DECRYPT] ChaCha20-Poly1305 decryption: OK
[DECOMPRESS] LZ4 decompression: OK (1024 bytes)
[PARSE] Protobuf parsed: 47 events
[EMBED] Chronos: 47 vectors (512d)
[EMBED] SBERT: 47 vectors (384d)
[EMBED] Attack: 47 vectors (256d)
[PCA] Dimensionality reduction: 512→128, 384→96, 256→64
[INDEX] Added to 4 FAISS indices
[COMMIT] Checkpoint: 1000 events indexed

Run RAG Server

cd /vagrant/rag
python rag_server.py --config config/rag_config.json

Troubleshooting

# Verify index files exist
ls -lh /shared/faiss_indexes/

# Should see:
# chronos_index.faiss
# sbert_index.faiss
# benign_index.faiss
# malicious_index.faiss

# Rebuild indices if missing
./rag-ingester --rebuild-indices
# Verify ONNX Runtime installation
ldconfig -p | grep onnxruntime

# Check model files exist
ls -lh /vagrant/rag-ingester/models/onnx/*.onnx

# Validate ONNX models
python -c "import onnx; onnx.checker.check_model('chronos.onnx')"
Symptom: OOM error during query processing
Solution: Use 4-bit quantization:
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    load_in_4bit=True,
    device_map="auto"
)
Memory: 2GB → 600MB
# Check inotify limits
cat /proc/sys/fs/inotify/max_user_watches

# Increase if needed
echo 524288 | sudo tee /proc/sys/fs/inotify/max_user_watches

# Make persistent
echo "fs.inotify.max_user_watches=524288" | \
  sudo tee -a /etc/sysctl.conf
sudo sysctl -p

Roadmap

Priority 1.1: Firewall Log Parsing

Goal: Ingest firewall-agent logs for ground truth linking
Step 1: Detection ↔ Block Linking

Link ML Detector events to Firewall Agent blocks:
[ML-DETECTOR] 10.0.0.50 → Ransomware (14:23:15)
      ↓ (5ms latency)
[FIREWALL] 10.0.0.50 added to IPSet (14:23:15)
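Once firewall logs are ingested, the link above reduces to matching a block to a detection by IP within a time window. A minimal sketch with hypothetical event shapes (the real work is parsing firewall-agent logs into this form):

```python
from datetime import datetime

def link_latency(detection: dict, blocks: list, window_s: float = 60.0):
    """Find the first block for the same IP within window_s of the detection;
    return detection-to-block latency in milliseconds, or None if unblocked."""
    det_t = datetime.fromisoformat(detection["ts"])
    for blk in blocks:
        if blk["ip"] != detection["ip"]:
            continue
        delta = (datetime.fromisoformat(blk["ts"]) - det_t).total_seconds()
        if 0 <= delta <= window_s:
            return delta * 1000.0
    return None

det = {"ip": "10.0.0.50", "ts": "2025-11-01T14:23:15.000"}
blocks = [{"ip": "10.0.0.50", "ts": "2025-11-01T14:23:15.005"}]
print(link_latency(det, blocks))  # 5.0
```

A `None` result is exactly the "detections that were NOT blocked" query; the non-`None` values answer the latency question.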
Step 2: Cross-component Queries

"Show me all detections that were NOT blocked"
"What's the latency between detection and blocking?"

Priority 1.2: Temporal Queries

Goal: Natural language time expressions
"Yesterday morning" β†’ 2025-11-07 06:00-12:00
"Last Monday" β†’ 2025-11-03 00:00-23:59
"Past 3 hours" β†’ now - 3h to now

Priority 1.3: Aggregation & Statistics

Goal: Summary queries
"Top 10 malicious IPs this month"
"Hourly attack distribution"
"Recidivism rate (% of blocked IPs that retry)"

Next Steps

Sniffer

Configure network packet capture

ML Detector

Set up ML inference pipeline

Firewall Agent

Deploy autonomous blocking

Model Training

Retrain models with RAG-exported data
