Ruflo System Architecture: Layers, Routing, and Intelligence

Ruflo is the execution layer that wraps Claude Code and Codex with everything a working agent needs: tools, memory, coordination loops, sandboxes, and security guardrails. Instead of one model working in isolation, Ruflo wires together a full system in which agents self-organize into swarms, learn from every completed task, and remember successful patterns across sessions. The overall data path follows a single closed loop:

User --> Ruflo (CLI/MCP) --> Router --> Swarm --> Agents --> Memory --> LLM Providers
                          ^                           |
                          +---- Learning Loop <-------+

Each completed task feeds back through the Learning Loop — successful patterns are distilled into memory, which improves routing decisions for the next task. You write code normally; Ruflo manages the rest.

Layer-by-Layer Breakdown

Entry Layer

The first stop for every request. Ruflo exposes two surfaces:

CLI: 26 commands and 140+ subcommands covering the full agent lifecycle, swarm management, memory operations, neural training, security scanning, and more.
MCP Server: 313 tools served over the Model Context Protocol. Register it once with `claude mcp add ruflo -- npx ruflo@latest mcp start`; the tools are then callable directly from Claude Code or any other MCP-compatible client (VS Code, Cursor, Windsurf, Claude Desktop).

AIDefence Security

Every inbound request passes through AIDefence before routing. This layer provides:

Request validation — Zod-based schema checks on all inputs
Prompt injection blocking — detects and neutralises injection attempts at the boundary
PII detection — 14-type pipeline strips sensitive data before it can propagate to agents or leave the node

Threat detection runs in under 10 ms and classifies requests as Safe → Allow, Warning → Sanitise, or Threat → Block.
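
The three-way verdict can be sketched as follows. This is a minimal illustration under stated assumptions, not AIDefence's actual implementation: the `Verdict` enum, the `triage` function, and every regex are hypothetical, and two PII types (email address, US-style SSN) stand in for the 14-type pipeline.

```python
import re
from enum import Enum


class Verdict(Enum):
    ALLOW = "allow"        # Safe
    SANITISE = "sanitise"  # Warning
    BLOCK = "block"        # Threat


# Hypothetical patterns; a production detector would use far more signals.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"reveal (your )?system prompt", re.IGNORECASE),
]
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def triage(text: str) -> tuple[Verdict, str]:
    """Return a verdict plus the text that may be forwarded to routing."""
    # Threat: anything that looks like an injection attempt is dropped.
    if any(p.search(text) for p in INJECTION_PATTERNS):
        return Verdict.BLOCK, ""
    # Warning: redact PII, then forward the sanitised text.
    redacted = text
    for label, pattern in PII_PATTERNS.items():
        redacted = pattern.sub(f"[{label.upper()}]", redacted)
    if redacted != text:
        return Verdict.SANITISE, redacted
    # Safe: forward unchanged.
    return Verdict.ALLOW, text
```

Checking for injection before redaction means a blocked request never reaches the PII stage, which keeps the cheap rejection path first.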

Routing Layer

After security clearance, the routing layer decides what runs and where:

Component	Role
Q-Learning Router	Learns from task outcomes; epsilon-greedy exploration; 89% routing accuracy
MoE (Mixture of Experts)	8 specialised expert networks; dynamic gating selects the best expert per task type
Skills	137+ pre-built skills covering V3 core, swarm, GitHub, SPARC, FlowNexus, and dual-mode workflows
Hooks	27 lifecycle hooks fire automatically at task boundaries, session events, and tool calls
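
To make the Q-Learning Router row concrete, here is a minimal sketch of epsilon-greedy selection with a tabular Q update. It is an assumption-laden illustration rather than Ruflo's router: the class name, the `(task_type, agent)` state-action encoding, and the default `epsilon` and `alpha` values are all hypothetical.

```python
import random
from collections import defaultdict


class EpsilonGreedyRouter:
    """Tabular Q-learning over (task_type, agent) pairs (hypothetical sketch)."""

    def __init__(self, agents, epsilon=0.1, alpha=0.5, seed=None):
        self.agents = list(agents)
        self.epsilon = epsilon        # probability of exploring
        self.alpha = alpha            # learning rate
        self.q = defaultdict(float)   # Q[(task_type, agent)] -> value estimate
        self.rng = random.Random(seed)

    def choose(self, task_type):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.agents)  # explore: random agent
        # Exploit: highest current estimate (ties go to the first agent listed).
        return max(self.agents, key=lambda a: self.q[(task_type, a)])

    def update(self, task_type, agent, reward):
        # Each routed task is treated as a one-step episode, so the
        # Q-learning target reduces to the observed reward.
        key = (task_type, agent)
        self.q[key] += self.alpha * (reward - self.q[key])
```

With `epsilon=0.0` the router is purely greedy, which is convenient for deterministic tests; a non-zero value keeps occasionally trying agents whose estimates are currently low.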

The routing layer also runs a Thompson sampling model router (alpha.5+): a cost-adjusted multi-armed bandit that self-corrects across three tiers (Haiku / Sonnet / Opus) using Beta(α, β) posteriors updated by `hooks_model-outcome`. After roughly 50 outcomes it stops over-using expensive tiers, with no manual threshold tuning needed.
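
A cost-adjusted Thompson sampler over three tiers might look like the sketch below. The source does not specify how cost enters the score, so subtracting a weighted cost from the sampled success rate is an assumption, as are the class name, the relative cost figures in the usage note, and the `record` method standing in for the outcome hook.

```python
import random


class ThompsonModelRouter:
    """Beta-Bernoulli Thompson sampling with a linear cost penalty (hypothetical)."""

    def __init__(self, costs, cost_weight=0.3, seed=None):
        self.costs = dict(costs)            # tier -> relative cost in [0, 1]
        self.cost_weight = cost_weight      # how strongly cost lowers the score
        # Beta(1, 1) is the uniform prior: no opinion about any tier yet.
        self.params = {tier: [1.0, 1.0] for tier in self.costs}
        self.rng = random.Random(seed)

    def choose(self):
        def score(tier):
            alpha, beta = self.params[tier]
            sampled_success = self.rng.betavariate(alpha, beta)
            return sampled_success - self.cost_weight * self.costs[tier]

        return max(self.costs, key=score)

    def record(self, tier, success):
        # Success increments alpha, failure increments beta.
        self.params[tier][0 if success else 1] += 1.0

    def mean(self, tier):
        alpha, beta = self.params[tier]
        return alpha / (alpha + beta)
```

For example, `ThompsonModelRouter({"haiku": 0.05, "sonnet": 0.2, "opus": 1.0})` uses made-up relative costs. As outcomes accumulate, each tier's Beta distribution narrows, so an expensive tier is picked only when its sampled success rate beats cheaper tiers by more than its cost penalty.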

Swarm Coordination

Complex tasks are broken apart by the swarm coordinator and distributed across specialised agents:

Component	Description
Topologies	`hierarchical`, `mesh`, `ring`, `star` — chosen based on task complexity
Consensus	Raft (leader-elected, strongly consistent), Byzantine/BFT (tolerates fewer than ⅓ faulty agents, i.e. n ≥ 3f + 1), Gossip (eventually consistent, high-throughput)
Claims	Human-agent work ownership protocol with claim, release, and handoff semantics

The hierarchical topology with Raft consensus is the recommended default for coding tasks because a single coordinator validates every output against the original goal, catching drift early.
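
The fault-tolerance limits of the two voting protocols follow from standard quorum arithmetic, sketched here. The function names are illustrative and not part of Ruflo; the formulas are the textbook bounds for crash-tolerant majority voting (Raft) and Byzantine quorums.

```python
def raft_majority(n: int) -> int:
    """Votes needed for a Raft leader election or log commit among n nodes."""
    return n // 2 + 1


def raft_crash_tolerance(n: int) -> int:
    """Crashed nodes a Raft cluster of n can lose and still make progress."""
    return (n - 1) // 2


def max_byzantine_faults(n: int) -> int:
    """Largest f satisfying n >= 3f + 1."""
    return (n - 1) // 3


def bft_quorum(n: int) -> int:
    """Smallest quorum q such that any two quorums share an honest node.

    Two quorums overlap in at least 2q - n nodes; requiring that overlap
    to exceed f gives q > (n + f) / 2.
    """
    f = max_byzantine_faults(n)
    return (n + f) // 2 + 1
```

For a 15-agent swarm this gives a Raft majority of 8 (tolerating 7 crashes) and a BFT quorum of 10 (tolerating 4 arbitrary faults), which illustrates why Byzantine consensus costs more agreement per decision.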

Agent Layer

The swarm spawns from a pool of 100+ typed, specialised agents. Each agent is optimised for a specific role: coder, tester, reviewer, architect, security, docs, devops, researcher, analyzer, coordinator, queen-coordinator, security-architect, memory-specialist, perf-analyzer, pr-manager, and many more across eight categories. Agents are managed by the AgentPool, which handles auto-scaling, idle timeouts, and health monitoring. Most users never spawn agents manually — the swarm coordinator does it automatically based on task type.
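
The pool behaviour described above (reuse, scale-up to a cap, idle reaping) can be sketched as follows. This is not the real AgentPool API: the class shape, field names, and defaults are hypothetical, health monitoring is omitted, and the clock is passed in explicitly so the example stays deterministic.

```python
from dataclasses import dataclass, field


@dataclass
class PooledAgent:
    role: str
    last_used: float
    busy: bool = False


@dataclass
class AgentPool:
    max_size: int = 8             # hypothetical scaling cap
    idle_timeout: float = 300.0   # seconds before an idle agent is reaped
    agents: list = field(default_factory=list)

    def acquire(self, role: str, now: float) -> PooledAgent:
        # Reuse an idle agent of the requested role when one exists.
        for agent in self.agents:
            if agent.role == role and not agent.busy:
                agent.busy, agent.last_used = True, now
                return agent
        # Otherwise scale up, as long as the cap allows it.
        if len(self.agents) >= self.max_size:
            raise RuntimeError("pool exhausted")
        agent = PooledAgent(role=role, last_used=now, busy=True)
        self.agents.append(agent)
        return agent

    def release(self, agent: PooledAgent, now: float) -> None:
        agent.busy, agent.last_used = False, now

    def reap_idle(self, now: float) -> int:
        """Drop agents idle for at least idle_timeout; return how many were removed."""
        keep = [a for a in self.agents
                if a.busy or now - a.last_used < self.idle_timeout]
        removed = len(self.agents) - len(keep)
        self.agents = keep
        return removed
```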

Resources

Three resource types back the agent layer:

Memory (AgentDB) — HNSW-indexed vector database; 150x–12,500x faster than brute-force search at scale. Entries persist across sessions and feed the Learning Loop.
LLM Providers — Anthropic (Claude), OpenAI (GPT), Google (Gemini), Cohere, and Ollama. Smart routing picks the cheapest provider that meets quality requirements; automatic failover if a provider is unavailable.
12 Background Workers — ultralearn, audit, optimize, consolidate, map, deepdive, document, refactor, benchmark, testgaps, predict, and preload. They trigger automatically on context signals (file changes, session events, memory thresholds) or can be dispatched manually.
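
The provider rule in the list above ("cheapest provider that meets quality requirements", with failover) reduces to a filter followed by a minimum. The sketch below assumes hypothetical `Provider` fields and placeholder provider names and numbers; it says nothing about real pricing or about how Ruflo scores quality.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    name: str
    cost_per_1k_tokens: float  # hypothetical unit cost
    quality: float             # hypothetical score in [0, 1]
    available: bool = True


def pick_provider(providers, min_quality: float) -> Provider:
    """Cheapest available provider whose quality meets the floor."""
    candidates = [p for p in providers
                  if p.available and p.quality >= min_quality]
    if not candidates:
        raise LookupError("no available provider meets the quality floor")
    return min(candidates, key=lambda p: p.cost_per_1k_tokens)
```

Failover falls out of the same function: marking a provider unavailable removes it from the candidate set, so the next-cheapest qualifying provider is chosen on the following call.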

RuVector Intelligence

The intelligence substrate that powers learning across the entire system:

Component	Purpose	Performance
SONA	Self-Optimizing Neural Architecture: learns optimal routing from trajectories	<0.05 ms adaptation
EWC++	Elastic Weight Consolidation — prevents catastrophic forgetting when learning new tasks	Zero knowledge loss
Flash Attention	Optimised attention computation via `@ruvector/attention`	2.49x–7.47x speedup
HNSW	Hierarchical Navigable Small World vector search	Sub-millisecond retrieval
ReasoningBank	Pattern storage with RETRIEVE → JUDGE → DISTILL → CONSOLIDATE → ROUTE cycle	BM25 + semantic hybrid search
Hyperbolic Embeddings	Poincaré ball model for hierarchical code relationships	Exponential embedding capacity
LoRA / MicroLoRA	Low-Rank Adaptation for efficient on-device fine-tuning	<5 MB memory footprint (Micro)
Int8 Quantisation	Converts 32-bit weights to 8-bit	~4× memory reduction
9 RL Algorithms	PPO, A2C, DQN, Q-Learning, SARSA, Decision Transformer, Curiosity, and more	Task-specific learning
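
The roughly 4× figure in the Int8 Quantisation row comes directly from storing one byte per weight instead of four. A minimal sketch of symmetric per-tensor quantisation, using only the standard library, shows the idea; the function names are illustrative and the scheme RuVector actually uses may differ.

```python
from array import array


def quantize_int8(weights):
    """Symmetric quantisation: w is approximated by scale * q, q in [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp into the int8 range.
    q = array("b", (max(-127, min(127, round(w / scale))) for w in weights))
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from int8 codes."""
    return [scale * v for v in q]
```

Each reconstructed weight differs from the original by at most half a quantisation step (`scale / 2`), and an `array("f")` holding the same weights takes four bytes per element against one for the `array("b")` codes.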

V3 Architecture Decision Records

The V3 rewrite is governed by ten ADRs that codify every major design choice:

ADR	Decision
ADR-001	Adopt `agentic-flow` as the core foundation (eliminates 10,000+ duplicate lines)
ADR-002	Domain-Driven Design structure with bounded contexts
ADR-003	Single coordination engine — `UnifiedSwarmCoordinator`
ADR-004	Plugin-based architecture (microkernel pattern)
ADR-005	MCP-first API design across all modules
ADR-006	Unified memory service backed by AgentDB
ADR-007	Event sourcing for full audit trail on state changes
ADR-008	Vitest over Jest (10× faster test runs)
ADR-009	Hybrid memory backend (SQLite + AgentDB) as the default
ADR-010	Node.js 20+ only — Deno support removed
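
ADR-007's event sourcing can be illustrated with an append-only log from which current state is derived by replay. The event kinds, class names, and registry below are hypothetical and chosen only to show the pattern, not Ruflo's event schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str       # hypothetical kinds: "spawned" or "terminated"
    agent_id: str


class AgentRegistry:
    """State is never stored directly; it is rebuilt from the event log."""

    def __init__(self):
        self.log = []  # append-only audit trail

    def apply(self, event: Event) -> None:
        self.log.append(event)

    def active_agents(self, upto=None) -> set:
        """Replay the first `upto` events (all of them when upto is None)."""
        active = set()
        for event in self.log[:upto]:
            if event.kind == "spawned":
                active.add(event.agent_id)
            elif event.kind == "terminated":
                active.discard(event.agent_id)
        return active
```

Because the log is never rewritten, replaying a prefix of it answers "what was the state after event N", which is the audit property the ADR is after.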

Performance Reference

Metric	Target	Achieved
Event Bus (100k events)	<50 ms	~6 ms
Map Lookup (100k gets)	<20 ms	~16 ms
Flash Attention speedup	2.49x–7.47x	Validated
AgentDB HNSW search	150x–12,500x faster	HNSW-indexed
SONA adaptation latency	<0.05 ms	~0.02 ms
Agent coordination (15 agents)	<100 ms	Validated

Explore the System

Agents

100+ typed agents, lifecycle states, spawning, and pool management.

Swarms

Topology types, consensus algorithms, and hive-mind coordination.

Memory

HNSW vector storage, semantic search, and cross-session persistence.

Hooks

27 lifecycle hooks and 12 background workers that power the learning loop.
