ruflo neural exposes Ruflo’s RuVector intelligence layer — the collection of algorithms that make the system self-improving. Training stores successful patterns in the ReasoningBank so the SONA router can recall and reuse them on future tasks. The command also surfaces diagnostics for the HNSW vector index, Flash Attention engine, MicroLoRA adapter, and the Thompson sampling model router.
Core concepts
| Concept | What it is |
|---|---|
| SONA (Self-Optimizing Neural Architecture) | Real-time pattern matching and routing engine. Target: <0.05 ms per routing decision. Benchmark is measured during neural status. |
| ReasoningBank | Persistent store of successful reasoning trajectories. Entries are written by hooks on task completion and recalled by neural patterns. |
| EWC++ (Elastic Weight Consolidation++) | Prevents catastrophic forgetting: old patterns retain their confidence even as new ones are added. |
| MicroLoRA / LoRA | Low-rank adaptation of the routing weights. Updated incrementally during neural train without a full model retrain. |
| Flash Attention | Optimized attention computation. 2.49×–7.47× speedup on attention-heavy sequences (benchmarked). |
| MoE (Mixture of Experts) | 8-expert router that activates the top-2 specialists for each input token, reducing unnecessary computation. |
| Thompson sampling | Cost-adjusted multi-armed bandit used to route tasks to the cheapest model tier (Haiku / Sonnet / Opus) that can handle them. Beta(α,β) priors are updated by hooks model-outcome. |
| 9 RL algorithms | Q-Learning, SARSA, A2C, PPO, DQN, Decision Transformer, and three others — available for task-specific fine-tuning. |
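As a rough illustration of the Thompson sampling row above, the sketch below keeps one Beta(α, β) posterior per model tier, samples a success rate from each, and subtracts a cost penalty before picking a tier. The names `TierRouter`, `TIER_COST`, and `cost_weight`, the relative cost values, and the "sample minus penalty" scoring rule are all assumptions made for this example; this page does not specify how Ruflo applies the cost adjustment.

```python
import random

# Hypothetical relative costs per tier (illustrative values, not from Ruflo).
TIER_COST = {"haiku": 1.0, "sonnet": 5.0, "opus": 25.0}


class TierRouter:
    """Cost-adjusted Thompson sampling over model tiers (illustrative sketch)."""

    def __init__(self, cost_weight=0.02, seed=None):
        self.rng = random.Random(seed)
        self.cost_weight = cost_weight
        # Beta(1, 1) is a uniform prior over each tier's success rate.
        self.alpha = {tier: 1.0 for tier in TIER_COST}
        self.beta = {tier: 1.0 for tier in TIER_COST}

    def choose(self):
        """Sample a success rate per tier, subtract the cost penalty, take the best."""
        def score(tier):
            sampled = self.rng.betavariate(self.alpha[tier], self.beta[tier])
            return sampled - self.cost_weight * TIER_COST[tier]
        return max(TIER_COST, key=score)

    def record_outcome(self, tier, success):
        """Update the posterior: a success increments alpha, a failure increments beta."""
        if success:
            self.alpha[tier] += 1.0
        else:
            self.beta[tier] += 1.0


router = TierRouter(seed=42)
tier = router.choose()
router.record_outcome(tier, success=True)
```

With this scoring rule, a cheap tier that keeps succeeding ends up preferred over an equally reliable but more expensive one, which matches the "cheapest tier that can handle the task" behaviour described in the table.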
Synopsis
Subcommands
| Subcommand | Description |
|---|---|
| train | Train patterns using WASM SIMD acceleration (MicroLoRA + Flash Attention) |
| status | Full system status: SONA, RuVector, HNSW, embedding model, LoRA adapter |
| patterns | Query, list, export, or import the pattern store |
| predict | Run a prediction through the current pattern model |
| optimize | Trigger a manual optimization pass on the routing weights |
| benchmark | Run a latency and throughput benchmark of the neural stack |
| list | List available models and adapters |
| export | Export the trained pattern set to JSON |
| import | Import a pattern set from JSON |
| router | Inspect and configure the MoE / Thompson sampling router |
| distill | weight-eft training-data slice: export, plan, eval, and (spend-gated) remote train |
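The MoE router that the router subcommand inspects is described above as an 8-expert router activating the top-2 specialists per input. A minimal sketch of that gating step follows, assuming a softmax renormalized over the two selected experts; `top2_gate`, `moe_forward`, and the toy experts are hypothetical names for illustration, not Ruflo APIs.

```python
import math


def top2_gate(logits):
    """Return [(expert_index, weight), ...] for the two highest-scoring experts."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:2]
    # Softmax over the two selected logits only, shifted by the max for stability.
    peak = max(logits[i] for i in top)
    exps = [math.exp(logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(x, experts, gate_logits):
    """Evaluate only the two selected experts and mix their outputs by gate weight."""
    return sum(weight * experts[i](x) for i, weight in top2_gate(gate_logits))


# Eight toy experts; expert i simply multiplies its input by i.
experts = [lambda x, i=i: x * i for i in range(8)]
gate_logits = [0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 2.0, 0.0]
print(moe_forward(2.0, experts, gate_logits))  # prints 9.0
```

Only two of the eight experts run per input, which is where the computation saving comes from.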
train
Train the neural routing weights on a named pattern type. Generates embeddings for the training data, runs the MicroLoRA update loop with optional Flash Attention and contrastive (InfoNCE) loss, and persists trajectories to the ReasoningBank.
Pattern type to train. Valid values: coordination, optimization, prediction, security, testing, debugging, memory, reasoning.
Number of training epochs.
Path to a JSON training-data file (array of {content, type} objects). When omitted, Ruflo generates synthetic samples for the selected pattern type.
Gradient step size for MicroLoRA updates.
Number of embeddings processed per gradient step.
Embedding dimension (capped at 256).
Use the RuVector WASM backend. Falls back to a JS implementation if @ruvector/learning-wasm is not installed.
Enable Flash Attention for 2.49×–7.47× speedup on attention computations.
Enable Mixture of Experts routing during training.
Use Poincaré ball (hyperbolic) attention for hierarchical pattern structures.
Train with InfoNCE contrastive loss (anchor + positives vs. negatives).
Enable curriculum learning: start with easy samples and scale difficulty by epoch.
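For readers unfamiliar with the contrastive option, the sketch below computes an InfoNCE loss for one anchor, one positive, and a list of negatives using cosine similarity. The function names and the temperature of 0.1 are assumptions for this example; the loss Ruflo actually computes (for instance, how several positives are combined) is not described on this page.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """-log( exp(s_pos / t) / (exp(s_pos / t) + sum_j exp(s_neg_j / t)) )."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    # Log-sum-exp with the max subtracted to avoid overflow.
    peak = max(logits)
    log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_denominator - logits[0]


aligned = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
misaligned = info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
print(aligned < misaligned)  # prints True
```

The loss is near zero when the anchor is closer to its positive than to any negative, and grows as a negative becomes the nearest neighbour.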
Training backend: auto (native @ruvector/ruvllm when available, else WASM), native (requires @ruvector/ruvllm), or wasm.
Fraction of data held out for validation (0–1). Enables early stopping and Best Val Loss reporting. Native backend only.
Path to a native-backend checkpoint file to resume training from. Requires --backend native (or auto resolving to native). Restores epoch position on @ruvector/ruvllm ≥ 2.6.0.
--resume is a native-backend-only feature. Passing it alongside --backend wasm is an explicit error: the command exits immediately with a descriptive message.
status
Measure and display the live state of every component in the neural stack. Runs a 100-sample benchmark of the SONA adaptation time to give a real latency number rather than a static claim.
Show extended metrics: trajectory counts, LoRA delta norms, SONA per-operation p99, and ruvllm coordinator stats.
| Component | What is reported |
|---|---|
| SONA Coordinator | Active/inactive; avg adaptation time in µs |
| RuVector Training | Backend (WASM or JS fallback); total MicroLoRA adaptations |
| SONA Engine | Total learns, total searches |
| ReasoningBank | Entry count, patterns stored |
| HNSW Index | Initialized / available / not installed; vector count and dimensions |
| Embedding Model | Provider name and dimensions; semantic vs hash-fallback flag |
| Flash Attention Ops | Available operations: batchCosineSim, softmax, topK |
| Int8 Quantization | ~4× memory reduction |
| ruvllm Coordinator | Trajectory count |
| Contrastive Trainer | Triplet count, agent count |
| Training Pipeline | Backend version; latest checkpoint path and age |
| Graph Database | Node and edge counts |
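The "~4× memory reduction" in the Int8 Quantization row follows from storing each vector component in one byte instead of a four-byte float32. The sketch below shows symmetric per-vector quantization as one common way to do this; whether Ruflo uses this exact scheme is an assumption, and `quantize_int8` / `dequantize_int8` are hypothetical names.

```python
from array import array


def quantize_int8(vector):
    """Map floats to int8 codes in [-127, 127] using one scale per vector."""
    peak = max(abs(x) for x in vector)
    scale = peak / 127.0 if peak > 0 else 1.0
    codes = array("b", (round(x / scale) for x in vector))
    return codes, scale


def dequantize_int8(codes, scale):
    """Recover approximate floats; the error per component is at most scale / 2."""
    return [c * scale for c in codes]


vector = [0.3, -1.0, 0.7, 0.0]
codes, scale = quantize_int8(vector)
float32_bytes = array("f", vector).itemsize * len(vector)
int8_bytes = codes.itemsize * len(codes)
print(float32_bytes / int8_bytes)  # prints 4.0
```

The reduction is approximate in practice because each vector also carries its scale factor.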
patterns
Query or manage the pattern store (ReasoningBank entries).
Operation to perform: list, analyze, learn, or predict.
Search query for pattern retrieval (used with the analyze action).
Maximum number of patterns to return.
predict
Run input text through the trained pattern model and return a routing prediction with confidence scores.
Input text to predict routing for.
Number of top predictions to return.
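One plausible way to turn an embedded input into ranked predictions with confidence scores is to compare it against stored pattern embeddings, apply a softmax, and keep the top results. The sketch below assumes that approach; the function `predict_top_k`, the toy three-dimensional embeddings, and the temperature are hypothetical and are not taken from Ruflo's implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def predict_top_k(query, patterns, k=3, temperature=0.1):
    """Return the k most similar pattern labels with softmax confidences."""
    labels = list(patterns)
    logits = [cosine(query, patterns[label]) / temperature for label in labels]
    peak = max(logits)
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    scored = [(label, e / total) for label, e in zip(labels, exps)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy pattern embeddings keyed by pattern type (illustrative values only).
patterns = {
    "security": [1.0, 0.0, 0.0],
    "testing": [0.0, 1.0, 0.0],
    "debugging": [0.0, 0.0, 1.0],
}
for label, confidence in predict_top_k([0.9, 0.1, 0.0], patterns, k=2):
    print(f"{label}: {confidence:.3f}")
```

Confidences are computed over all stored patterns before truncation, so the top-k scores sum to at most 1.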
benchmark
Run a latency and throughput benchmark of the entire neural stack and produce a table of operations-per-second for each component.
Number of benchmark iterations per component.
Embedding dimension to benchmark (capped at 256).
Number of attention keys for attention benchmarks.
distill
The distill subgroup manages the weight-eft training-data pipeline (ADR-150). All operations cost $0 except distill train --execute --yes, which triggers real remote-GPU compute on a user-provided host.
| Subcommand | Description |
|---|---|
distill export | Convert captured session transcripts to audited SFT/DPO JSONL + a guard report |
distill plan | Print the GPU training plan and ruvllm commands (offline dry-run, $0) |
distill eval | Compute the cost-Pareto delta between a base-model run and an adapter run |
distill train | Remote-GPU LoRA tune via SSH (dry-run by default; --execute --yes to spend) |
Examples
The SONA <0.05 ms target is measured on the adaptation benchmark (benchmarkAdaptation(100)). If neural status reports the target is not met (Target Met: No), try reducing CLAUDE_FLOW_HNSW_EF or lowering the embedding dimension with --dim 128 on the next training run.
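To make the target check concrete, the sketch below times an operation over 100 samples and compares the mean against 0.05 ms (50 µs). It is a generic timing harness, not the code behind benchmarkAdaptation; `benchmark_latency`, `TARGET_MS`, and the stand-in workload are hypothetical.

```python
import math
import time

TARGET_MS = 0.05  # SONA target stated on this page (50 µs)


def benchmark_latency(fn, samples=100):
    """Time fn() once per sample and report mean and p99 latency in µs."""
    durations_ms = []
    for _ in range(samples):
        start = time.perf_counter()
        fn()
        durations_ms.append((time.perf_counter() - start) * 1000.0)
    durations_ms.sort()
    mean_ms = sum(durations_ms) / samples
    # Nearest-rank percentile: the smallest value covering 99% of samples.
    p99_ms = durations_ms[max(0, math.ceil(0.99 * samples) - 1)]
    return {
        "mean_us": mean_ms * 1000.0,
        "p99_us": p99_ms * 1000.0,
        "target_met": mean_ms < TARGET_MS,
    }


# Stand-in workload: a 128-dimensional dot product.
a = [0.5] * 128
b = [0.25] * 128
result = benchmark_latency(lambda: sum(x * y for x, y in zip(a, b)))
print(result)
```

Measured values depend on the machine and on system load, so a single run near the threshold should be repeated before changing any settings.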