Split-Brain Design
Flower Engine uses a split architecture that separates AI orchestration from the user interface.
Why split? The architecture decouples fast UI rendering (Rust) from heavyweight AI operations (Python), so the terminal interface stays responsive even during long LLM inference times.
Component Layers
The Brain (Python Backend)
The backend is built on FastAPI and handles all AI orchestration, data persistence, and narrative logic.
Core Responsibilities
LLM Orchestration
- Multi-provider support (OpenRouter, DeepSeek, Groq, Gemini)
- Token streaming with real-time performance metrics
- Dynamic model switching and pricing calculation
- Provider-specific client routing
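As a rough illustration of the streaming metrics and pricing calculation listed above, the sketch below computes throughput and cost for a single response. The `StreamMetrics` class, its field names, and the per-million-token prices are hypothetical and are not taken from the engine's code.

```python
from dataclasses import dataclass


@dataclass
class StreamMetrics:
    """Hypothetical per-response metrics; all names are illustrative only."""
    prompt_tokens: int
    completion_tokens: int
    elapsed_seconds: float  # time spent streaming the completion

    def tokens_per_second(self) -> float:
        # Guard against division by zero for empty or instant responses.
        if self.elapsed_seconds <= 0:
            return 0.0
        return self.completion_tokens / self.elapsed_seconds

    def cost_usd(self, input_price_per_m: float, output_price_per_m: float) -> float:
        # Assumes prices are quoted per one million tokens.
        return (
            self.prompt_tokens * input_price_per_m
            + self.completion_tokens * output_price_per_m
        ) / 1_000_000


metrics = StreamMetrics(prompt_tokens=1200, completion_tokens=300, elapsed_seconds=6.0)
print(metrics.tokens_per_second())  # 50.0
print(metrics.cost_usd(0.5, 1.5))   # made-up prices, for illustration only
```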
Data Persistence
- SQLite for sessions, characters, and worlds
- ChromaDB for vector storage (RAG)
- Session history with hot-swapping
- Character and world asset management
Context Management
- RAG-based lore retrieval (top 2 chunks)
- Recent memory injection (top 3 chunks)
- Scene context on session start
- Chunked lore embedding (800 char chunks)
Startup Sequence
The backend performs its initialization in engine/main.py:26-149.
The Face (Rust Frontend)
The TUI is built with Ratatui and Tokio, providing a fast, asynchronous terminal interface.
Event Loop Architecture
The main loop in tui/src/main.rs:54-286 uses tokio::select! for concurrent event handling.
A 150ms tick interval keeps spinner animations and cursor blinking smooth without consuming excessive CPU.
Connection Management
The WebSocket client in tui/src/ws.rs:8-69 implements auto-reconnect.
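The reconnect pattern itself is language-agnostic. The sketch below shows one common shape for it, written in Python with asyncio for brevity (the actual client is Rust). The `connect` callable, the fixed retry delay, and the attempt limit are all assumptions; the real client's retry policy is not described here.

```python
import asyncio
from typing import Awaitable, Callable


async def connect_with_retry(
    connect: Callable[[], Awaitable[str]],
    retry_delay: float = 1.0,
    max_attempts: int = 5,
) -> str:
    """Call `connect` until it succeeds or the attempts run out."""
    last_error: Exception | None = None
    for attempt in range(1, max_attempts + 1):
        try:
            return await connect()
        except ConnectionError as exc:
            last_error = exc
            if attempt < max_attempts:
                await asyncio.sleep(retry_delay)  # wait before the next attempt
    raise ConnectionError(f"gave up after {max_attempts} attempts") from last_error


async def _demo() -> tuple[str, int]:
    calls = 0

    async def flaky_connect() -> str:
        # Stand-in for a WebSocket handshake that fails twice, then succeeds.
        nonlocal calls
        calls += 1
        if calls < 3:
            raise ConnectionError("backend not ready")
        return "connected"

    result = await connect_with_retry(flaky_connect, retry_delay=0.01)
    return result, calls


demo_result = asyncio.run(_demo())
print(demo_result)  # ('connected', 3)
```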
Data Flow
User Message Flow
Context Building
- RAG queries lore (top 2 chunks)
- RAG queries memory (top 3 chunks)
- Scene added if first message
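The context-building step can be sketched as follows. This is a minimal illustration, not the engine's implementation: the `Retriever` signature, the `build_context` function, and the section labels are hypothetical; only the top-k values and the first-message scene rule come from the text above.

```python
from typing import Callable

# Hypothetical retriever signature: (query, top_k) -> list of text chunks.
Retriever = Callable[[str, int], list[str]]

LORE_TOP_K = 2    # from the text: top 2 lore chunks
MEMORY_TOP_K = 3  # from the text: top 3 memory chunks


def build_context(
    user_message: str,
    search_lore: Retriever,
    search_memory: Retriever,
    scene: str,
    is_first_message: bool,
) -> str:
    """Assemble the extra context sent alongside the system prompt and history."""
    sections: list[str] = []
    if is_first_message and scene:
        sections.append("[Scene]\n" + scene)
    lore = search_lore(user_message, LORE_TOP_K)
    if lore:
        sections.append("[Lore]\n" + "\n".join(lore))
    memory = search_memory(user_message, MEMORY_TOP_K)
    if memory:
        sections.append("[Memory]\n" + "\n".join(memory))
    return "\n\n".join(sections)


# Stand-in retrievers that return canned chunks, truncated to top_k.
def fake_lore(query: str, top_k: int) -> list[str]:
    return ["The city floats.", "Magic is taxed.", "Dragons are extinct."][:top_k]


def fake_memory(query: str, top_k: int) -> list[str]:
    return ["You met Ada.", "You lost your sword."][:top_k]


first = build_context("Where am I?", fake_lore, fake_memory, "A rainy dock.", True)
later = build_context("Where am I?", fake_lore, fake_memory, "A rainy dock.", False)
print(first)
```

In a real deployment the two retrievers would query the vector store; here they are plain functions so the assembly logic can be read in isolation.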
LLM Streaming
- System prompt + history + context sent to LLM
- Tokens streamed back as chat_chunk events
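To make the streaming step concrete, the sketch below wraps each token from a stand-in LLM stream in a JSON event. Only the event name `chat_chunk` comes from the text; the `event` and `content` field names and both helper functions are hypothetical, and the actual message format may differ.

```python
import asyncio
import json
from typing import AsyncIterator


async def fake_llm_stream() -> AsyncIterator[str]:
    """Stand-in for a provider's token stream."""
    for token in ["The ", "door ", "creaks."]:
        await asyncio.sleep(0)  # yield control, as a real network read would
        yield token


async def stream_as_events(tokens: AsyncIterator[str]) -> list[str]:
    """Wrap each token in a JSON event; the field names are hypothetical."""
    frames: list[str] = []
    async for token in tokens:
        # A real backend would send each frame over the WebSocket instead
        # of collecting them in a list.
        frames.append(json.dumps({"event": "chat_chunk", "content": token}))
    return frames


frames = asyncio.run(stream_as_events(fake_llm_stream()))

# The client reassembles the reply by concatenating chunk contents in order.
reply = "".join(json.loads(frame)["content"] for frame in frames)
print(reply)  # The door creaks.
```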
Cancellation Flow
Streaming can be interrupted mid-response; the handling lives in engine/main.py:233-250.
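One common way to make a stream interruptible in an asyncio backend is task cancellation. The sketch below assumes that approach; it is not the code from engine/main.py, and the `stream_tokens` function and the `[cancelled]` marker are hypothetical.

```python
import asyncio


async def stream_tokens(sent: list[str]) -> None:
    """Pretend to stream tokens until cancelled."""
    try:
        for i in range(1000):
            sent.append(f"token-{i}")
            await asyncio.sleep(0.01)  # cancellation is delivered at await points
    except asyncio.CancelledError:
        sent.append("[cancelled]")  # e.g. notify the client, keep partial text
        raise  # re-raise so the task is marked as cancelled


async def main() -> list[str]:
    sent: list[str] = []
    task = asyncio.create_task(stream_tokens(sent))
    await asyncio.sleep(0.05)  # the user interrupts shortly after streaming starts
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass  # expected: the stream was interrupted on purpose
    return sent


sent = asyncio.run(main())
print(sent[-1])  # [cancelled]
```

Catching `CancelledError` inside the streaming coroutine gives it a chance to clean up before the cancellation propagates.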
System Requirements
Memory
4GB+ RAM required
Embeddings run on CPU using all-MiniLM-L6-v2 for maximum compatibility
Storage
~1GB disk space
Setup optimized to avoid heavy CUDA libraries
Runtime
Python 3.12+
Rust (stable)
Latest versions recommended
Platform
Linux, macOS
Windows via WSL2
Native terminal support required
Performance Characteristics
Latency Breakdown
| Operation | Typical Time | Notes |
|---|---|---|
| WebSocket round-trip | Less than 5ms | Localhost connection |
| RAG query (lore) | 50-150ms | CPU embedding, 2 results |
| RAG query (memory) | 30-100ms | CPU embedding, 3 results |
| LLM first token | 200ms-2s | Provider-dependent |
| Token streaming | 20-100 tokens/sec | Model-dependent |
| UI render frame | Less than 1ms | Ratatui efficiency |
The Rust TUI maintains 60+ FPS during active streaming, ensuring smooth scrolling and animations.
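As a back-of-the-envelope check, the figures in the latency table can be summed to estimate the time until the first token appears. The sketch assumes the four stages run one after another; whether the two RAG queries actually run sequentially or concurrently is not stated, so treat the totals as rough bounds.

```python
# (low_ms, high_ms) per stage, taken from the latency table.
STAGES_MS = {
    "websocket_round_trip": (0, 5),   # "less than 5ms"
    "rag_lore": (50, 150),
    "rag_memory": (30, 100),
    "llm_first_token": (200, 2000),
}

best_case = sum(low for low, _ in STAGES_MS.values())
worst_case = sum(high for _, high in STAGES_MS.values())

# Share of the worst-case total spent waiting on the LLM provider.
llm_share = STAGES_MS["llm_first_token"][1] / worst_case

print(best_case, worst_case)  # 280 2255
print(f"{llm_share:.0%}")     # 89%
```

Under these assumptions the provider's first-token latency dominates, which is consistent with the table's note that it is provider-dependent.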
Asset Structure
The engine loads configuration from YAML files at startup.
World YAML Schema
Lore chunking: Long lore text is automatically split into 800-character chunks with smart line-break handling, then embedded separately for RAG retrieval.
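A minimal sketch of line-aware chunking, assuming "smart line-break handling" means preferring to break at line boundaries and hard-splitting only lines that exceed the limit. The `chunk_lore` function is hypothetical, and the engine's actual splitting rules may differ.

```python
CHUNK_SIZE = 800  # from the text: 800-character chunks


def chunk_lore(text: str, chunk_size: int = CHUNK_SIZE) -> list[str]:
    """Split text into chunks of at most `chunk_size` characters,
    preferring to break at line boundaries."""
    chunks: list[str] = []
    current = ""
    for line in text.splitlines():
        # A single over-long line is hard-split into chunk_size pieces.
        while len(line) > chunk_size:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(line[:chunk_size])
            line = line[chunk_size:]
        candidate = f"{current}\n{line}" if current else line
        if len(candidate) <= chunk_size:
            current = candidate  # the line still fits in the current chunk
        else:
            chunks.append(current)  # close the chunk at the line boundary
            current = line
    if current:
        chunks.append(current)
    return chunks


lore = "\n".join(f"Fact {i}: " + "x" * 90 for i in range(20))
chunks = chunk_lore(lore)
print(len(chunks))                  # 3
print(max(len(c) for c in chunks))  # 797
```

Each resulting chunk would then be embedded and stored separately, so a retrieval query returns only the passages relevant to the current message.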
Next Steps
Split-Brain Deep Dive
Learn why Python and Rust work better apart
WebSocket Protocol
Master the JSON message format
System Rules
Understand hardcore narrative constraints
Quick Start
Set up your own instance