Repository structure
Golden rule: “A place for everything, and everything in its place.” The directory structure is immutable without prior architectural discussion.
Architectural layers
Domain layer
Pure business logic with no external dependencies. Contains entities, use cases, and repository interfaces. Written in pure Dart (client) or pure Python (server) — no Flutter, FastAPI, or database imports allowed.
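As an illustration of what "pure Python" means on the server side, the sketch below defines an entity, a repository interface, and a use case using only the standard library. The names Chunk, ChunkRepository, and RetrieveContext are hypothetical and not taken from the codebase; the empty-query rule is likewise an assumed example of a business rule.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Chunk:
    """Entity: a piece of retrieved knowledge (hypothetical name)."""
    source: str
    text: str


class ChunkRepository(Protocol):
    """Repository interface; the data layer supplies the implementation."""

    def search(self, query: str, limit: int) -> list[Chunk]: ...


class RetrieveContext:
    """Use case: depends only on the interface, never on a concrete store."""

    def __init__(self, repository: ChunkRepository) -> None:
        self._repository = repository

    def execute(self, query: str, limit: int = 3) -> list[Chunk]:
        cleaned = query.strip()
        if not cleaned:
            return []  # assumed business rule: an empty query retrieves nothing
        return self._repository.search(cleaned, limit)
```

Because the use case only sees the Protocol, a test can pass in any object with a matching search method, with no FastAPI or database import involved.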
Data layer
Adapter implementations of the domain repository interfaces. Handles DTOs, data sources, ChromaDB queries, and LLM provider calls. This is the only layer that knows about external systems.
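A minimal sketch of such an adapter follows, assuming a data source that returns raw records with a document string and a metadata dict. ChunkDTO, VectorStoreChunkRepository, and the DataSource callable are hypothetical names; a real adapter would wrap the ChromaDB client rather than a plain callable.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Chunk:
    """Domain entity (hypothetical; would be imported from the domain layer)."""
    source: str
    text: str


@dataclass(frozen=True)
class ChunkDTO:
    """Raw record as returned by the data source (assumed shape)."""
    document: str
    metadata: dict

    def to_entity(self) -> Chunk:
        # Map storage fields onto the domain entity.
        return Chunk(source=self.metadata.get("source", "unknown"), text=self.document)


# A data source is any callable that takes (query, limit) and returns DTOs.
DataSource = Callable[[str, int], list[ChunkDTO]]


class VectorStoreChunkRepository:
    """Adapter: provides the search method the domain interface expects."""

    def __init__(self, data_source: DataSource) -> None:
        self._data_source = data_source

    def search(self, query: str, limit: int) -> list[Chunk]:
        return [dto.to_entity() for dto in self._data_source(query, limit)]
```

The DTO-to-entity mapping is the point of the sketch: storage field names stay inside the data layer, and the domain only ever receives Chunk objects.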
Presentation layer
Flutter widgets, screens, and Riverpod providers on the client. FastAPI routers on the server. Both delegate business decisions to the domain layer and never contain business logic themselves.
Infrastructure layer
Docker Compose orchestration, health checks, volume mounts, and network configuration. Defined in infrastructure/docker-compose.yml and managed by the scripts in scripts/devops/.
Backend architecture
The backend is a modular monolith built with FastAPI and LangChain, organized by domain.
RAG pipeline
The Retrieval-Augmented Generation pipeline is the core of the backend:
- User input is sanitized through core/security.py before it reaches the LLM.
- The query is embedded and matched against ChromaDB collections (tech-packs, templates, examples).
- Up to RAG_MAX_CHUNKS relevant chunks are retrieved and prepended to the prompt.
- The assembled prompt (capped at LLM_MAX_PROMPT_CHARS) is sent to the configured LLM provider.
- The response streams token-by-token over a WebSocket connection to the Flutter client.
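The sanitize, retrieve, and assemble steps above can be sketched as follows. This is a simplified illustration, not the project's implementation: sanitize is a placeholder for whatever core/security.py does, retrieve stands in for the ChromaDB query, the numeric limits are illustrative values, and the choice to trim context rather than the question when the cap is hit is an assumption.

```python
from typing import Callable, Sequence

RAG_MAX_CHUNKS = 3          # illustrative value, not the project default
LLM_MAX_PROMPT_CHARS = 200  # illustrative value, not the project default


def sanitize(user_input: str) -> str:
    """Placeholder for core/security.py: drops non-printable characters only."""
    return "".join(ch for ch in user_input if ch.isprintable()).strip()


def build_prompt(
    user_input: str,
    retrieve: Callable[[str, int], Sequence[str]],
    max_chunks: int = RAG_MAX_CHUNKS,
    max_chars: int = LLM_MAX_PROMPT_CHARS,
) -> str:
    question = sanitize(user_input)
    # Never use more chunks than the limit, even if the retriever returns extra.
    chunks = list(retrieve(question, max_chunks))[:max_chunks]
    # Reserve room for the question and the blank-line separator, then
    # truncate the context to fit (assumed policy: trim context, keep question).
    budget = max(max_chars - len(question) - 2, 0)
    context = "\n\n".join(chunks)[:budget]
    # Retrieved chunks are prepended to the question.
    prompt = f"{context}\n\n{question}" if context else question
    return prompt[:max_chars]
```

The returned string is what would be handed to the configured LLM provider; streaming the response back over the WebSocket is a separate concern.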
LLM providers
The active provider is selected by LLM_PROVIDER in .env. Three adapters are implemented:
| Provider | Value | Description |
|---|---|---|
| Google Gemini | gemini | Default. Cloud API, large context window. |
| Groq Cloud | groq | Ultra-fast cloud inference via llama-3.3-70b-versatile. |
| Ollama | ollama | 100% local inference. Recommended models: qwen2.5-coder:7b, llama3.2. |
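One way to wire this selection is a small registry keyed by the LLM_PROVIDER value, sketched below. The three adapter functions are stand-ins that only tag the prompt, and select_provider and PROVIDERS are hypothetical names; the gemini default follows the table above, while the case-insensitive matching is an assumption.

```python
import os
from typing import Callable, Mapping


# Stand-in adapters: each takes a prompt and returns text. Real adapters
# would call the Gemini, Groq, or Ollama clients instead.
def _gemini(prompt: str) -> str:
    return f"[gemini] {prompt}"


def _groq(prompt: str) -> str:
    return f"[groq] {prompt}"


def _ollama(prompt: str) -> str:
    return f"[ollama] {prompt}"


PROVIDERS: dict[str, Callable[[str], str]] = {
    "gemini": _gemini,
    "groq": _groq,
    "ollama": _ollama,
}
DEFAULT_PROVIDER = "gemini"


def select_provider(env: Mapping[str, str] = os.environ) -> Callable[[str], str]:
    """Pick the adapter named by LLM_PROVIDER, falling back to the default."""
    name = env.get("LLM_PROVIDER", DEFAULT_PROVIDER).strip().lower()
    try:
        return PROVIDERS[name]
    except KeyError:
        valid = ", ".join(sorted(PROVIDERS))
        raise ValueError(
            f"Unknown LLM_PROVIDER {name!r}; expected one of: {valid}"
        ) from None
```

Failing fast on an unknown value surfaces a typo in .env at startup instead of at the first chat request.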
Streaming configuration
WebSocket streaming behavior is controlled by four settings in config.py:
| Setting | Default | Description |
|---|---|---|
| WS_HEARTBEAT_INTERVAL_SECONDS | 30.0 | Keepalive ping interval |
| WS_IDLE_TIMEOUT_SECONDS | 300.0 | Max idle time before disconnect |
| WS_TOKEN_DELAY_SECONDS | 0.05 | Delay between streamed tokens |
| WS_BACKPRESSURE_THRESHOLD_BYTES | 102400 | Buffer threshold before throttling |
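To make the roles of these settings concrete, here is a minimal sketch using the defaults from the table. StreamSettings, stream_tokens, and should_throttle are hypothetical names, the actual layout of config.py is not shown in this document, and treating the threshold as inclusive is an assumption.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, Iterable


@dataclass(frozen=True)
class StreamSettings:
    """Defaults copied from the settings table."""
    ws_heartbeat_interval_seconds: float = 30.0
    ws_idle_timeout_seconds: float = 300.0
    ws_token_delay_seconds: float = 0.05
    ws_backpressure_threshold_bytes: int = 102400


async def stream_tokens(
    tokens: Iterable[str], settings: StreamSettings
) -> AsyncIterator[str]:
    """Yield tokens one at a time, pausing between them."""
    for token in tokens:
        yield token
        await asyncio.sleep(settings.ws_token_delay_seconds)


def should_throttle(buffered_bytes: int, settings: StreamSettings) -> bool:
    """True once the outgoing buffer reaches the threshold (assumed inclusive)."""
    return buffered_bytes >= settings.ws_backpressure_threshold_bytes


async def demo() -> list[str]:
    # Zero delay keeps the demo instant; production would use the default.
    fast = StreamSettings(ws_token_delay_seconds=0.0)
    return [token async for token in stream_tokens(["Hel", "lo"], fast)]
```

Running asyncio.run(demo()) collects the tokens in order; in a real handler each yielded token would be sent over the WebSocket instead of appended to a list.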
Frontend architecture
The Flutter desktop client follows Clean Architecture organized by feature (“Feature-First”).
State management
The client uses Riverpod for state management. All providers live in the presentation/ layer of each feature. Domain use cases are exposed as AsyncNotifier or StreamNotifier providers and are never called directly from widgets.
Color opacity in all Flutter code uses withValues(alpha: x.x). The withOpacity() method is deprecated and not used anywhere in the codebase.
Knowledge base structure
The RAG brain lives in packages/knowledge_base/ and is organized for modular ingestion: tech-packs, templates, and examples.
Infrastructure
Three Docker services are defined in infrastructure/docker-compose.yml and communicate over a private bridge network (sa_network, subnet 172.25.0.0/16):
| Container | Image | Host port | Role |
|---|---|---|---|
| sa_api | Custom build from src/server | 8000 | FastAPI backend |
| sa_chromadb | chromadb/chroma | 8001 | Vector store |
| sa_ollama | ollama/ollama | 11434 | Local LLM engine |
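A quick way to confirm that the three host ports in the table are reachable is a plain TCP connection check, sketched below with the standard library only. It assumes the services are published on localhost; is_listening and report are hypothetical helpers and not part of scripts/devops/.

```python
import socket

# Host ports from the table above.
SERVICES = {"sa_api": 8000, "sa_chromadb": 8001, "sa_ollama": 11434}


def is_listening(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def report(services: dict[str, int] = SERVICES) -> dict[str, bool]:
    """Map each service name to whether its host port accepts connections."""
    return {name: is_listening(port) for name, port in services.items()}
```

An open port only shows that something is listening; it does not replace the health checks defined in the Compose file.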
Architectural decision records
Key design decisions are documented as ADRs in context/30-ARCHITECTURE/ADR/:
- ADR-002: Configurable RAG limits (LLM_MAX_PROMPT_CHARS, RAG_MAX_CHUNKS). Explains the truncation-over-dropping safety net and hardware-agnostic tuning approach.