SoftArchitect AI is not just another code chat. It is your shadow software architect: an assisted work environment that guides the most critical, and most often overlooked, phase of development: project conception. By applying RAG (Retrieval-Augmented Generation) over a curated knowledge base, it checks that your application design complies with SOLID principles, Clean Architecture, and OWASP security guidelines before you write the first line of code. Its mission is simple: turn abstract ideas into development-ready technical specifications, cutting technical debt at the root.

The guided architectural workflow

Every session follows a structured five-stage process that acts as a preventive quality gate:
| Stage | Purpose |
| --- | --- |
| Context | Capture project background, team constraints, and existing systems |
| Requirements | Define functional and non-functional requirements with the AI |
| Architecture | Select patterns and tech stacks, and validate them against best practices |
| UX/UI | Design user flows and interface decisions grounded in the architecture |
| Planning | Break the design into actionable development tasks and stories |
Each stage builds on the last, so decisions made early — about domain boundaries, data ownership, or security posture — propagate cleanly into the final technical specification.
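As an illustration of that accumulation, the five stages can be modeled as an ordered pipeline where each stage records its output for the stages that follow. This is a minimal sketch, not the product's actual API; the `Session` class and its fields are invented:

```python
from dataclasses import dataclass, field
from enum import Enum

class Stage(Enum):
    CONTEXT = 1
    REQUIREMENTS = 2
    ARCHITECTURE = 3
    UX_UI = 4
    PLANNING = 5

@dataclass
class Session:
    """Accumulates decisions as the workflow advances stage by stage."""
    stage: Stage = Stage.CONTEXT
    decisions: dict = field(default_factory=dict)

    def complete(self, output: dict) -> None:
        # Record this stage's output so later stages can build on it.
        self.decisions[self.stage.name] = output
        if self.stage is not Stage.PLANNING:
            self.stage = Stage(self.stage.value + 1)

session = Session()
session.complete({"team_size": 3})          # Context
session.complete({"nfr": ["p95 < 200ms"]})  # Requirements
print(session.stage.name)  # ARCHITECTURE
```

The point of the ordering is that nothing in Planning is decided without the Architecture and Requirements records already in hand.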

Why local-first matters

SoftArchitect AI runs entirely on your machine. Your architecture decisions, business requirements, and source context never leave your network unless you explicitly choose a cloud provider. Two modes are available:
  • Privacy mode — Runs inference via Ollama on your local hardware. Zero external API calls.
  • Performance mode — Connects to Groq Cloud or Google Gemini for faster inference on modest hardware, with your explicit opt-in.
The LLM_PROVIDER variable in your .env controls which mode is active.
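For example, a minimal .env for privacy mode might look like the following. Only the variable name LLM_PROVIDER comes from the text above; the concrete values and the key names for performance mode are illustrative:

```bash
# Privacy mode: all inference stays on local hardware via Ollama
LLM_PROVIDER=ollama

# Performance mode (explicit opt-in): switch the provider and supply a key
# LLM_PROVIDER=groq
# GROQ_API_KEY=...
```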

Key capabilities

Contextual RAG & Tech Packs

A modular “Technical Encyclopedia” (packages/knowledge_base/02-TECH-PACKS) lets the assistant interview you to configure specific stacks — Flutter, Python, Firebase — with precise architecture rules baked in.

Context Factory

Automatically generates technical documentation (AGENTS.md, RULES.md) so your AI coding copilot works with full project context from day one.
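A sketch of the idea, assuming the session's decisions arrive as a plain dict. The file names AGENTS.md and RULES.md come from the description above; the function, its signature, and the templates are hypothetical, not the real Context Factory internals:

```python
import tempfile
from pathlib import Path

def write_context_files(project: dict, out_dir: Path) -> list[Path]:
    """Render AGENTS.md and RULES.md from the session's decisions."""
    out_dir.mkdir(parents=True, exist_ok=True)
    agents = (
        f"# Agents\n\nProject: {project['name']}\n"
        f"Stack: {', '.join(project['stack'])}\n"
    )
    rules = "# Rules\n\n" + "\n".join(f"- {r}" for r in project["rules"]) + "\n"
    written = []
    for name, body in [("AGENTS.md", agents), ("RULES.md", rules)]:
        path = out_dir / name
        path.write_text(body, encoding="utf-8")
        written.append(path)
    return written

files = write_context_files(
    {"name": "demo", "stack": ["Flutter", "FastAPI"], "rules": ["Hexagonal ports only"]},
    Path(tempfile.mkdtemp()),
)
print([p.name for p in files])  # ['AGENTS.md', 'RULES.md']
```

Generating these files once, from the finished specification, is what lets a coding copilot start with full project context instead of rediscovering it chat by chat.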

Streaming chat

Real-time token streaming from any LLM provider with WebSocket delivery, heartbeat management, and backpressure handling for a fluid, non-blocking experience.
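The token-forwarding and heartbeat mechanics can be sketched with a plain asyncio loop. This mirrors the pattern, not the actual WebSocket handler: the fake token source, message shapes, and timeout value are all invented:

```python
import asyncio
from typing import AsyncIterator

async def fake_llm_tokens() -> AsyncIterator[str]:
    # Stand-in for a provider's streaming API.
    for tok in ["Clean ", "Architecture ", "first."]:
        await asyncio.sleep(0)  # yield control, as a network read would
        yield tok

async def stream_with_heartbeat(send, heartbeat_every: float = 10.0) -> None:
    """Forward tokens as they arrive; emit a ping if the model stays silent."""
    tokens = fake_llm_tokens().__aiter__()
    while True:
        try:
            tok = await asyncio.wait_for(tokens.__anext__(), timeout=heartbeat_every)
        except StopAsyncIteration:
            await send({"type": "done"})
            return
        except asyncio.TimeoutError:
            await send({"type": "ping"})  # heartbeat keeps the socket alive
            continue
        await send({"type": "token", "data": tok})

messages = []

async def collect(msg):
    # In the real app this would be the WebSocket's send; here we just record.
    messages.append(msg)

asyncio.run(stream_with_heartbeat(collect))
print([m["type"] for m in messages])  # ['token', 'token', 'token', 'done']
```

Because each token is awaited and forwarded individually, a slow consumer naturally slows the loop down rather than forcing unbounded buffering, which is the essence of backpressure handling.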

Hardware-agnostic tuning

Two .env variables — LLM_MAX_PROMPT_CHARS and RAG_MAX_CHUNKS — let you match the RAG pipeline to your model’s context window, from an 8K local Ollama model to a 200K-token cloud API.
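A rough picture of how the two knobs could interact, assuming retrieved chunks arrive in relevance order. The variable names match the .env settings above, but the function and its trimming logic are illustrative, not the pipeline's actual code:

```python
def fit_context(chunks: list[str], question: str,
                llm_max_prompt_chars: int = 8000,
                rag_max_chunks: int = 4) -> str:
    """Keep at most rag_max_chunks chunks, then drop more if the
    assembled prompt would still exceed the character budget."""
    kept = chunks[:rag_max_chunks]
    while kept and len("\n\n".join(kept)) + len(question) > llm_max_prompt_chars:
        kept.pop()  # drop the lowest-ranked chunk first (list is relevance-ordered)
    return "\n\n".join(kept) + "\n\n" + question

prompt = fit_context(
    ["chunk-a" * 100, "chunk-b" * 100, "chunk-c" * 100],
    "Which pattern fits?",
    llm_max_prompt_chars=1500,
    rag_max_chunks=3,
)
# With a 1500-char budget only the two highest-ranked chunks survive.
```

Lowering both values keeps prompts inside an 8K local model's window; raising them lets a 200K-token cloud API see far more retrieved context per question.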

Technology stack

| Layer | Technology |
| --- | --- |
| Frontend | Flutter Desktop (Linux, Windows, macOS) |
| Backend | Python 3.12 · FastAPI · LangChain |
| AI engine | Ollama (local) · Groq Cloud · Google Gemini |
| Vector store | ChromaDB |
| Infrastructure | Docker Compose |

Next steps

Quickstart

Get SoftArchitect AI running locally in under 5 minutes.

Architecture

Understand the Clean Architecture and Hexagonal design of the codebase.
