Prerequisites

Choosing your LLM provider: Set LLM_PROVIDER=ollama in your .env for fully offline, private inference. Set LLM_PROVIDER=gemini or LLM_PROVIDER=groq to use a cloud API for faster responses on modest hardware. You can switch at any time by editing .env and restarting the stack.
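Switching can be scripted as well. A minimal sketch — `set_provider` is a made-up helper, and the GNU `sed -i` syntax is assumed (on macOS use `sed -i ''`):

```shell
# set_provider FILE NAME — rewrite the LLM_PROVIDER line in an env file.
# (Hypothetical helper; assumes FILE already contains an LLM_PROVIDER line.)
set_provider() { sed -i "s/^LLM_PROVIDER=.*/LLM_PROVIDER=$2/" "$1"; }

# Example: switch to local Ollama, then restart the stack:
# set_provider .env ollama
# ./scripts/devops/stop_stack.sh && ./scripts/devops/start_stack.sh
```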

Installation

1. Clone the repository

```shell
git clone https://github.com/YOUR_USER/soft-architect-ai.git
cd soft-architect-ai
```
2. Configure your environment

Copy the provided template and open it in your editor:
```shell
cp .env.example .env
```
The key variables to set before starting:
```shell
# Choose your LLM provider: gemini | groq | ollama
LLM_PROVIDER=gemini

# For Gemini Cloud (default)
GEMINI_API_KEY=your_gemini_api_key_here

# For Groq Cloud
# GROQ_API_KEY=your_groq_api_key_here

# For local Ollama — no API key required
# OLLAMA_MODEL=llama3.2
```
Never commit your .env file. It is listed in .gitignore by default. Only .env.example is tracked in version control.
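You can verify the ignore rule is active with `git check-ignore`, which succeeds only for paths that are ignored. The demo below runs in a throwaway repo so it is safe to copy-paste; in the real repository, just run `git check-ignore -v .env` at the project root:

```shell
# Demo in a throwaway repo: .env is ignored, .env.example is tracked.
repo=$(mktemp -d) && cd "$repo" && git init -q .
printf '.env\n' > .gitignore
touch .env .env.example
git check-ignore -q .env && echo ".env is ignored"
git check-ignore -q .env.example || echo ".env.example is not ignored"
```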
3. Start all services

The start_stack.sh script runs pre-flight checks, pulls images, and starts all containers:
```shell
./scripts/devops/start_stack.sh
```
Alternatively, start services directly with Docker Compose:
```shell
docker compose --env-file .env -f infrastructure/docker-compose.yml up -d --build
```
When the stack is ready, the script prints the following URLs:
- API:      http://localhost:8000
- API docs: http://localhost:8000/docs
- ChromaDB: http://localhost:8001
- Ollama:   http://localhost:11434
Verify the running containers:
```shell
docker ps
```
You should see three containers: sa_api, sa_chromadb, and sa_ollama.
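If you script against the stack (smoke tests, CI), it is more reliable to poll the API until it answers than to sleep for a fixed time. A minimal sketch — `wait_for_url` is a made-up helper; the URL comes from the list printed above:

```shell
# wait_for_url URL [TIMEOUT_SECONDS] — poll with curl until the URL
# responds successfully, or give up after the timeout (default 60 s).
wait_for_url() {
  url=$1; timeout=${2:-60}
  deadline=$(( $(date +%s) + timeout ))
  until curl -fsS "$url" >/dev/null 2>&1; do
    if [ "$(date +%s)" -ge "$deadline" ]; then return 1; fi
    sleep 1
  done
}

# Example (assumes the stack is up):
# wait_for_url http://localhost:8000/docs 120 && echo "API is ready"
```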
4. Ingest the knowledge base

Before the first conversation, load the curated technical knowledge base into ChromaDB. The start_stack.sh script handles this automatically on first run. You can verify ChromaDB is populated by checking its HTTP API directly:
```shell
curl http://localhost:8001/api/v1/collections
```
You should see the softarchitect collection listed. If the collection is empty, restart the stack — the API container seeds the vector store on startup.
The dedicated /api/v1/knowledge/ingest and /api/v1/knowledge/status endpoints are planned for Phase 2. Knowledge base ingestion currently happens automatically during container initialization.
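To script the seeding check, you can grep the collections JSON for the collection name. A sketch — `has_collection` is a made-up helper, and the assumption is that each collection object in the response carries a `"name"` field:

```shell
# has_collection NAME — succeed if NAME appears as a collection name
# in the JSON listing passed on stdin.
has_collection() { grep -q "\"name\"[[:space:]]*:[[:space:]]*\"$1\""; }

# Example (assumes the stack is running):
# curl -s http://localhost:8001/api/v1/collections | has_collection softarchitect \
#   && echo "knowledge base seeded" || echo "collection missing"
```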
5. Open the Flutter desktop app

Launch the Flutter frontend in development mode:
```shell
./scripts/devops/LAUNCH_FLUTTER_APP_DEV.sh
```
The app connects to the API at http://localhost:8000 automatically. You are now ready to start your first guided architectural session.

Tuning for your hardware

Two variables control how much context the RAG pipeline sends to the LLM. Adjust them in .env to match your model’s context window and prevent out-of-memory errors:
| Variable | Default | Ollama (8K context) | Gemini / Groq |
| --- | --- | --- | --- |
| LLM_MAX_PROMPT_CHARS | 200000 | 30000 | 200000 |
| RAG_MAX_CHUNKS | 3 | 2 | 5 |
```shell
# .env — Local Ollama model with 8K context window
LLM_MAX_PROMPT_CHARS=30000
RAG_MAX_CHUNKS=2

# .env — Cloud API (Gemini or Groq) — maximum precision
LLM_MAX_PROMPT_CHARS=200000
RAG_MAX_CHUNKS=5
```
The pipeline truncates the prompt rather than silently dropping RAG context, so architectural recommendations always remain grounded in the knowledge base.
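The budget itself amounts to a hard character cap on the assembled prompt. A toy illustration of the idea only — the real pipeline lives in the API service, and this is not its code:

```shell
# Toy version of the cap: keep at most LLM_MAX_PROMPT_CHARS characters
# of the assembled prompt, truncating the tail.
LLM_MAX_PROMPT_CHARS=24
prompt="SYSTEM + RAG CHUNKS + QUESTION ... (imagine a long prompt)"
capped=$(printf '%s' "$prompt" | head -c "$LLM_MAX_PROMPT_CHARS")
echo "${#capped} chars kept"
```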

Stopping the stack

```shell
./scripts/devops/stop_stack.sh
```

Next steps

  • Read the Architecture page to understand how the system is structured.
  • Browse the interactive API reference at http://localhost:8000/docs once the stack is running.
