Prerequisites

Choosing your LLM provider: Set LLM_PROVIDER=ollama in your .env for fully offline, private inference. Set LLM_PROVIDER=gemini or LLM_PROVIDER=groq to use a cloud API for faster responses on modest hardware. You can switch at any time by editing .env and restarting the stack.
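Switching can be scripted as well. A minimal sketch — `set_provider` is a made-up helper, and the GNU `sed -i` syntax is assumed (on macOS use `sed -i ''`):

```shell
# set_provider FILE NAME — rewrite the LLM_PROVIDER line in an env file.
# (Hypothetical helper; assumes FILE already contains an LLM_PROVIDER line.)
set_provider() { sed -i "s/^LLM_PROVIDER=.*/LLM_PROVIDER=$2/" "$1"; }

# Example: switch to local Ollama, then restart the stack:
# set_provider .env ollama
# ./scripts/devops/stop_stack.sh && ./scripts/devops/start_stack.sh
```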

Installation

1. Clone the repository

```shell
git clone https://github.com/YOUR_USER/soft-architect-ai.git
cd soft-architect-ai
```
2. Configure your environment

Copy the provided template and open it in your editor:
```shell
cp .env.example .env
```
The key variables to set before starting:
```shell
# Choose your LLM provider: gemini | groq | ollama
LLM_PROVIDER=gemini

# For Gemini Cloud (default)
GEMINI_API_KEY=your_gemini_api_key_here

# For Groq Cloud
# GROQ_API_KEY=your_groq_api_key_here

# For local Ollama — no API key required
# OLLAMA_MODEL=llama3.2
```
Never commit your .env file. It is listed in .gitignore by default. Only .env.example is tracked in version control.
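You can verify the ignore rule is active with `git check-ignore`, which succeeds only for paths that are ignored. The demo below runs in a throwaway repo so it is safe to copy-paste; in the real repository, just run `git check-ignore -v .env` at the project root:

```shell
# Demo in a throwaway repo: .env is ignored, .env.example is tracked.
repo=$(mktemp -d) && cd "$repo" && git init -q .
printf '.env\n' > .gitignore
touch .env .env.example
git check-ignore -q .env && echo ".env is ignored"
git check-ignore -q .env.example || echo ".env.example is not ignored"
```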
3. Start all services

The start_stack.sh script runs pre-flight checks, pulls images, and starts all containers:
```shell
./scripts/devops/start_stack.sh
```
Alternatively, start services directly with Docker Compose:
```shell
docker compose --env-file .env -f infrastructure/docker-compose.yml up -d --build
```
When the stack is ready, the script prints the following URLs:
- API:      http://localhost:8000
- API docs: http://localhost:8000/docs
- ChromaDB: http://localhost:8001
- Ollama:   http://localhost:11434
Verify the running containers:
```shell
docker ps
```
You should see three containers: sa_api, sa_chromadb, and sa_ollama.
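If you script against the stack (smoke tests, CI), it is more reliable to poll the API until it answers than to sleep for a fixed time. A minimal sketch — `wait_for_url` is a made-up helper; the URL comes from the list printed above:

```shell
# wait_for_url URL [TIMEOUT_SECONDS] — poll with curl until the URL
# responds successfully, or give up after the timeout (default 60 s).
wait_for_url() {
  url=$1; timeout=${2:-60}
  deadline=$(( $(date +%s) + timeout ))
  until curl -fsS "$url" >/dev/null 2>&1; do
    if [ "$(date +%s)" -ge "$deadline" ]; then return 1; fi
    sleep 1
  done
}

# Example (assumes the stack is up):
# wait_for_url http://localhost:8000/docs 120 && echo "API is ready"
```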
4. Ingest the knowledge base

Before the first conversation, load the curated technical knowledge base into ChromaDB. The start_stack.sh script handles this automatically on first run. You can verify ChromaDB is populated by checking its HTTP API directly:
```shell
curl http://localhost:8001/api/v1/collections
```
You should see the softarchitect collection listed. If the collection is empty, restart the stack — the API container seeds the vector store on startup.
The dedicated /api/v1/knowledge/ingest and /api/v1/knowledge/status endpoints are planned for Phase 2. Knowledge base ingestion currently happens automatically during container initialization.
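To script the seeding check, you can grep the collections JSON for the collection name. A sketch — `has_collection` is a made-up helper, and the assumption is that each collection object in the response carries a `"name"` field:

```shell
# has_collection NAME — succeed if NAME appears as a collection name
# in the JSON listing passed on stdin.
has_collection() { grep -q "\"name\"[[:space:]]*:[[:space:]]*\"$1\""; }

# Example (assumes the stack is running):
# curl -s http://localhost:8001/api/v1/collections | has_collection softarchitect \
#   && echo "knowledge base seeded" || echo "collection missing"
```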
5. Open the Flutter desktop app

Launch the Flutter frontend in development mode:
```shell
./scripts/devops/LAUNCH_FLUTTER_APP_DEV.sh
```
The app connects to the API at http://localhost:8000 automatically. You are now ready to start your first guided architectural session.

Tuning for your hardware

Two variables control how much context the RAG pipeline sends to the LLM. Adjust them in .env to match your model’s context window and prevent out-of-memory errors:
| Variable | Default | Ollama (8K context) | Gemini / Groq |
| --- | --- | --- | --- |
| LLM_MAX_PROMPT_CHARS | 200000 | 30000 | 200000 |
| RAG_MAX_CHUNKS | 3 | 2 | 5 |
```shell
# .env — Local Ollama model with 8K context window
LLM_MAX_PROMPT_CHARS=30000
RAG_MAX_CHUNKS=2

# .env — Cloud API (Gemini or Groq) — maximum precision
LLM_MAX_PROMPT_CHARS=200000
RAG_MAX_CHUNKS=5
```
The pipeline truncates the prompt rather than silently dropping RAG context, so architectural recommendations always remain grounded in the knowledge base.
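The budget itself amounts to a hard character cap on the assembled prompt. A toy illustration of the idea only — the real pipeline lives in the API service, and this is not its code:

```shell
# Toy version of the cap: keep at most LLM_MAX_PROMPT_CHARS characters
# of the assembled prompt, truncating the tail.
LLM_MAX_PROMPT_CHARS=24
prompt="SYSTEM + RAG CHUNKS + QUESTION ... (imagine a long prompt)"
capped=$(printf '%s' "$prompt" | head -c "$LLM_MAX_PROMPT_CHARS")
echo "${#capped} chars kept"
```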

Stopping the stack

```shell
./scripts/devops/stop_stack.sh
```

Next steps

  • Read the Architecture page to understand how the system is structured.
  • Browse the interactive API reference at http://localhost:8000/docs once the stack is running.
