# Pre-production steps

## Configure `.env` with production settings

Copy `.env.example` to `.env` and set the values below. With `DEBUG=False`, FastAPI disables the automatic interactive API docs (`/docs`, `/redoc`) and suppresses internal stack traces in API error responses.

The full set of configurable variables:

| Variable | Default | Production recommendation |
|---|---|---|
| `ENVIRONMENT` | `development` | `production` |
| `DEBUG` | `False` | `False` |
| `LOG_LEVEL` | `INFO` | `WARNING` |
| `CHAT_MAX_HISTORY_MESSAGES` | `50` | 20–50 depending on RAM |
| `CHAT_MAX_MESSAGE_LENGTH` | `20000` | Keep at 20000, or reduce for tighter input control |
| `LLM_MAX_PROMPT_CHARS` | `200000` | 30000 for local Ollama; 200000 for cloud APIs |
| `RAG_MAX_CHUNKS` | `3` | 2 for local Ollama; 3–5 for cloud APIs |
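Taken together, a production `.env` for a local Ollama deployment might look like this sketch (values follow the recommendations above; adjust the last two for cloud APIs):

```env
ENVIRONMENT=production
DEBUG=False
LOG_LEVEL=WARNING
CHAT_MAX_HISTORY_MESSAGES=30
CHAT_MAX_MESSAGE_LENGTH=20000
LLM_MAX_PROMPT_CHARS=30000
RAG_MAX_CHUNKS=2
```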
## Enable HTTPS with a reverse proxy

The API server binds to HTTP on port 8000. In production, place an Nginx or Traefik reverse proxy in front to terminate TLS. For Nginx, create a site configuration at `/etc/nginx/sites-available/softarchitect`.
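A minimal sketch of that site configuration — the domain name and certificate paths are placeholders for your deployment:

```nginx
server {
    listen 443 ssl;
    server_name softarchitect.example.com;                 # placeholder domain

    ssl_certificate     /etc/ssl/certs/softarchitect.pem;  # placeholder paths
    ssl_certificate_key /etc/ssl/private/softarchitect.key;

    location / {
        proxy_pass http://127.0.0.1:8000;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # Forward SSE chunks immediately instead of buffering the stream
        proxy_buffering off;
        proxy_read_timeout 1h;
    }
}
```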
`proxy_buffering off` is required for the streaming chat endpoint (`POST /api/v1/chat/stream`). Without it, Nginx buffers the entire Server-Sent Events stream before forwarding it, breaking the real-time output.

## Choose your LLM provider
Set `LLM_PROVIDER` in `.env` to one of three options:

| Provider | Privacy | Speed | Hardware requirement |
|---|---|---|---|
| `gemini` | Data sent to Google | Fast | None (cloud) |
| `groq` | Data sent to Groq | Very fast | None (cloud) |
| `ollama` | Fully local | Depends on GPU | 8–16 GB RAM |
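For example, a fully local setup needs only the provider switch; cloud providers additionally need an API key, whose variable name is an assumption here — check `.env.example` for the exact name:

```env
LLM_PROVIDER=ollama

# Cloud alternative (key variable name assumed; see .env.example):
# LLM_PROVIDER=groq
# GROQ_API_KEY=<your key>
```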
## Configure a backup strategy

Persistent data lives in two directories. Back up `infrastructure/chroma_data/` regularly, or add a volume backup step to your CI/CD pipeline before any destructive operations.
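A simple daily backup with rsync can be scheduled from cron — a sketch, where the install path and backup destination are assumptions:

```cron
# Mirror the vector store to a backup location every day at 02:00
0 2 * * * rsync -a --delete /opt/softarchitect/infrastructure/chroma_data/ /backups/chroma_data/
```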
If `infrastructure/chroma_data/` is lost, you will need to re-ingest the knowledge base and will lose any project-specific embeddings.

## Set up log monitoring
Application logs are written to `infrastructure/data/logs/` via the Docker volume mount defined in `docker-compose.yml`. You can tail logs in real time with `docker compose logs -f`, or stream from the mounted log directory with `tail -f`. For production, consider forwarding logs to a centralised service (Loki, Datadog, CloudWatch) by mounting a logging driver in `docker-compose.yml`.

Set `LOG_LEVEL=WARNING` in `.env` to reduce noise. The API server uses Python's standard `logging` library and respects this setting across all modules.

## Review security settings
Two environment variables control the security posture of the AI pipeline:

- `IRON_MODE=True` enables the full sanitisation pipeline: every user message is validated and stripped of prompt-injection attempts before reaching the LLM. Disable it only in isolated development environments.
- `PII_DETECTION_ENABLED=True` scans messages for personally identifiable information (names, emails, phone numbers, etc.) and redacts it before the prompt is constructed. This is particularly important if users are describing real customer data in their architecture interviews.

Additional security hardening steps from the Security Hardening Policy:

- Confirm `.env` is not committed to version control (`git status` should not list it)
- Confirm the Dockerfile runs as a non-root user (`USER appuser`)
- Confirm `docker-compose.yml` uses `${VAR}` references, not hardcoded values
- Apply restrictive permissions to data directories
- Run the security audit script: `bash infrastructure/security-validation.sh`
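A sketch of restrictive permissions for the data directories — the exact modes are an assumption, and the `mkdir`/`touch` lines exist only to make the example self-contained:

```shell
# Demo scaffolding so the chmod calls below have targets; not needed in production
mkdir -p infrastructure/chroma_data infrastructure/data/logs
touch .env

# Owner-only access for secrets and persistent data (modes are a sketch)
chmod 600 .env
chmod -R u+rwX,go-rwx infrastructure/chroma_data infrastructure/data
```

With these modes, only the owning user can read the secrets file or traverse the data directories.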
## Set chat limits

Two variables, `CHAT_MAX_HISTORY_MESSAGES` and `CHAT_MAX_MESSAGE_LENGTH`, prevent individual sessions from consuming excessive memory or generating oversized prompts. For a shared team deployment where multiple users run simultaneous sessions, consider reducing `CHAT_MAX_HISTORY_MESSAGES` to 20–30 to limit per-session memory pressure on the Ollama container.

The API enforces these limits at the request boundary: messages that exceed `CHAT_MAX_MESSAGE_LENGTH` are rejected with a 422 Unprocessable Entity response before reaching the LLM.
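The boundary check can be pictured with a minimal sketch in plain Python — this is not the server's actual schema code, and the function name is hypothetical:

```python
CHAT_MAX_MESSAGE_LENGTH = 20000  # mirrors the .env default

def check_message(message: str) -> int:
    """Hypothetical sketch of the request-boundary check.

    The real API performs this via its request schema and answers
    422 Unprocessable Entity before the message reaches the LLM.
    """
    return 422 if len(message) > CHAT_MAX_MESSAGE_LENGTH else 200

print(check_message("a" * 20001))  # → 422
print(check_message("hello"))     # → 200
```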
## Final checklist

Before going live, confirm each item below:

- `DEBUG=False` and `LOG_LEVEL=WARNING` set in `.env`
- HTTPS termination configured (Nginx/Traefik) with `proxy_buffering off` for SSE
- `LLM_PROVIDER` set to your chosen provider with a valid API key (or Ollama running)
- `chroma_data/` backup strategy in place and tested
- Log forwarding or monitoring configured
- `IRON_MODE=True` and `PII_DETECTION_ENABLED=True`
- `.env` not committed; `chmod 600 .env` applied
- Security validation script passes: `bash infrastructure/security-validation.sh`
- Docker containers run as non-root (`USER appuser` in the Dockerfile)
- `CHAT_MAX_HISTORY_MESSAGES` and `CHAT_MAX_MESSAGE_LENGTH` tuned for your expected load
- Knowledge base vectors present in ChromaDB (`curl http://localhost:8001/api/v1/collections` shows the `softarchitect` collection)