packages/knowledge_base/, splits the content into chunks, generates embeddings, and stores them in ChromaDB — your local vector database.
Ingestion happens automatically when the API container starts. You only need to trigger it manually when you add or update Tech Packs or templates.
The /api/v1/knowledge/ingest and /api/v1/knowledge/status HTTP endpoints are planned for Phase 2 of the roadmap. Currently, knowledge base ingestion is handled on container startup and through the per-project document ingest endpoint described below.
What gets ingested
The ingestion pipeline processes three document collections:
- tech-packs: Stack-specific profiles and governance rules from 02-TECH-PACKS/. This is what makes the AI stack-aware.
- templates: Document generation templates from 01-TEMPLATES/. These define the structure of every output document (manifesto, data model, ADRs, etc.).
- examples: Worked examples from MASTER_WORKFLOW_EXAMPLES/. The AI uses these as few-shot references during generation.
Verifying the knowledge base
After starting the stack, confirm ChromaDB is populated by querying its HTTP API directly. The response should include the softarchitect collection, and you can then count the embeddings in it.
In the API container logs, look for Application startup complete and the ChromaDB initialization messages.
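The verification step can also be scripted. The following is a minimal Python sketch rather than the project's own tooling. It assumes ChromaDB's v1 REST routes (/api/v1/collections and /api/v1/collections/{id}/count); newer ChromaDB releases expose /api/v2 routes instead, so adjust the paths to match your version. The host port 8001 comes from the troubleshooting section, and the helper names are hypothetical.

```python
import json
from urllib.request import urlopen

CHROMA_URL = "http://localhost:8001"  # assumption: host port used in this guide


def find_collection(collections, name):
    """Return the collection record with the given name, or None."""
    for record in collections:
        if record.get("name") == name:
            return record
    return None


def get_json(url, timeout=5.0):
    """GET a URL and decode its JSON body."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def count_embeddings(name="softarchitect", base_url=CHROMA_URL):
    """Return the embedding count for a collection, or None if it is missing."""
    # Assumption: v1 REST routes; newer ChromaDB versions use /api/v2 paths.
    collections = get_json(f"{base_url}/api/v1/collections")
    record = find_collection(collections, name)
    if record is None:
        return None
    return get_json(f"{base_url}/api/v1/collections/{record['id']}/count")
```

Calling count_embeddings() with the stack running should return a positive number once ingestion has finished. A result of None means the softarchitect collection does not exist yet.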
Ingesting per-project documents
When the workflow generates a new architectural document (e.g. PROJECT_MANIFESTO.md), the Flutter client automatically sends it to the project’s vector store via the ingest endpoint.
Collection breakdown
| Collection | Source path | Contents |
|---|---|---|
| softarchitect (global KB) | packages/knowledge_base/ | Architecture patterns, Tech Packs, templates, examples |
| project_{id} (per-project) | Generated via API | Documents produced during the guided workflow for a specific project |
When to re-ingest the global knowledge base
Re-ingestion of the global knowledge base is needed whenever the source files change:
- After adding a new custom Tech Pack to 02-TECH-PACKS/
- After updating an existing Tech Pack profile or rules file
- After adding or modifying templates in 01-TEMPLATES/
- After pulling upstream changes that include knowledge base updates
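If you want to detect these changes automatically, one option is to fingerprint the knowledge base directory and compare it with the value recorded at the last ingestion. The sketch below is a hypothetical helper, not part of the project. It assumes that any change to a file's relative path or bytes under packages/knowledge_base/ should count as a reason to re-ingest.

```python
import hashlib
from pathlib import Path


def knowledge_base_fingerprint(root):
    """Hash the relative path and bytes of every file under root."""
    digest = hashlib.sha256()
    root = Path(root)
    # Sort so the fingerprint does not depend on directory listing order.
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest.update(path.relative_to(root).as_posix().encode("utf-8"))
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()


def needs_reingest(root, last_fingerprint):
    """True when the directory differs from the last recorded fingerprint."""
    return knowledge_base_fingerprint(root) != last_fingerprint
```

Store the fingerprint after each successful ingestion, then call needs_reingest("packages/knowledge_base", stored_value) after editing files or pulling upstream changes.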
Troubleshooting
ChromaDB collection is empty after startup
Common causes:
- ChromaDB not ready: run docker ps and confirm sa_chromadb shows (healthy). If it shows (starting), wait 15–30 seconds and check again.
- Empty knowledge base directory: confirm the files exist with ls packages/knowledge_base/02-TECH-PACKS/
- API startup failure: run docker logs sa_api --tail 50 and look for errors during initialization.
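The "wait and check again" step for the first cause can be automated. This is a minimal sketch under stated assumptions: it shells out to docker ps with the --filter and --format options and reads the health label from the status text. Docker usually prints the starting state as (health: starting), so the parser accepts both spellings. The function names and the retry defaults (6 attempts, 5 seconds apart) are illustrative choices.

```python
import subprocess
import time


def health_from_status(status):
    """Extract the health label from a docker ps status string."""
    # Check "unhealthy" first because it contains the word "healthy".
    if "(unhealthy)" in status:
        return "unhealthy"
    if "(healthy)" in status:
        return "healthy"
    if "starting)" in status:
        return "starting"
    return "unknown"


def container_status(name):
    """Return the docker ps status text for a container, or an empty string."""
    result = subprocess.run(
        ["docker", "ps", "--filter", f"name={name}", "--format", "{{.Status}}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def wait_until_healthy(name="sa_chromadb", attempts=6, delay=5.0,
                       get_status=container_status):
    """Poll until the container reports healthy; False if it never does."""
    for attempt in range(attempts):
        if health_from_status(get_status(name)) == "healthy":
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

The get_status parameter exists so the polling logic can be exercised without Docker, for example by passing a function that returns canned status strings.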
curl to ChromaDB returns connection refused
ChromaDB is exposed on port 8001 on the host (not 8000, which is the API server), so check that your request targets port 8001. If the request still fails, confirm sa_chromadb is running: docker ps | grep chromadb.
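To tell a wrong port apart from a stopped container, you can test whether anything accepts TCP connections on the port at all. The sketch below uses only the Python standard library. The function name is hypothetical, and a successful connection only shows that some process is listening, not that it is ChromaDB.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False
```

For example, port_is_open("localhost", 8001) returning False while port_is_open("localhost", 8000) returns True suggests the API is up but the ChromaDB container is not running or not published on the host.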
Per-project ingest returns 400 or 500
A 400 means the request payload is invalid: check that doc_name is 1–255 characters and markdown_content is not empty.
A 500 means the backend failed to write to ChromaDB. Check the server logs and look for Ingestion failed for project= log lines.
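The two payload rules behind a 400 can be checked on the client before sending the request. This is a rough sketch based only on the constraints stated above. The function name is hypothetical, and the backend may apply further rules (for example, trimming whitespace) that this guide does not describe.

```python
def validate_ingest_payload(payload):
    """Return a list of problems; an empty list means the payload looks valid."""
    problems = []

    doc_name = payload.get("doc_name")
    if not isinstance(doc_name, str) or not 1 <= len(doc_name) <= 255:
        problems.append("doc_name must be a string of 1-255 characters")

    content = payload.get("markdown_content")
    if not isinstance(content, str) or not content:
        problems.append("markdown_content must be a non-empty string")

    return problems
```

An empty result does not guarantee a successful ingest. It only rules out the two causes of a 400 listed here.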
Re-ingestion does not reflect updated files
Ingestion upserts into ChromaDB using content-based IDs, and an upsert adds or overwrites entries but never deletes them. If you renamed a file without changing its content, the old vectors remain under the old ID. To force a clean re-ingestion, delete infrastructure/chroma_data/ before restarting.
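The effect of upserting by ID can be illustrated without ChromaDB. The sketch below uses a plain dictionary as a stand-in for a collection and assumes a hypothetical ID scheme (SHA-256 of the chunk text). The real pipeline's ID scheme is not shown in this guide and may differ. The example edits a chunk rather than renaming a file, but the mechanism is the same: entries written under an earlier ID stay in the store until they are deleted.

```python
import hashlib


def chunk_id(text):
    """Hypothetical content-based ID: SHA-256 of the chunk text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def upsert(store, chunks):
    """Add or overwrite entries by ID; existing entries are never removed."""
    for text in chunks:
        store[chunk_id(text)] = text


store = {}
upsert(store, ["intro", "old rules"])  # first ingestion
upsert(store, ["intro", "new rules"])  # re-ingestion after editing one chunk

current = {"intro", "new rules"}
stale = [text for text in store.values() if text not in current]
```

After the second call the store holds three entries, and stale contains the chunk that no longer exists in the source files. Removing the data directory is the blunt way to clear such leftovers.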