The chat endpoints drive the core SoftArchitect AI workflow: a user message is enriched with relevant context retrieved from the knowledge base (RAG) and streamed token-by-token back to the client as Server-Sent Events (SSE). All responses use text/event-stream — there is no synchronous JSON response endpoint for chat (the /chat/message endpoint is deprecated and returns 400).
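An SSE stream is a sequence of frames, each an `event:` line plus a `data:` line, separated by blank lines. As a minimal sketch of how a client might split a raw `text/event-stream` payload into structured events (illustrative only; a production client should use a proper SSE library that also handles `id:`, `retry:`, and multi-line `data:` fields):

```python
import json

def parse_sse(raw: str):
    """Parse a raw text/event-stream payload into (event, data) pairs.

    Simplifying assumption: each frame carries exactly one `event:` line
    and one `data:` line of JSON, separated by blank lines, as in the
    examples on this page.
    """
    events = []
    for frame in raw.strip().split("\n\n"):
        event, data = None, None
        for line in frame.splitlines():
            if line.startswith("event:"):
                event = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data = json.loads(line[len("data:"):].strip())
        if event is not None:
            events.append((event, data))
    return events
```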

POST /api/v1/chat/stream

The primary chat endpoint. Accepts a ChatRequest body, retrieves context from ChromaDB, and streams the LLM response as SSE events.

Request

conversation_id
string (UUID)
required
Unique conversation identifier. Associates this message with an existing conversation session.
message
string
required
The user’s message or architectural requirements. Maximum 20,000 characters (configurable via CHAT_MAX_MESSAGE_LENGTH). Input is sanitized server-side to prevent XSS and prompt injection.
project_id
string (UUID)
required
The project this message belongs to. Used to scope RAG retrieval to the project’s vector store.
user_name
string
default:"Developer"
User’s display name, injected into the LLM system prompt for personalization. Maximum 100 characters.
doc_type
string
The document type to generate. Instructs the Sequential Orchestrator to load the matching template and example. Defaults to "UNSORTED" when omitted. Examples: PROJECT_MANIFESTO, ARCHITECTURE_SPEC, DOMAIN_LANGUAGE, TECH_STACK.
history
array
Previous conversation turns for multi-turn continuity. The maximum number of messages is configurable via CHAT_MAX_HISTORY_MESSAGES (default 100 in code, 50 in the .env.example template). Each entry must have role and content fields.
project_context
object
All documents already generated for this project. Keys are relative file paths (e.g., "PROJECT_MANIFESTO.md"), values are the full file contents. Injected into the LLM prompt to maintain consistency across documents.
metadata
object
Optional key-value metadata from the client. Ignored by the server but stored for traceability.
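A request body built from the fields above might be assembled like this. This is a hedged sketch: the helper name is my own, the defaults mirror the documented ones, and generating a fresh UUID client-side for a new conversation is an assumption (the page only says the ID associates the message with an existing session):

```python
import json
import uuid

def build_chat_request(message, project_id, conversation_id=None,
                       doc_type=None, history=None):
    """Assemble a /api/v1/chat/stream request body (illustrative helper).

    Assumption: a client-generated UUID is acceptable when starting a
    new conversation. Optional fields are omitted when not provided.
    """
    body = {
        "conversation_id": conversation_id or str(uuid.uuid4()),
        "project_id": project_id,
        "message": message,
        "user_name": "Developer",  # documented default
    }
    if doc_type:
        body["doc_type"] = doc_type
    if history:
        body["history"] = history
    return json.dumps(body)
```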

Response

Returns a text/event-stream response with the following headers:
Cache-Control: no-cache
Connection: keep-alive
X-Accel-Buffering: no
Three SSE event types are emitted:

event: message

Emitted for each streamed token:
event: message
data: {"token": "Here", "is_final": false}
token
string
A single token or word fragment from the LLM output stream.
is_final
boolean
Always false for message events. Set to true only in the final done event.

event: done

Emitted once when the stream completes:
event: done
data: {"full_response": "", "sources": ["PROJECT_MANIFESTO"], "metadata": {"template_used": "PROJECT_MANIFESTO"}}
full_response
string
Full concatenated response. Currently empty — clients should assemble tokens from message events.
sources
array
Document types used as context for this generation.
metadata
object
Generation metadata.
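Because full_response currently arrives empty, the client is responsible for assembling the document from the message events it received. A minimal helper, operating on parsed (event, data) pairs:

```python
def assemble_response(events):
    """Concatenate token payloads from `message` events.

    The `done` event's full_response field is currently empty, so the
    full text must be rebuilt client-side from the streamed tokens.
    """
    return "".join(data["token"] for event, data in events if event == "message")
```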

event: error

Emitted when a recoverable or unrecoverable error occurs during streaming:
event: error
data: {"error": "AI Engine is unreachable", "code": "LLM_CONNECTION_ERROR", "retry": true}
error
string
Human-readable error description.
code
string
Machine-readable error code. Possible values:
  • LLM_CONNECTION_ERROR — AI engine unreachable
  • RAG_RETRIEVAL_ERROR — knowledge base search failed
  • STREAM_ERROR — unexpected error during streaming
retry
boolean
Whether the client should automatically retry. true for transient errors (connection), false for persistent failures.
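A client-side retry policy can honor the server's retry hint while capping attempts. The attempt cap and exponential backoff below are client choices of my own, not something the API specifies:

```python
def should_retry(error_event: dict, attempt: int, max_attempts: int = 3) -> bool:
    """Retry only when the server says the error is transient
    (retry=true) and the client-side attempt budget is not exhausted."""
    return bool(error_event.get("retry")) and attempt < max_attempts

def backoff_seconds(attempt: int) -> float:
    """Simple exponential backoff: 1s, 2s, 4s, ... (illustrative)."""
    return 2.0 ** attempt
```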

Errors

Status | Condition
422 | Pydantic validation failed (e.g., message too long, invalid history role)
403 | Invalid or missing X-API-Key header (when API key is configured)

Example

curl -X POST http://localhost:8000/api/v1/chat/stream \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your-key-here" \
  --no-buffer \
  -d '{
    "conversation_id": "550e8400-e29b-41d4-a716-446655440000",
    "project_id": "7c9e6679-7425-40de-944b-e07fc1f90ae7",
    "message": "Generate a project manifesto for a Flutter task manager app",
    "doc_type": "PROJECT_MANIFESTO",
    "user_name": "Developer"
  }'
Example SSE output:
event: message
data: {"token": "#", "is_final": false}

event: message
data: {"token": " Project", "is_final": false}

event: message
data: {"token": " Manifesto", "is_final": false}

event: done
data: {"full_response": "", "sources": ["PROJECT_MANIFESTO"], "metadata": {"template_used": "PROJECT_MANIFESTO"}}
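A stream like the one above can also be consumed incrementally, line by line, rather than parsed after the fact. The sketch below drives the protocol over any iterable of decoded lines (for instance, what an HTTP client library's line iterator would yield); the function name and error handling are my own choices:

```python
import json

def consume_stream(lines):
    """Yield tokens from a /chat/stream SSE line iterator.

    `lines` is any iterable of decoded strings. Raises RuntimeError on an
    `error` event and stops cleanly at the `done` event.
    """
    event = None
    for line in lines:
        if line.startswith("event:"):
            event = line.split(":", 1)[1].strip()
        elif line.startswith("data:"):
            data = json.loads(line.split(":", 1)[1].strip())
            if event == "message":
                yield data["token"]
            elif event == "error":
                raise RuntimeError(f'{data["code"]}: {data["error"]}')
            elif event == "done":
                return
```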

POST /api/v1/chat/generate

Legacy streaming document generation endpoint. Kept for backward compatibility with older client versions. New clients should prefer /api/v1/chat/stream. Uses a different SSE event format (event: token instead of event: message) and does not go through FastAPI dependency injection.
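Since the two endpoints use different SSE event formats, a client supporting both might normalize legacy events into the /chat/stream shape so a single code path handles them. This adapter is purely illustrative; mapping legacy errors to retry=false is a conservative client-side choice, since the legacy format carries no retry hint:

```python
def normalize_legacy_event(event: str, data: dict):
    """Map legacy /chat/generate SSE events onto the /chat/stream format
    (illustrative adapter, not part of the API)."""
    if event == "token":
        return "message", {"token": data["token"], "is_final": False}
    if event == "error":
        # Legacy errors use {code, message}; new format uses {error, code, retry}.
        return "error", {"error": data.get("message", ""),
                         "code": data.get("code", "ERR"),
                         "retry": False}
    return event, data
```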

Request

message
string
required
User message or requirements text.
doc_type
string
required
Document type identifier (e.g., PROJECT_MANIFESTO).
project_context
object
default:"{}"
Arbitrary key-value project metadata injected into the orchestrator context.
chat_history
array
default:"[]"
Previous conversation turns.

Response

Returns a text/event-stream response. Three SSE event types:

event: token

event: token
data: {"token": "Architecture", "index": 0}
token
string
A single token from the LLM output.
index
integer
Token sequence index. Currently always 0.

event: done

event: done
data: {"total_tokens": 0, "duration_ms": 0}
total_tokens
integer
Total tokens generated. Currently always 0.
duration_ms
integer
Generation duration in milliseconds. Currently always 0.

event: error

event: error
data: {"code": "RAG_001", "message": "RAG operation failed"}
code
string
Application error code (e.g., RAG_001, LLM_001, ERR).
message
string
Human-readable error description.

Errors

Status | Condition
500 | Failed to initialize the streaming generator

Example

curl -X POST http://localhost:8000/api/v1/chat/generate \
  -H "Content-Type: application/json" \
  --no-buffer \
  -d '{
    "message": "Design a microservices architecture for an e-commerce platform",
    "doc_type": "ARCHITECTURE_SPEC",
    "project_context": {},
    "chat_history": []
  }'
