text/event-stream — there is no synchronous JSON response endpoint for chat (the /chat/message endpoint is deprecated and returns 400).
POST /api/v1/chat/stream
The primary chat endpoint. Accepts a ChatRequest body, retrieves context from ChromaDB, and streams the LLM response as SSE events.
Request
Unique conversation identifier. Associates this message with an existing conversation session.

The user's message or architectural requirements. Maximum 20,000 characters (configurable via CHAT_MAX_MESSAGE_LENGTH). Input is sanitized server-side to prevent XSS and prompt injection.

The project this message belongs to. Used to scope RAG retrieval to the project's vector store.

User's display name, injected into the LLM system prompt for personalization. Maximum 100 characters.

The document type to generate. Instructs the Sequential Orchestrator to load the matching template and example. When omitted, defaults to "UNSORTED". Examples: PROJECT_MANIFESTO, ARCHITECTURE_SPEC, DOMAIN_LANGUAGE, TECH_STACK.

Previous conversation turns for multi-turn continuity. The maximum number of messages is configurable via CHAT_MAX_HISTORY_MESSAGES (default 100 in code, 50 in the .env.example template). Each entry must have role and content fields.

All documents already generated for this project. Keys are relative file paths (e.g., "PROJECT_MANIFESTO.md"), values are the full file contents. Injected into the LLM prompt to maintain consistency across documents.

Optional key-value metadata from the client. Ignored by the server but stored for traceability.
Response
Returns a text/event-stream response. Three SSE event types:
event: message
Emitted for each streamed token:
A single token or word fragment from the LLM output stream.
Always false for message events. Set to true only in the final done event.

event: done
Emitted once when the stream completes:
Full concatenated response. Currently empty; clients should assemble the full text from message events.

Document types used as context for this generation.
Generation metadata.
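Putting the message and done events together, the wire format could look like the following (payload key names are assumptions based on the field descriptions above):

```
event: message
data: {"token": "Hello", "done": false}

event: message
data: {"token": " world", "done": false}

event: done
data: {"response": "", "done": true, "document_types": ["ARCHITECTURE_SPEC"], "metadata": {"model": "example"}}
```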
event: error
Emitted when a recoverable or unrecoverable error occurs during streaming:
Human-readable error description.
Machine-readable error code. Possible values:
- LLM_CONNECTION_ERROR: AI engine unreachable
- RAG_RETRIEVAL_ERROR: knowledge base search failed
- STREAM_ERROR: unexpected error during streaming
Whether the client should automatically retry.
true for transient errors (connection), false for persistent failures.

Errors
| Status | Condition |
|---|---|
| 422 | Pydantic validation failed (e.g., message too long, invalid history role) |
| 403 | Invalid or missing X-API-Key header (when API key is configured) |
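A client can use the retry flag on error events to decide whether to reopen the stream. The sketch below is illustrative only: the payload key name ("retry") and the backoff policy are assumptions based on the field descriptions above, not confirmed by the API.

```python
def should_retry(error_event: dict, attempt: int, max_attempts: int = 3) -> bool:
    """Decide whether to reopen the stream after an `error` event.

    The server marks transient failures (e.g., LLM_CONNECTION_ERROR) with a
    retry flag; the key name "retry" is an assumption.
    """
    return bool(error_event.get("retry")) and attempt < max_attempts


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 8.0) -> float:
    """Exponential backoff: 0.5s, 1s, 2s, ... capped at `cap` seconds."""
    return min(cap, base * (2 ** attempt))
```

Pairing the flag with capped exponential backoff avoids hammering the server while an upstream LLM outage resolves.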
Example
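A minimal Python sketch of a client, not an official SDK: it posts the request with requests and parses the SSE body by hand. The JSON key names ("token", "done") and request fields are assumptions from the descriptions above.

```python
import json


def parse_sse(raw: str):
    """Split a text/event-stream body into (event_name, payload) pairs.

    SSE frames are separated by a blank line; each frame carries an
    `event:` line and a `data:` line with a JSON payload.
    """
    frames = []
    for block in raw.strip().split("\n\n"):
        name, payload = "message", None
        for line in block.splitlines():
            if line.startswith("event:"):
                name = line[len("event:"):].strip()
            elif line.startswith("data:"):
                payload = json.loads(line[len("data:"):].strip())
        frames.append((name, payload))
    return frames


def assemble(frames):
    """Concatenate message-event tokens into the full response text."""
    return "".join(f[1]["token"] for f in frames if f[0] == "message")


# Calling the endpoint itself (request keys and header are assumptions):
#
# import requests
# with requests.post(
#     "http://localhost:8000/api/v1/chat/stream",
#     json={"message": "Draft the manifesto", "document_type": "PROJECT_MANIFESTO"},
#     headers={"X-API-Key": "..."},
#     stream=True,
# ) as resp:
#     for line in resp.iter_lines(decode_unicode=True):
#         print(line)
```

Because the done event's response field is currently empty, assembling the text client-side from message tokens, as assemble does, is the reliable path.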
POST /api/v1/chat/generate
Legacy streaming document generation endpoint. Kept for backward compatibility with older client versions. New clients should prefer /api/v1/chat/stream.
Uses a different SSE event format (event: token instead of event: message) and does not go through FastAPI dependency injection.
Request
User message or requirements text.
Document type identifier (e.g.,
PROJECT_MANIFESTO).

Arbitrary key-value project metadata injected into the orchestrator context.
Previous conversation turns.
Response
Returns a text/event-stream response. Three SSE event types:
event: token
A single token from the LLM output.
Token sequence index. Currently always 0.

event: done

Total tokens generated. Currently always 0.

Generation duration in milliseconds. Currently always 0.

event: error

Application error code (e.g., RAG_001, LLM_001, ERR).

Human-readable error description.
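For comparison with /chat/stream, a legacy stream could look like this (payload key names are assumptions based on the field descriptions above; note event: token in place of event: message):

```
event: token
data: {"token": "Hello", "index": 0}

event: token
data: {"token": " world", "index": 0}

event: done
data: {"total_tokens": 0, "duration_ms": 0}
```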
Errors
| Status | Condition |
|---|---|
| 500 | Failed to initialize the streaming generator |