The chat endpoints drive the core SoftArchitect AI workflow: a user message is enriched with relevant context retrieved from the knowledge base (RAG) and streamed token-by-token back to the client as Server-Sent Events (SSE). All responses use text/event-stream — there is no synchronous JSON response endpoint for chat (the /chat/message endpoint is deprecated and returns 400).
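An SSE stream is a sequence of frames, each an `event:` line plus a `data:` line, separated by blank lines. As a minimal sketch of how a client might split a raw `text/event-stream` payload into structured events (illustrative only; a production client should use a proper SSE library that also handles `id:`, `retry:`, and multi-line `data:` fields):

```python
import json

def parse_sse(raw: str):
    """Parse a raw text/event-stream payload into (event, data) pairs.

    Simplifying assumption: each frame carries exactly one `event:` line
    and one `data:` line of JSON, separated by blank lines, as in the
    examples on this page.
    """
    events = []
    for frame in raw.strip().split("\n\n"):
        event, data = None, None
        for line in frame.splitlines():
            if line.startswith("event:"):
                event = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data = json.loads(line[len("data:"):].strip())
        if event is not None:
            events.append((event, data))
    return events
```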

POST /api/v1/chat/stream

The primary chat endpoint. Accepts a ChatRequest body, retrieves context from ChromaDB, and streams the LLM response as SSE events.

Request

conversation_id
string (UUID)
required
Unique conversation identifier. Associates this message with an existing conversation session.
message
string
required
The user’s message or architectural requirements. Maximum 20,000 characters (configurable via CHAT_MAX_MESSAGE_LENGTH). Input is sanitized server-side to prevent XSS and prompt injection.
project_id
string (UUID)
required
The project this message belongs to. Used to scope RAG retrieval to the project’s vector store.
user_name
string
default:"Developer"
User’s display name, injected into the LLM system prompt for personalization. Maximum 100 characters.
doc_type
string
The document type to generate. Instructs the Sequential Orchestrator to load the matching template and example. Defaults to "UNSORTED" when omitted. Examples: PROJECT_MANIFESTO, ARCHITECTURE_SPEC, DOMAIN_LANGUAGE, TECH_STACK.
history
array
Previous conversation turns for multi-turn continuity. The maximum number of messages is configurable via CHAT_MAX_HISTORY_MESSAGES (default 100 in code, 50 in the .env.example template). Each entry must have role and content fields.
project_context
object
All documents already generated for this project. Keys are relative file paths (e.g., "PROJECT_MANIFESTO.md"), values are the full file contents. Injected into the LLM prompt to maintain consistency across documents.
metadata
object
Optional key-value metadata from the client. Ignored by the server but stored for traceability.
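A request body built from the fields above might be assembled like this. This is a hedged sketch: the helper name is my own, the defaults mirror the documented ones, and generating a fresh UUID client-side for a new conversation is an assumption (the page only says the ID associates the message with an existing session):

```python
import json
import uuid

def build_chat_request(message, project_id, conversation_id=None,
                       doc_type=None, history=None):
    """Assemble a /api/v1/chat/stream request body (illustrative helper).

    Assumption: a client-generated UUID is acceptable when starting a
    new conversation. Optional fields are omitted when not provided.
    """
    body = {
        "conversation_id": conversation_id or str(uuid.uuid4()),
        "project_id": project_id,
        "message": message,
        "user_name": "Developer",  # documented default
    }
    if doc_type:
        body["doc_type"] = doc_type
    if history:
        body["history"] = history
    return json.dumps(body)
```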

Response

Returns a text/event-stream response with the following headers:
Cache-Control: no-cache
Connection: keep-alive
X-Accel-Buffering: no
Three SSE event types are emitted:

event: message

Emitted for each streamed token:
event: message
data: {"token": "Here", "is_final": false}
token
string
A single token or word fragment from the LLM output stream.
is_final
boolean
Always false for message events. Set to true only in the final done event.

event: done

Emitted once when the stream completes:
event: done
data: {"full_response": "", "sources": ["PROJECT_MANIFESTO"], "metadata": {"template_used": "PROJECT_MANIFESTO"}}
full_response
string
Full concatenated response. Currently empty — clients should assemble tokens from message events.
sources
array
Document types used as context for this generation.
metadata
object
Generation metadata.
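Because full_response currently arrives empty, the client is responsible for assembling the document from the message events it received. A minimal helper, operating on parsed (event, data) pairs:

```python
def assemble_response(events):
    """Concatenate token payloads from `message` events.

    The `done` event's full_response field is currently empty, so the
    full text must be rebuilt client-side from the streamed tokens.
    """
    return "".join(data["token"] for event, data in events if event == "message")
```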

event: error

Emitted when a recoverable or unrecoverable error occurs during streaming:
event: error
data: {"error": "AI Engine is unreachable", "code": "LLM_CONNECTION_ERROR", "retry": true}
error
string
Human-readable error description.
code
string
Machine-readable error code. Possible values:
  • LLM_CONNECTION_ERROR — AI engine unreachable
  • RAG_RETRIEVAL_ERROR — knowledge base search failed
  • STREAM_ERROR — unexpected error during streaming
retry
boolean
Whether the client should automatically retry. true for transient errors (connection), false for persistent failures.
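A client-side retry policy can honor the server's retry hint while capping attempts. The attempt cap and exponential backoff below are client choices of my own, not something the API specifies:

```python
def should_retry(error_event: dict, attempt: int, max_attempts: int = 3) -> bool:
    """Retry only when the server says the error is transient
    (retry=true) and the client-side attempt budget is not exhausted."""
    return bool(error_event.get("retry")) and attempt < max_attempts

def backoff_seconds(attempt: int) -> float:
    """Simple exponential backoff: 1s, 2s, 4s, ... (illustrative)."""
    return 2.0 ** attempt
```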

Errors

Status | Condition
422 | Pydantic validation failed (e.g., message too long, invalid history role)
403 | Invalid or missing X-API-Key header (when API key is configured)

Example

curl -X POST http://localhost:8000/api/v1/chat/stream \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your-key-here" \
  --no-buffer \
  -d '{
    "conversation_id": "550e8400-e29b-41d4-a716-446655440000",
    "project_id": "7c9e6679-7425-40de-944b-e07fc1f90ae7",
    "message": "Generate a project manifesto for a Flutter task manager app",
    "doc_type": "PROJECT_MANIFESTO",
    "user_name": "Developer"
  }'
Example SSE output:
event: message
data: {"token": "#", "is_final": false}

event: message
data: {"token": " Project", "is_final": false}

event: message
data: {"token": " Manifesto", "is_final": false}

event: done
data: {"full_response": "", "sources": ["PROJECT_MANIFESTO"], "metadata": {"template_used": "PROJECT_MANIFESTO"}}
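A stream like the one above can also be consumed incrementally, line by line, rather than parsed after the fact. The sketch below drives the protocol over any iterable of decoded lines (for instance, what an HTTP client library's line iterator would yield); the function name and error handling are my own choices:

```python
import json

def consume_stream(lines):
    """Yield tokens from a /chat/stream SSE line iterator.

    `lines` is any iterable of decoded strings. Raises RuntimeError on an
    `error` event and stops cleanly at the `done` event.
    """
    event = None
    for line in lines:
        if line.startswith("event:"):
            event = line.split(":", 1)[1].strip()
        elif line.startswith("data:"):
            data = json.loads(line.split(":", 1)[1].strip())
            if event == "message":
                yield data["token"]
            elif event == "error":
                raise RuntimeError(f'{data["code"]}: {data["error"]}')
            elif event == "done":
                return
```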

POST /api/v1/chat/generate

Legacy streaming document generation endpoint. Kept for backward compatibility with older client versions. New clients should prefer /api/v1/chat/stream. Uses a different SSE event format (event: token instead of event: message) and does not go through FastAPI dependency injection.
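Since the two endpoints use different SSE event formats, a client supporting both might normalize legacy events into the /chat/stream shape so a single code path handles them. This adapter is purely illustrative; mapping legacy errors to retry=false is a conservative client-side choice, since the legacy format carries no retry hint:

```python
def normalize_legacy_event(event: str, data: dict):
    """Map legacy /chat/generate SSE events onto the /chat/stream format
    (illustrative adapter, not part of the API)."""
    if event == "token":
        return "message", {"token": data["token"], "is_final": False}
    if event == "error":
        # Legacy errors use {code, message}; new format uses {error, code, retry}.
        return "error", {"error": data.get("message", ""),
                         "code": data.get("code", "ERR"),
                         "retry": False}
    return event, data
```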

Request

message
string
required
User message or requirements text.
doc_type
string
required
Document type identifier (e.g., PROJECT_MANIFESTO).
project_context
object
default:"{}"
Arbitrary key-value project metadata injected into the orchestrator context.
chat_history
array
default:"[]"
Previous conversation turns.

Response

Returns a text/event-stream response. Three SSE event types:

event: token

event: token
data: {"token": "Architecture", "index": 0}
token
string
A single token from the LLM output.
index
integer
Token sequence index. Currently always 0.

event: done

event: done
data: {"total_tokens": 0, "duration_ms": 0}
total_tokens
integer
Total tokens generated. Currently always 0.
duration_ms
integer
Generation duration in milliseconds. Currently always 0.

event: error

event: error
data: {"code": "RAG_001", "message": "RAG operation failed"}
code
string
Application error code (e.g., RAG_001, LLM_001, ERR).
message
string
Human-readable error description.

Errors

Status | Condition
500 | Failed to initialize the streaming generator

Example

curl -X POST http://localhost:8000/api/v1/chat/generate \
  -H "Content-Type: application/json" \
  --no-buffer \
  -d '{
    "message": "Design a microservices architecture for an e-commerce platform",
    "doc_type": "ARCHITECTURE_SPEC",
    "project_context": {},
    "chat_history": []
  }'
