POST /api/chat/with-context
Streams responses from the AI assistant with the complete medical knowledge base as context, including all scraped medical data.

Request Body
- `model`: The Ollama model to use (e.g., `llama2`, `mistral`, `medllama2`)
- `question`: The user's question to the AI
- System instruction: Custom system instruction. Defaults to: "Responda com base EXCLUSIVA no contexto. Se algo não estiver no contexto, diga que não há informação suficiente." ("Answer based EXCLUSIVELY on the context. If something is not in the context, say there is not enough information.")
- Context item limit: Maximum number of scraped items to include in context
- `maxChars`: Maximum total characters of context (prevents token overflow)
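For illustration, a request body using the documented fields might look like the following. The values are examples, and only fields whose names appear in this document are shown; the system-instruction and item-limit fields are omitted because their exact names are not given here.

```json
{
  "model": "medllama2",
  "question": "Quais são os sintomas da dengue?",
  "maxChars": 12000
}
```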
Response
The response is streamed using Server-Sent Events (SSE) format with `Content-Type: application/octet-stream`.
Each line contains a JSON object with the AI’s response chunk.
Example Request
cURL
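The source does not include the command itself; below is a minimal sketch, assuming the API is served at `http://localhost:8080` (the base URL and payload values are assumptions).

```shell
# -N disables output buffering so streamed chunks print as they arrive.
# Base URL and payload values are illustrative.
curl -N -X POST http://localhost:8080/api/chat/with-context \
  -H "Content-Type: application/json" \
  -d '{
        "model": "medllama2",
        "question": "Quais são os sintomas da dengue?",
        "maxChars": 12000
      }'
```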
JavaScript
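The JavaScript example is likewise missing from this excerpt; here is a sketch of a streaming client. The base URL is an assumption, and the chunk field names (`message.content` / `response`) follow Ollama's streaming format and should be verified against the actual server.

```javascript
// Build the JSON body; model and question are required per this document.
function buildBody(model, question, maxChars) {
  return JSON.stringify({ model, question, maxChars });
}

// Each streamed line contains a JSON object with a response chunk.
// Field names here are assumed from Ollama's chat format; blank
// keep-alive lines are tolerated.
function parseChunkLine(line) {
  if (!line.trim()) return "";
  const obj = JSON.parse(line);
  return obj.message?.content ?? obj.response ?? "";
}

// Stream the answer, invoking onChunk for each piece of text.
// Base URL is illustrative; adjust to your deployment.
async function chatWithContext(question, onChunk) {
  const res = await fetch("http://localhost:8080/api/chat/with-context", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: buildBody("medllama2", question, 12000),
  });
  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let buffered = "";
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    buffered += decoder.decode(value, { stream: true });
    const lines = buffered.split("\n");
    buffered = lines.pop(); // keep any partial trailing line
    for (const line of lines) onChunk(parseChunkLine(line));
  }
}
```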
Example Response Stream
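The stream itself is not shown in this excerpt. Assuming the controller relays Ollama's chat-style stream unchanged, the line-delimited JSON chunks would look roughly like this (field names follow Ollama's streaming format and should be verified against the actual server):

```json
{"model":"medllama2","message":{"role":"assistant","content":"A dengue é"},"done":false}
{"model":"medllama2","message":{"role":"assistant","content":" uma doença"},"done":false}
{"model":"medllama2","message":{"role":"assistant","content":" viral."},"done":true}
```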
How It Works
From `OllamaChatController.java:108-176`:
1. Context Building: Retrieves ALL scraped medical data
2. Prompt Construction: Combines instruction + context + question
3. Ollama Streaming: Sends to Ollama and streams back the response
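The prompt-construction step can be sketched roughly as follows. This is illustrative JavaScript, not the controller's actual code (which is Java, in `OllamaChatController.java`); the function name and prompt layout are assumptions.

```javascript
// Hypothetical sketch: concatenate the scraped items into a context block,
// cap it at maxChars to avoid token overflow, then prepend the instruction
// and append the question.
function buildPrompt(instruction, items, question, maxItems, maxChars) {
  let context = items
    .slice(0, maxItems) // cap the number of scraped items
    .join("\n\n");
  if (context.length > maxChars) {
    context = context.slice(0, maxChars); // hard cap on total context size
  }
  return `${instruction}\n\nContexto:\n${context}\n\nPergunta: ${question}`;
}
```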
Context Sources
The context includes data from all scraped sources:
- CID Codes: Disease classifications and descriptions
- Health News: Latest medical news from government sources
- Medical Skills: CFM-defined medical specialties and abilities
- Vigilance Guides: Public health surveillance definitions
Differences from /api/chat
| Feature | /api/chat | /api/chat/with-context |
|---|---|---|
| Context | Automatic (keyword-based) or optional | Always includes ALL context |
| Context Size | Targeted/filtered | Complete knowledge base |
| Use Case | General questions | Questions needing comprehensive medical data |
| Performance | Faster (less context) | Slower (more context) |
Use Cases
- Complex medical queries requiring multiple data sources
- Questions about current health news and guidelines
- Comprehensive disease information lookup
- Correlating symptoms with recent health alerts
- Medical research and information gathering
For simpler questions or when you want targeted context, use POST /api/chat, which automatically detects relevant context.

Performance Considerations
- Context Size: The `maxChars` parameter prevents token overflow. Adjust it based on your Ollama model's context window
- Response Time: Larger context = longer initial processing time
- Memory: Streaming response uses minimal memory on the server
- Ollama Model: Choose a model with appropriate context window size (e.g., 8K, 32K, 128K tokens)
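As a rough way to size `maxChars` from a model's context window, a common approximation (not from the source, and varying by model) is about four characters per token, reserving part of the window for the question and the model's answer:

```javascript
// Heuristic: ~4 characters per token (a common approximation; actual
// tokenization varies by model). reservedTokens leaves room for the
// question and the generated answer.
function suggestMaxChars(contextWindowTokens, reservedTokens = 1024) {
  const usable = Math.max(contextWindowTokens - reservedTokens, 0);
  return usable * 4;
}
```

For example, an 8K-token model with the default reservation yields `suggestMaxChars(8192)` = 28672 characters.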
Error Responses
| Status Code | Description |
|---|---|
| 200 | Streaming response in progress |
| 400 | Invalid request (missing model or question) |
| 500 | Error connecting to Ollama, context building failed, or streaming error |
This endpoint is ideal for the medical chatbot feature where comprehensive, accurate medical information is critical.