InfoJobs DevBoard does not wait for the entire AI-generated summary before displaying anything. Instead, the backend uses Express’s res.write() to forward each token to the client as soon as Ollama produces it, and the frontend uses response.body.getReader() to read those tokens one chunk at a time and append them to the displayed text. As a result, users see the first words of the summary within a second or two rather than staring at a spinner until the full response is ready.
Backend streaming
The GET /ai/summary/:id route sets two response headers and then enters an async loop that iterates over Ollama’s streaming response. The two headers are:
- Content-Type: text/plain; charset=utf-8 tells the browser to treat the body as plain text encoded in UTF-8. The frontend renders it as Markdown after the stream completes, but during streaming it arrives as raw text.
- Transfer-Encoding: chunked instructs the HTTP layer to send the response body in a series of chunks rather than buffering it until the full content is available. Each call to res.write() flushes one chunk to the client immediately.

Once the for await loop exhausts all parts from Ollama, res.end() signals the end of the response body.
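
The header-then-loop sequence described in this section can be sketched as follows. This is a minimal illustration, not the project's actual route: the async generator stands in for Ollama's streaming response, the part shape ({ message: { content } }) is an assumption, and streamSummary, fakeOllamaStream, and createFakeResponse are hypothetical names. The fake response object exists only so the sketch runs without a server.

```javascript
// Stand-in for Ollama's streaming response: an async iterable of parts.
// The part shape ({ message: { content } }) is an assumption for this sketch.
async function* fakeOllamaStream() {
  yield { message: { content: 'Senior ' } };
  yield { message: { content: 'backend ' } };
  yield { message: { content: 'role.' } };
}

// Streams every part to an Express-style response object.
async function streamSummary(res, parts) {
  res.setHeader('Content-Type', 'text/plain; charset=utf-8');
  res.setHeader('Transfer-Encoding', 'chunked');

  for await (const part of parts) {
    // Forward each token as soon as the model produces it.
    res.write(part.message.content);
  }

  // All parts consumed: close the response body.
  res.end();
}

// Minimal in-memory response used to exercise the function without a server.
function createFakeResponse() {
  return {
    headers: {},
    chunks: [],
    ended: false,
    setHeader(name, value) { this.headers[name] = value; },
    write(chunk) { this.chunks.push(chunk); },
    end() { this.ended = true; },
  };
}
```

In a real Express handler, res would be the response object passed to the route callback, and the iterable would come from the Ollama client rather than a generator.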
Frontend streaming
The useAISummary hook in frontend/src/hooks/useAISummary.jsx opens the stream with the native Fetch API and reads it incrementally.
reader.read() returns a { done, value } pair on every iteration. When done is true the stream has ended and the loop exits. Otherwise, value is a Uint8Array of raw bytes that TextDecoder.decode() converts to a string.
The { stream: true } option passed to TextDecoder.decode() is important: it tells the decoder to hold any incomplete multi-byte character sequence at the end of the current chunk and prepend it to the next chunk. Without this flag, multi-byte Unicode characters (such as accented Spanish letters or emoji) that happen to be split across two chunks would be decoded incorrectly and appear as replacement characters (\uFFFD).
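
The effect of the flag can be reproduced in isolation. The sketch below splits the UTF-8 bytes of the word "año" in the middle of the two-byte "ñ" and decodes the halves both ways; the variable names are illustrative only.

```javascript
// 'ñ' is encoded as two bytes in UTF-8 (0xC3 0xB1), so 'año' is four bytes.
const bytes = new TextEncoder().encode('año');
const firstChunk = bytes.slice(0, 2);  // ends in the middle of 'ñ'
const secondChunk = bytes.slice(2);    // starts with the second byte of 'ñ'

// With { stream: true } the decoder keeps the dangling byte and
// completes the character when the next chunk arrives.
const streamingDecoder = new TextDecoder();
const streamed =
  streamingDecoder.decode(firstChunk, { stream: true }) +
  streamingDecoder.decode(secondChunk, { stream: true });

// Without the flag each chunk is treated as a complete input, so the
// split character is replaced by U+FFFD on both sides of the boundary.
const naive =
  new TextDecoder().decode(firstChunk) +
  new TextDecoder().decode(secondChunk);

console.log(streamed);                  // año
console.log(naive.includes('\uFFFD'));  // true
```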
Each decoded string is appended to the accumulated summary state value with a functional update, triggering a React re-render that extends the visible text on screen.
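
A minimal sketch of such a read loop is shown here, assuming a runtime with the standard Fetch and Streams globals (a browser, or Node 18+). readTextStream, onChunk, and fakeResponse are hypothetical names, not taken from the hook's source; the fake response only simulates a chunked body so the loop can run without a network call.

```javascript
// Reads a fetch Response body chunk by chunk and reports each decoded piece.
// `onChunk` is a hypothetical callback; in a React hook it could be
// (text) => setSummary((prev) => (prev ?? '') + text).
async function readTextStream(response, onChunk) {
  const reader = response.body.getReader();
  const decoder = new TextDecoder();

  while (true) {
    const { done, value } = await reader.read();
    if (done) break; // the server closed the stream
    onChunk(decoder.decode(value, { stream: true }));
  }
}

// Simulated server response whose body arrives in several chunks.
function fakeResponse(parts) {
  const encoder = new TextEncoder();
  const body = new ReadableStream({
    start(controller) {
      for (const part of parts) controller.enqueue(encoder.encode(part));
      controller.close();
    },
  });
  return new Response(body);
}
```

The functional form of the state update matters because several chunks can arrive between renders; computing the next value from the previous one avoids appending to a stale copy of the text.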
The hook exposes three pieces of state and one action:
| Value | Type | Description |
|---|---|---|
| summary | string \| null | Accumulated summary text; grows as chunks arrive |
| loading | boolean | true while the stream is open |
| error | string \| null | Set to 'Error al generar el resumen' ("Error generating the summary") on failure |
| generateSummary | () => Promise<void> | Initiates the fetch and streaming loop |
Rate limiting
The AI router applies express-rate-limit to every route it handles, including the summary endpoint.
The standardHeaders: 'draft-8' option instructs express-rate-limit to attach the standardized RateLimit and RateLimit-Policy response headers, in the format defined by draft 8 of the IETF rate-limit headers specification, so clients can inspect their remaining quota.
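
For illustration, a limiter configuration along these lines might look like the sketch below. Only standardHeaders: 'draft-8' comes from the text above; the window length, the request limit, the legacyHeaders setting, and the name aiLimiterOptions are assumptions chosen for the example, not the project's actual values.

```javascript
// Options object for express-rate-limit. Only standardHeaders is taken
// from the documentation; every other value is a hypothetical placeholder.
const aiLimiterOptions = {
  windowMs: 60 * 1000,         // assumed: one-minute window
  limit: 5,                    // assumed: five requests per window per client
  standardHeaders: 'draft-8',  // send the standardized rate-limit headers
  legacyHeaders: false,        // assumed: omit the older X-RateLimit-* headers
};

// In the router, the object would be passed to the middleware factory:
//   const { rateLimit } = require('express-rate-limit');
//   router.use(rateLimit(aiLimiterOptions));
```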
Error handling
If Ollama throws an error during streaming, the catch block distinguishes between two situations:
- Headers not yet sent: the error occurred before any chunk was written to the response. It is still possible to send a proper JSON error body with a 500 status code, which the frontend can catch and surface to the user.
- Headers already sent: at least one chunk reached the client, meaning the browser has already started rendering the partial summary. Changing the status code or content type is no longer possible. The route calls res.end() to cleanly close the connection; the frontend’s stream loop will exit naturally when it reads done: true.
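
The two branches can be sketched with Express's res.headersSent flag, which reports whether the response headers have already gone out. handleStreamError and the error message text are hypothetical; the route's real catch block may differ in its details.

```javascript
// Ends a failed streaming response in the least disruptive way available.
function handleStreamError(res, error) {
  console.error('Summary stream failed:', error);

  if (!res.headersSent) {
    // Nothing has been written yet, so a JSON error with status 500
    // can still be sent. The message text is a placeholder.
    res.status(500).json({ error: 'Failed to generate summary' });
    return;
  }

  // Part of the summary already reached the client: the status code and
  // content type are fixed, so the only option is to close the stream.
  res.end();
}
```

Such a helper would be called from the catch block that wraps the streaming loop, with the route's own res object.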