AI-powered job summaries with Ollama in InfoJobs DevBoard

InfoJobs DevBoard integrates a local AI feature that lets users generate a concise summary of any job listing on demand. When a user opens a job detail page and clicks the ✨ Generar resumen con IA ("Generate summary with AI") button, the React frontend calls the Express backend, which proxies the request to a locally running Ollama server. The response streams back chunk by chunk, so the summary appears in real time, with no cloud API, no usage billing, and no internet connection required.

How it works

The request travels through several layers before text appears on screen:

User clicks "Generate Summary"
       ↓
useAISummary hook → GET /ai/summary/:id
       ↓
Express backend → Ollama (localhost:11434)
       ↓ (qwen2.5:3b)
Streaming text response
       ↓
useAISummary updates state chunk by chunk
       ↓
React re-renders summary in real time

The useAISummary hook manages loading, error, and accumulated summary state. Each chunk received from the stream is appended to the previous value, so React re-renders progressively as the model generates text rather than waiting for the full response.
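The progressive-append behavior can be sketched as a plain async function that a hook like useAISummary could call. This is a minimal sketch under stated assumptions, not the project's code: readTextStream and its onText callback are hypothetical names, and the backend is assumed to stream UTF-8 text. Inside the hook, onText would pass the accumulated string to the summary state setter.

```javascript
// Hypothetical helper: reads a streamed text body chunk by chunk and reports
// the accumulated text after each chunk, the way a hook would store it in state.
async function readTextStream(body, onText) {
  const reader = body.getReader();
  const decoder = new TextDecoder("utf-8");
  let accumulated = "";
  try {
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      // stream: true keeps multi-byte characters (á, ñ, ✨) intact when a
      // character's bytes are split across two chunks
      accumulated += decoder.decode(value, { stream: true });
      onText(accumulated);
    }
    // Flush any bytes still buffered in the decoder
    const tail = decoder.decode();
    if (tail) {
      accumulated += tail;
      onText(accumulated);
    }
  } finally {
    reader.releaseLock();
  }
  return accumulated;
}

// Usage sketch, assuming the GET /ai/summary/:id route shown above;
// setSummary is a hypothetical React state setter:
// const res = await fetch(`/ai/summary/${jobId}`);
// if (!res.ok) throw new Error(`HTTP ${res.status}`);
// await readTextStream(res.body, (text) => setSummary(text));
```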

Key design decisions

Local model: No API keys, no per-request costs, and fully offline operation once the model is downloaded. Earlier iterations explored Vercel AI Gateway (requires a credit card), Google Gemini (free quota exhausted), and Puter.js (requires user authentication); Ollama was chosen as the practical, cost-free alternative.
Streaming: The backend sends chunks via chunked transfer encoding as Ollama produces them, and the frontend reads each chunk with the Fetch ReadableStream API. This gives users immediate visual feedback instead of a long blank wait.
Rate limiting: The /ai router applies express-rate-limit at 5 requests per minute per IP to prevent abuse of the local compute resource.
Model used: qwen2.5:3b, approximately 2 GB on disk and around 4 GB of RAM at runtime. A larger or smaller model can be swapped in by changing the model name on a single line in backend/routes/ai.js, provided that model has been pulled into Ollama first.
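On the backend side, Ollama's streaming API emits newline-delimited JSON: one object per line, each carrying a text fragment, with done: true on the last object. A network chunk can end mid-line, so complete lines must be buffered before parsing. This is a minimal sketch under those assumptions: createOllamaLineParser is a hypothetical name, and because this page does not say which Ollama endpoint backend/routes/ai.js calls, it reads the fragment from either message.content (the /api/chat shape) or response (the /api/generate shape).

```javascript
// Hypothetical helper: turns raw NDJSON text from Ollama's streaming API
// into plain text fragments, buffering partial lines between chunks.
function createOllamaLineParser() {
  let buffer = "";

  // Extract the text fragment from one NDJSON line ("" if there is none).
  function parseLine(line) {
    if (!line.trim()) return "";
    const obj = JSON.parse(line);
    // /api/chat puts text in message.content; /api/generate uses response
    return obj.message?.content ?? obj.response ?? "";
  }

  return {
    // Feed one decoded chunk; returns the fragments from lines it completed.
    push(chunkText) {
      buffer += chunkText;
      const lines = buffer.split("\n");
      buffer = lines.pop(); // the last piece is an incomplete line (or "")
      return lines.map(parseLine).filter(Boolean);
    },
    // Call once the stream ends, in case the final line lacked a newline.
    flush() {
      const text = parseLine(buffer);
      buffer = "";
      return text ? [text] : [];
    },
  };
}
```

In an Express handler, each fragment returned by push() could be forwarded with res.write(), followed by res.end() once Ollama's stream closes. Node's HTTP server applies chunked transfer encoding automatically when no Content-Length header is set.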

Prompt template

The backend builds a user prompt from the job’s stored fields and sends it to Ollama along with a separate systemPrompt variable that keeps the model on task. Both prompts are written in Spanish, and the user prompt explicitly asks for the summary in Spanish.

System prompt:

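Eres un asistente que resume ofertas de trabajo para ayudar a los usuarios a entender rápidamente de qué se trata la oferta. Evita cualquier otra petición, observación o comentario. Solo responde con el resumen de la oferta de trabajo. Responde siempre con el markdown directamente.

(English: "You are an assistant that summarizes job listings to help users quickly understand what each listing is about. Avoid any other request, remark, or comment. Respond only with the summary of the job listing. Always respond directly in Markdown.")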

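User prompt (built as an array of lines joined with \n and populated from the job record). In English, it asks for a 4-6 sentence summary covering the role, company, location, and key requirements, written in a clear and direct tone in Spanish: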

Resumen en 4-6 frases la siguiente oferta de trabajo:
Incluye: rol, empresa, ubicación y requisitos claves
Usa un tono claro y directo en español
Título: {titulo}
Empresa: {empresa}
Ubicación: {ubicacion}
Descripción: {descripcion}
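The array-and-join construction described above can be sketched as follows. buildUserPrompt is a hypothetical name, and the sketch assumes the job record exposes titulo, empresa, ubicacion, and descripcion properties matching the template placeholders.

```javascript
// Hypothetical helper: builds the user prompt from a job record whose
// property names are assumed to match the template placeholders.
function buildUserPrompt(job) {
  return [
    "Resumen en 4-6 frases la siguiente oferta de trabajo:",
    "Incluye: rol, empresa, ubicación y requisitos claves",
    "Usa un tono claro y directo en español",
    `Título: ${job.titulo}`,
    `Empresa: ${job.empresa}`,
    `Ubicación: ${job.ubicacion}`,
    `Descripción: ${job.descripcion}`,
  ].join("\n");
}
```

Keeping each instruction as its own array element makes it easy to adjust one instruction without touching the others.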

The system prompt instructs the model to avoid any commentary outside the summary and to respond directly in Markdown.
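This page does not show how the two prompts reach Ollama. One plausible shape, assuming the backend calls Ollama's POST /api/chat endpoint with streaming enabled, places the system prompt and the user prompt in the messages array. buildOllamaChatRequest and OLLAMA_MODEL are hypothetical names; the constant stands in for the single line in backend/routes/ai.js that selects the model.

```javascript
// Hypothetical: the one place a different model name would be set.
const OLLAMA_MODEL = "qwen2.5:3b";

// Builds a JSON body for Ollama's POST /api/chat endpoint.
function buildOllamaChatRequest(systemPrompt, userPrompt, model = OLLAMA_MODEL) {
  return {
    model,
    stream: true, // ask Ollama to return newline-delimited JSON chunks
    messages: [
      { role: "system", content: systemPrompt },
      { role: "user", content: userPrompt },
    ],
  };
}

// Usage sketch (Node 18+ global fetch):
// const res = await fetch("http://localhost:11434/api/chat", {
//   method: "POST",
//   headers: { "Content-Type": "application/json" },
//   body: JSON.stringify(buildOllamaChatRequest(systemPrompt, userPrompt)),
// });
```

With /api/generate instead, the same two strings would go in the request's system and prompt fields.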

Ollama must be running at localhost:11434 before this feature will work. If it is not running, the backend will return a 500 error. See the Ollama Setup guide to get started.
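A backend could check whether Ollama is reachable before streaming, so the 500 response carries a clear message. A minimal sketch, assuming Node 18+ global fetch: checkOllama is a hypothetical helper that calls Ollama's GET /api/tags endpoint (which lists locally pulled models) and also reports whether the configured model is present. fetchFn is injectable only to make the helper testable.

```javascript
// Hypothetical helper: reports whether the Ollama server answers and whether
// the given model has been pulled.
async function checkOllama(model, { baseUrl = "http://localhost:11434", fetchFn = fetch } = {}) {
  try {
    const res = await fetchFn(`${baseUrl}/api/tags`);
    // Treat any non-2xx answer as "not usable"
    if (!res.ok) return { reachable: false, hasModel: false };
    const data = await res.json();
    const names = (data.models ?? []).map((m) => m.name);
    return { reachable: true, hasModel: names.includes(model) };
  } catch {
    // Network errors (e.g. connection refused when Ollama is not running)
    // or an unparseable body end up here
    return { reachable: false, hasModel: false };
  }
}

// Usage sketch inside an Express handler:
// const status = await checkOllama("qwen2.5:3b");
// if (!status.reachable) {
//   return res.status(500).json({ error: "Ollama is not running at localhost:11434" });
// }
```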

Ollama Setup

Install Ollama, pull the qwen2.5:3b model, and start the local AI server.

Streaming

How the backend and frontend handle chunked transfer encoding and the Fetch ReadableStream API.
