
AgroIA uses two Ollama models running locally: one to convert agronomic report text into dense vector embeddings for semantic search, and one to generate natural-language answers grounded in those retrieved reports. Both models are served by Ollama and communicate with the application over HTTP. No external API calls or cloud GPU are required.
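The sketch below shows both calls side by side using the ollama Python client the project already relies on; the sample strings are illustrative, and AgroIA's actual call sites are listed in the table that follows:

import ollama  # talks to the Ollama server over HTTP (default http://localhost:11434)

# Embed report text for semantic search (returns a 768-dimensional vector).
emb = ollama.embed(model="nomic-embed-text", input="sample agronomic report text")
vector = emb["embeddings"][0]

# Generate a grounded answer from retrieved report context.
gen = ollama.generate(model="gemma3:4b", prompt="Answer using the retrieved reports: ...")
print(gen["response"])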

Models at a glance

Role        Model             Output                           Used by
Embeddings  nomic-embed-text  768-dimensional float32 vector   src/utils/loader.py, src/rag/core.py
Generation  gemma3:4b         Agronomic expert text response   src/rag/core.py

Pulling the models

Run both ollama pull commands before starting AgroIA; both models must be available locally before the application attempts to use them.
ollama pull nomic-embed-text
ollama pull gemma3:4b
Verify that both models are loaded and ready with ollama list. The output should show both model names with their sizes before you start the API or pipeline.
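If you prefer to script that check, here is a minimal sketch that shells out to the Ollama CLI (assuming ollama is on PATH):

import subprocess

out = subprocess.run(["ollama", "list"], capture_output=True, text=True, check=True).stdout
for required in ("nomic-embed-text", "gemma3:4b"):
    if required not in out:
        raise SystemExit(f"Model not pulled yet: {required}")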

Embedding model: nomic-embed-text

nomic-embed-text converts the contenido_tecnico text field of each lot report into a 768-dimensional vector. This vector is stored in the embedding column of informes_lotes as a vector(768) type (pgvector). At query time, the RAG engine embeds the user’s question using the same model and retrieves the most semantically similar lot reports via cosine distance. The embedding call is made through the generate_embedding function in src/utils/loader.py:
def generate_embedding(text: str) -> list[float]:
    resp = ollama.embed(model=settings.embedding_model, input=text)
    return resp["embeddings"][0]  # list of 768 floats
Changing EMBEDDING_MODEL to a model that produces a different vector dimension requires you to drop and recreate the embedding column with the new size, then re-ingest every lot to regenerate its embedding. Mixing embeddings from different models in the same column produces meaningless similarity scores.
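For example, switching to a hypothetical 1024-dimensional embedding model would require a migration along these lines (a sketch; only the table and column names come from this page):

import psycopg

with psycopg.connect("postgresql://localhost/agroia") as conn:  # hypothetical DSN
    conn.execute("ALTER TABLE informes_lotes DROP COLUMN embedding")
    conn.execute("ALTER TABLE informes_lotes ADD COLUMN embedding vector(1024)")
# Then re-ingest every lot so embeddings are regenerated in the new vector space.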

Generation model: gemma3:4b

gemma3:4b is the agronomic expert LLM. The RAG engine in src/rag/core.py builds a prompt from the BASE_PROMPT system message, injects the retrieved lot context, and calls Ollama with the following inference options:
Option       Value  Effect
temperature  0.2    Low randomness; produces consistent, factual agronomic answers
num_predict  1024   Maximum number of tokens in the generated response
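A sketch of what that call looks like with the ollama client (the real BASE_PROMPT and prompt assembly live in src/rag/core.py; the strings here are placeholders):

import ollama

BASE_PROMPT = "You are an agronomic expert..."  # placeholder for the real system message
context = "...retrieved lot reports..."         # injected by the RAG engine

resp = ollama.chat(
    model="gemma3:4b",
    messages=[
        {"role": "system", "content": BASE_PROMPT},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: Which lots need irrigation?"},
    ],
    options={"temperature": 0.2, "num_predict": 1024},
)
print(resp["message"]["content"])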
Local LLM inference latency on CPU typically ranges from 14 to 71 seconds per query depending on hardware. On machines with a supported GPU, Ollama will use it automatically, significantly reducing latency.

Configuring Ollama

OLLAMA_URL

AgroIA connects to Ollama at the URL defined by the OLLAMA_URL environment variable (default: http://localhost:11434). When running inside Docker Compose, this is automatically overridden to http://host.docker.internal:11434. See Environment variables for details.
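In client code this amounts to constructing the client with that host; a minimal sketch (AgroIA's actual wiring may differ):

import os
import ollama

client = ollama.Client(host=os.environ.get("OLLAMA_URL", "http://localhost:11434"))
client.embed(model="nomic-embed-text", input="connectivity check")  # fails fast if unreachable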

Running Ollama as a service

Start Ollama in the background before launching AgroIA:
ollama serve &
Or install Ollama as a system service so it starts automatically on boot. The Ollama server must be reachable at OLLAMA_URL when the API or pipeline starts.

Changing models

To use different models, update the corresponding variables in config/.env:
EMBEDDING_MODEL=nomic-embed-text
GENERATION_MODEL=gemma3:4b
1. Pull the new model
   ollama pull <new-model-name>

2. Update config/.env
   Change EMBEDDING_MODEL or GENERATION_MODEL (or both) in config/.env.

3. Re-ingest all data (embedding model only)
   If you changed EMBEDDING_MODEL, you must re-ingest every lot to regenerate embeddings in the new model's vector space. Re-running the pipeline or posting existing payloads to /ingesta overwrites the stored embeddings via the upsert logic (see the sketch after this list).

4. Restart the application
   python start.py
   The new model names are read from settings at startup.
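As referenced in step 3, re-ingestion can be triggered by re-posting each lot's payload; a hedged sketch (the field names and API host below are illustrative, not the real /ingesta schema):

import requests

payload = {"lote_id": 12, "contenido_tecnico": "updated report text"}  # hypothetical fields
resp = requests.post("http://localhost:8000/ingesta", json=payload)    # hypothetical host/port
resp.raise_for_status()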

Advanced configuration

Remote Ollama server

To run inference on a remote machine, set OLLAMA_URL to the remote server's address:
OLLAMA_URL=http://192.168.1.100:11434
Ensure the remote machine has both models pulled and that port 11434 is reachable from your AgroIA host.

GPU acceleration

Ollama detects compatible NVIDIA and Apple Silicon GPUs automatically. Install the appropriate CUDA drivers (NVIDIA) or ensure you are running a native macOS Ollama build (Apple Silicon). No configuration change is needed in AgroIA; the speed improvement is transparent.
