AgroIA uses two Ollama models running locally: one to convert agronomic report text into dense vector embeddings for semantic search, and one to generate natural-language answers grounded in those retrieved reports. Both models are served by Ollama and communicate with the application over HTTP. No external API calls or cloud GPUs are required.
Models at a glance
| Role | Model | Output | Used by |
|---|---|---|---|
| Embeddings | nomic-embed-text | 768-dimensional float32 vector | src/utils/loader.py, src/rag/core.py |
| Generation | gemma3:4b | Agronomic expert text response | src/rag/core.py |
Pulling the models
Run both ollama pull commands before starting AgroIA; the models must be available locally before the application attempts to use them:
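```bash
ollama pull nomic-embed-text
ollama pull gemma3:4b
```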
Embedding model: nomic-embed-text
nomic-embed-text converts the contenido_tecnico text field of each lot report into a 768-dimensional vector. This vector is stored in the embedding column of informes_lotes as a vector(768) type (pgvector). At query time, the RAG engine embeds the user’s question using the same model and retrieves the most semantically similar lot reports via cosine distance.
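As an illustration, the similarity search could be issued with pgvector's cosine-distance operator (<=>) against informes_lotes. This is only a sketch: the real query lives in src/rag/core.py, and the connection settings and any columns other than embedding and contenido_tecnico are assumptions.

```python
import psycopg2

def retrieve_similar_reports(question_embedding, top_k=5):
    """Return the top_k most similar lot reports by cosine distance."""
    # Connection parameters here are placeholders, not AgroIA's real settings.
    conn = psycopg2.connect("dbname=agroia user=agroia")
    vec_literal = "[" + ",".join(str(x) for x in question_embedding) + "]"
    with conn, conn.cursor() as cur:
        cur.execute(
            """
            SELECT contenido_tecnico,
                   embedding <=> %s::vector AS distancia  -- pgvector cosine distance
            FROM informes_lotes
            ORDER BY distancia
            LIMIT %s
            """,
            (vec_literal, top_k),
        )
        return cur.fetchall()
```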
The embedding call is made through the generate_embedding function in src/utils/loader.py:
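A minimal sketch of what that function might look like, assuming it calls Ollama's /api/embeddings endpoint directly over HTTP; the actual implementation in src/utils/loader.py may differ.

```python
import os
import requests

OLLAMA_URL = os.getenv("OLLAMA_URL", "http://localhost:11434")

def generate_embedding(text):
    """Embed text (a report's contenido_tecnico or a user question) with nomic-embed-text."""
    response = requests.post(
        f"{OLLAMA_URL}/api/embeddings",
        json={"model": "nomic-embed-text", "prompt": text},
        timeout=60,
    )
    response.raise_for_status()
    return response.json()["embedding"]  # list of 768 floats
```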
Generation model: gemma3:4b
gemma3:4b is the agronomic expert LLM. The RAG engine in src/rag/core.py builds a prompt from the BASE_PROMPT system message, injects the retrieved lot context, and calls Ollama with the following inference options:
| Option | Value | Effect |
|---|---|---|
| temperature | 0.2 | Low randomness; produces consistent, factual agronomic answers |
| num_predict | 1024 | Maximum number of tokens in the generated response |
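As a rough sketch, such a call against Ollama's /api/generate endpoint could look like the following. BASE_PROMPT and the context-injection format shown here are simplified stand-ins for whatever src/rag/core.py actually builds.

```python
import os
import requests

OLLAMA_URL = os.getenv("OLLAMA_URL", "http://localhost:11434")
BASE_PROMPT = "You are an agronomic expert..."  # placeholder for the real system prompt

def generate_answer(question, lot_context):
    """Ask gemma3:4b for an answer grounded in the retrieved lot reports."""
    response = requests.post(
        f"{OLLAMA_URL}/api/generate",
        json={
            "model": "gemma3:4b",
            "system": BASE_PROMPT,
            "prompt": f"Context:\n{lot_context}\n\nQuestion: {question}",
            "stream": False,
            "options": {"temperature": 0.2, "num_predict": 1024},
        },
        timeout=300,
    )
    response.raise_for_status()
    return response.json()["response"]
```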
Local LLM inference latency on CPU typically ranges from 14 to 71 seconds per query depending on hardware. On machines with a supported GPU, Ollama will use it automatically, significantly reducing latency.
Configuring Ollama
OLLAMA_URL
AgroIA connects to Ollama at the URL defined by the OLLAMA_URL environment variable (default: http://localhost:11434). When running inside Docker Compose, this is automatically overridden to http://host.docker.internal:11434. See Environment variables for details.
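For example, assuming OLLAMA_URL is read from config/.env like the model variables, the default looks like this:

```bash
# config/.env
OLLAMA_URL=http://localhost:11434
```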
Running Ollama as a service
Start Ollama in the background before launching AgroIA; the application will try to reach Ollama at OLLAMA_URL when the API or pipeline starts.
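With the stock Ollama CLI, for example:

```bash
# Run the Ollama server in the background (the Linux installer typically
# registers an "ollama" systemd service that does this for you).
ollama serve &
```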
Changing models
To use different models, update the corresponding variables in config/.env:
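For instance, something along these lines; EMBEDDING_MODEL is referenced below, while the name of the generation-model variable is an assumption here.

```bash
# config/.env
EMBEDDING_MODEL=nomic-embed-text   # swap for another Ollama embedding model if desired
LLM_MODEL=gemma3:4b                # hypothetical variable name; check config/.env for the real one
```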
Re-ingest all data (embedding model only)
If you changed EMBEDDING_MODEL, you must re-ingest every lot to regenerate embeddings in the new model's vector space. Re-running the pipeline or posting existing payloads to /ingesta will overwrite the stored embeddings via the upsert logic.
Advanced configuration
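A hypothetical re-ingestion call might look like this. The request body is only a placeholder: lote_id is an assumed field name and the API host/port are whatever your deployment uses, so check the ingestion endpoint's documentation for the real schema.

```bash
curl -X POST http://localhost:8000/ingesta \
  -H "Content-Type: application/json" \
  -d '{
        "lote_id": "L-001",
        "contenido_tecnico": "Technical report for the lot..."
      }'
```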
Running Ollama on a remote machine
Set OLLAMA_URL to the remote server's address. Ensure the remote machine has both models pulled and that port 11434 is reachable from your AgroIA host.
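For example, pointing config/.env at a hypothetical remote host:

```bash
# config/.env
OLLAMA_URL=http://192.168.1.50:11434   # example address; substitute your own Ollama server
```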
GPU acceleration
Ollama detects compatible NVIDIA and Apple Silicon GPUs automatically. Install the appropriate CUDA drivers (NVIDIA) or ensure you are running a native macOS Ollama build (Apple Silicon). No configuration change is needed in AgroIA — the speed improvement is transparent.
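If you want to verify which device Ollama is using, the ollama ps command lists currently loaded models along with their processor placement:

```bash
# The PROCESSOR column indicates CPU vs. GPU placement for each loaded model.
ollama ps
```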