
Overview

SIAA (Sistema Inteligente de Apoyo Administrativo) is an AI-powered judicial document management system that uses Ollama with the Qwen2.5:3b model for intelligent document search and question answering.

System Requirements

  • Python: Python 3.8 or higher
  • LibreOffice: LibreOffice headless for document conversion
  • Tesseract: Tesseract OCR for scanned PDF processing
  • Ollama: Ollama with the Qwen2.5:3b model

Installation Steps

1. Install System Dependencies

Install LibreOffice headless and Tesseract OCR on Fedora/RHEL:
sudo dnf install libreoffice-headless tesseract tesseract-langpack-spa poppler-utils -y
For Debian/Ubuntu:
sudo apt install libreoffice-nogui tesseract-ocr tesseract-ocr-spa poppler-utils -y
LibreOffice headless is required for converting .doc and .pdf files to .docx format. poppler-utils provides pdftoppm, which pdf2image uses to render PDF pages as images for OCR.

2. Install Python Dependencies

Install required Python packages:
pip install Flask flask-cors pandas python-docx pymupdf4llm \
            pdf2image pytesseract openpyxl xlrd requests \
            --break-system-packages
The --break-system-packages flag is needed on Fedora 43 and similar systems whose Python installation is marked as externally managed (PEP 668). A virtual environment avoids the flag entirely; see Troubleshooting.

Core Dependencies

  • Flask & flask-cors: Web server and API framework
  • pandas: Excel file processing
  • python-docx: Word document parsing
  • pymupdf4llm: PDF to Markdown conversion
  • pdf2image & pytesseract: OCR for scanned documents
  • openpyxl & xlrd: Excel format support
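Because several of these packages are imported under a different name than the one passed to pip (python-docx is imported as `docx`, flask-cors as `flask_cors`), a quick import check can confirm that the install landed in the interpreter that will run SIAA. This is a minimal sketch; the `missing_packages` helper is hypothetical and not part of SIAA.

```python
import importlib.util

# pip package name -> module name used in `import` (standard import names)
REQUIRED_MODULES = {
    "Flask": "flask",
    "flask-cors": "flask_cors",
    "pandas": "pandas",
    "python-docx": "docx",
    "pymupdf4llm": "pymupdf4llm",
    "pdf2image": "pdf2image",
    "pytesseract": "pytesseract",
    "openpyxl": "openpyxl",
    "xlrd": "xlrd",
    "requests": "requests",
}


def missing_packages(modules=REQUIRED_MODULES):
    """Return the pip names whose modules this interpreter cannot find."""
    return [pkg for pkg, mod in modules.items()
            if importlib.util.find_spec(mod) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", " ".join(missing))
    else:
        print("All Python dependencies are installed.")
```

Run it with the same `python3` (or virtual environment) that will start siaa_proxy.py; an empty result means every module was found.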

3. Install and Configure Ollama

Install Ollama from ollama.ai:
curl -fsSL https://ollama.ai/install.sh | sh
Pull the Qwen2.5:3b model:
ollama pull qwen2.5:3b
SIAA is configured to use the qwen2.5:3b model running on http://localhost:11434. This is defined in siaa_proxy.py:182.
Verify the installation:
ollama list
You should see qwen2.5:3b in the output.
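The same check can be scripted against Ollama's HTTP API: `GET /api/tags` returns the locally pulled models, the same list that `ollama list` prints. This sketch uses only the standard library and assumes Ollama is listening on the default http://localhost:11434; `ollama_has_model` and `model_in_tags` are illustrative names, not SIAA functions.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # default endpoint used by SIAA
MODEL = "qwen2.5:3b"


def model_in_tags(tags: dict, model: str) -> bool:
    """Return True if `model` is listed in an /api/tags response body."""
    return any(m.get("name") == model for m in tags.get("models", []))


def ollama_has_model(base_url: str = OLLAMA_URL, model: str = MODEL,
                     timeout: float = 5.0) -> bool:
    """Ask a running Ollama server whether `model` has been pulled."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        tags = json.load(resp)
    return model_in_tags(tags, model)
```

`ollama_has_model()` should return True once the pull has finished; it raises `urllib.error.URLError` if Ollama is not reachable.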

4. Create Directory Structure

Create the required SIAA directories:
sudo mkdir -p /opt/siaa/fuentes
sudo mkdir -p /opt/siaa/fuentes/normativa
sudo mkdir -p /opt/siaa/logs
sudo mkdir -p /opt/siaa/instructivos
sudo mkdir -p /opt/siaa/pdfs_origen
Set appropriate permissions:
sudo chown -R $USER:$USER /opt/siaa

Directory Purpose

Directory                     Purpose
/opt/siaa/fuentes             Converted Markdown documents for indexing
/opt/siaa/fuentes/normativa   Legal and regulatory documents (PDF converter output)
/opt/siaa/logs                System logs, including quality monitoring
/opt/siaa/instructivos        Source Word/Excel files
/opt/siaa/pdfs_origen         Source PDF files awaiting conversion
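If you provision servers with a script, the `mkdir -p` commands above translate directly to `pathlib`. This is a sketch; the `create_siaa_dirs` helper and its `base` parameter are hypothetical, and ownership still has to be set with `chown` as in the permissions step.

```python
from pathlib import Path

# Subdirectories from the table above, relative to the SIAA base directory
SIAA_SUBDIRS = ["fuentes", "fuentes/normativa", "logs", "instructivos", "pdfs_origen"]


def create_siaa_dirs(base="/opt/siaa"):
    """Create the SIAA directory tree (like `mkdir -p`) and return the paths."""
    created = []
    for sub in SIAA_SUBDIRS:
        path = Path(base) / sub
        path.mkdir(parents=True, exist_ok=True)  # no error if it already exists
        created.append(path)
    return created
```

Passing a scratch directory as `base` lets you try it without root privileges; calling it twice is harmless.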

5. Configure Environment Variables

Set server IP and port (optional):
export SIAA_SERVER_IP="192.168.1.100"
export SIAA_SERVER_PORT="5000"
These environment variables are read in siaa_proxy.py:168-170. If not set, the system uses the IP from the HTTP Host header and defaults to port 5000.
To make these permanent, add them to ~/.bashrc or ~/.profile:
echo 'export SIAA_SERVER_IP="192.168.1.100"' >> ~/.bashrc
echo 'export SIAA_SERVER_PORT="5000"' >> ~/.bashrc
source ~/.bashrc
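The fallback order described above can be sketched as follows. This illustrates the documented behavior and is not the code at siaa_proxy.py:168-170; `resolve_server_address` is a hypothetical name, and parsing the Host header with `urllib.parse` is an assumption.

```python
import os
from urllib.parse import urlsplit


def resolve_server_address(host_header: str, env=os.environ):
    """Return (ip, port): environment variables first, then Host header / 5000."""
    ip = env.get("SIAA_SERVER_IP")
    if not ip:
        # Host may carry a port ("10.0.0.5:5000") or be IPv6 ("[::1]:5000");
        # urlsplit's .hostname strips both the port and the brackets.
        ip = urlsplit("//" + host_header).hostname
    port = int(env.get("SIAA_SERVER_PORT") or "5000")
    return ip, port
```

With neither variable set, a request whose Host header is `10.0.0.5:5000` resolves to `("10.0.0.5", 5000)`.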

6. Deploy Source Files

Copy the SIAA Python files to your deployment location:
mkdir -p ~/siaa
cp siaa_proxy.py convertidor.py convertidor_pdf.py ~/siaa/
cp -r Web ~/siaa/
The Web directory contains:
  • index.html: Main web interface
  • index2121.html: Alternative interface version
  • default.conf: Nginx configuration template

Configuration Details

Proxy Server Settings

The Flask proxy server (siaa_proxy.py) has the following key configuration:
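OLLAMA_URL             = "http://localhost:11434"        # Ollama API endpoint
MODEL                  = "qwen2.5:3b"                    # model used to generate answers
CARPETA_FUENTES        = "/opt/siaa/fuentes"             # Markdown documents to index
LOG_ARCHIVO            = "/opt/siaa/logs/calidad.jsonl"  # quality log (JSON Lines)
MAX_DOCS_CONTEXTO      = 2                               # max documents added to the context
CHUNK_SIZE             = 800                             # length of each document chunk
CHUNK_OVERLAP          = 300                             # overlap between consecutive chunks
MAX_CHUNKS_CONTEXTO    = 3                               # max chunks added to the context
CACHE_MAX_ENTRADAS     = 200                             # response cache capacity (entries)
CACHE_TTL_SEGUNDOS     = 3600                            # cache entry lifetime (1 hour)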
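CHUNK_SIZE and CHUNK_OVERLAP describe a sliding window: each chunk starts CHUNK_SIZE - CHUNK_OVERLAP = 500 units after the previous one, so consecutive chunks share 300 units of text. The sketch below assumes the units are characters; SIAA may instead count tokens or words, or split on paragraph boundaries. `chunk_text` is a hypothetical helper.

```python
CHUNK_SIZE = 800
CHUNK_OVERLAP = 300


def chunk_text(text: str, size: int = CHUNK_SIZE, overlap: int = CHUNK_OVERLAP):
    """Split text into fixed-size character windows that overlap by `overlap`."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    step = size - overlap  # 500 with the defaults
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):  # this window already reaches the end
            break
        start += step
    return chunks
```

With the defaults, a 2,000-character document yields four chunks of 800, 800, 800 and 500 characters. The overlap guarantees that any passage of up to 300 characters appears whole in at least one chunk.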
The response cache is an LRU cache holding up to 200 entries (CACHE_MAX_ENTRADAS) for one hour each (CACHE_TTL_SEGUNDOS = 3600). A repeated question is answered from the cache in about 5 ms, compared with about 44 s for an uncached query.
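A cache with both limits can be built on `collections.OrderedDict`: reads move an entry to the end, inserts evict from the front once the size limit is exceeded, and entries older than the TTL are treated as misses. This is a generic sketch, not SIAA's implementation; the class name and the injectable `clock` (used so expiry can be tested) are assumptions.

```python
import time
from collections import OrderedDict


class TTLLRUCache:
    """LRU cache whose entries also expire after `ttl` seconds."""

    def __init__(self, max_entries=200, ttl=3600.0, clock=time.monotonic):
        self.max_entries = max_entries
        self.ttl = ttl
        self.clock = clock
        self._data = OrderedDict()  # key -> (stored_at, value)

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        stored_at, value = item
        if self.clock() - stored_at > self.ttl:
            del self._data[key]  # expired: drop it and report a miss
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key, value):
        self._data[key] = (self.clock(), value)
        self._data.move_to_end(key)
        if len(self._data) > self.max_entries:
            self._data.popitem(last=False)  # evict the least recently used entry
```

In a proxy like this, the key would typically be a normalized form of the question (for example, stripped and lowercased); how SIAA builds its keys is not documented here. Because `get` returns None on a miss, cached values should never be None themselves.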

Document Converter Settings

The document converter (convertidor.py) uses these default paths:
DEFAULT_ORIGEN  = Path("/opt/siaa/instructivos")
DEFAULT_DEST_MD = Path("/opt/siaa/fuentes")
DEFAULT_DB      = Path("/opt/siaa/institucional.db")
DEFAULT_LOG     = Path("/opt/siaa/logs/conversion_errores.log")
TEMP_DIR        = Path("/tmp/siaa_temp")
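For .doc files, the converter relies on LibreOffice headless. A wrapper around the same command used in Troubleshooting might look like this. It is a sketch: the helper names are hypothetical, and writing the intermediate .docx to TEMP_DIR is an assumption about how convertidor.py uses that directory.

```python
import subprocess
from pathlib import Path

TEMP_DIR = Path("/tmp/siaa_temp")


def libreoffice_cmd(src, outdir=TEMP_DIR, fmt="docx"):
    """Build the headless LibreOffice command that converts `src` to `fmt`."""
    return ["libreoffice", "--headless", "--convert-to", fmt,
            "--outdir", str(outdir), str(src)]


def convert_to_docx(src, outdir=TEMP_DIR, timeout=120):
    """Convert a legacy .doc file and return the path of the resulting .docx."""
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError on a non-zero exit;
    # timeout guards against a hung LibreOffice process.
    subprocess.run(libreoffice_cmd(src, outdir), check=True,
                   capture_output=True, timeout=timeout)
    return outdir / (Path(src).stem + ".docx")
```

LibreOffice names the output after the input file, so `informe.doc` becomes `/tmp/siaa_temp/informe.docx`.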

PDF Converter Settings

The PDF converter (convertidor_pdf.py) configuration:
CARPETA_ENTRADA = "/opt/siaa/pdfs_origen"
CARPETA_SALIDA  = "/opt/siaa/fuentes/normativa"
MIN_CHARS = 200   # Threshold for OCR fallback
OCR_DPI   = 300
OCR_LANG  = "spa"
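MIN_CHARS drives the OCR fallback: if the text layer extracted from a PDF is shorter than 200 characters, the file is treated as scanned and rendered to images for Tesseract. The sketch below shows that decision using the libraries installed in step 2 (`pymupdf4llm.to_markdown`, `pdf2image.convert_from_path`, `pytesseract.image_to_string`). Applying the threshold to the whole document rather than per page is an assumption, and the helper names are hypothetical.

```python
MIN_CHARS = 200  # below this, treat the PDF's text layer as missing
OCR_DPI = 300
OCR_LANG = "spa"


def needs_ocr(text: str, min_chars: int = MIN_CHARS) -> bool:
    """True when the extracted text is too short to be a real text layer."""
    return len(text.strip()) < min_chars


def pdf_to_markdown(pdf_path: str) -> str:
    # Imported lazily so needs_ocr() works without these packages installed
    import pymupdf4llm
    import pytesseract
    from pdf2image import convert_from_path

    text = pymupdf4llm.to_markdown(pdf_path)
    if not needs_ocr(text):
        return text
    # Scanned PDF: render each page with Poppler, then OCR it in Spanish
    pages = convert_from_path(pdf_path, dpi=OCR_DPI)
    return "\n\n".join(pytesseract.image_to_string(p, lang=OCR_LANG) for p in pages)
```

The OCR branch returns plain text rather than Markdown, and `convert_from_path` needs the poppler-utils package installed in step 1.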

Verification

Verify your installation:
python3 --version
# Should show Python 3.8 or higher
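Checking only the Python version leaves the other requirements untested. The script below is a sketch that also confirms the external tools are on PATH; the tool list is an assumption based on the packages installed above (`pdftoppm` is provided by poppler-utils).

```python
import shutil
import sys

# Command-line tools used by the installation steps above
REQUIRED_TOOLS = ["libreoffice", "tesseract", "pdftoppm", "ollama"]


def check_environment(tools=REQUIRED_TOOLS, min_python=(3, 8)):
    """Return a dict mapping each check name to True (passed) or False."""
    results = {"python>=%d.%d" % min_python: sys.version_info[:2] >= min_python}
    for tool in tools:
        results[tool] = shutil.which(tool) is not None
    return results


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{'OK' if ok else 'MISSING':8} {name}")
```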

Next Steps

Quickstart Guide

Learn how to start the system and make your first query

Document Conversion

Convert your institutional documents to Markdown

Troubleshooting

Check if Ollama is running:
systemctl status ollama
Start Ollama if needed:
sudo systemctl start ollama
Or run manually:
ollama serve
Ensure LibreOffice headless is properly installed:
which libreoffice
libreoffice --headless --version
Test conversion manually:
libreoffice --headless --convert-to docx --outdir /tmp test.doc
Check directory permissions:
ls -la /opt/siaa
Fix permissions:
sudo chown -R $USER:$USER /opt/siaa
chmod -R u=rwX,go=rX /opt/siaa
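If pip refuses to install packages into the system Python, use a virtual environment to isolate dependencies (the --break-system-packages flag is not needed inside it):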
python3 -m venv ~/siaa-env
source ~/siaa-env/bin/activate
pip install Flask flask-cors pandas python-docx pymupdf4llm pdf2image pytesseract openpyxl xlrd requests
