
Overview

This guide walks you through starting SIAA, converting your first documents, and running your first query against the system.
Make sure you’ve completed the installation guide before proceeding.

Starting SIAA

1. Start Ollama Service

Start the Ollama service that powers the AI model:
ollama serve
This command runs in the foreground. Open a new terminal for the next steps, or run it in the background:
nohup ollama serve > /tmp/ollama.log 2>&1 &
Verify that the model is available:
curl http://localhost:11434/api/tags
You should see qwen2.5:3b in the response.
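The same check can be scripted. The sketch below assumes the `/api/tags` response is a JSON object with a top-level `models` list whose entries carry a `name` field; `model_is_available` and `fetch_tags` are hypothetical helper names, not part of SIAA.

```python
import json
import urllib.request


def model_is_available(tags: dict, model: str = "qwen2.5:3b") -> bool:
    """Return True if `model` appears in a parsed /api/tags response."""
    # Assumed response shape: {"models": [{"name": "qwen2.5:3b", ...}, ...]}
    return any(entry.get("name") == model for entry in tags.get("models", []))


def fetch_tags(base_url: str = "http://localhost:11434") -> dict:
    """Fetch and parse the tag list from a running Ollama instance."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return json.load(resp)
```

With Ollama running, `model_is_available(fetch_tags())` should return `True` once the model has been pulled.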
2. Start the Flask Proxy Server

Navigate to your SIAA directory and start the proxy server:
cd ~/siaa
python3 siaa_proxy.py
The server will start on port 5000 and you’ll see output like:
================================================================
  SIAA v2.1.25 — Sistema Inteligente de Apoyo Administrativo
  Seccional Bucaramanga — Rama Judicial
================================================================
  Ollama URL    : http://localhost:11434
  Modelo        : qwen2.5:3b
  Puerto        : 5000
  Hilos         : 16
================================================================
The proxy server must remain running. Consider using systemd, screen, or tmux for production deployments.
3. Verify System Status

Check the system status:
curl http://localhost:5000/siaa/status
Expected response:
{
  "status": "ok",
  "version": "2.1.25",
  "model": "qwen2.5:3b",
  "ollama_disponible": true,
  "documentos_cargados": 0,
  "cache": {
    "entradas": 0,
    "max": 200,
    "hits": 0,
    "misses": 0,
    "hit_rate": "0.0%",
    "ttl_seg": 3600
  }
}
Initially, documentos_cargados will be 0. You’ll add documents in the next step.
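If you want to automate this check, a small script can flag the fields that matter. This is a minimal sketch that relies only on the field names shown in the expected response above; `status_problems` is a hypothetical helper, and the sample omits the cache block for brevity.

```python
import json

# Sample copied from the expected response in this guide (cache block omitted).
SAMPLE = (
    '{"status": "ok", "version": "2.1.25", "model": "qwen2.5:3b", '
    '"ollama_disponible": true, "documentos_cargados": 0}'
)


def status_problems(status: dict) -> list[str]:
    """Return human-readable problems found in a /siaa/status response."""
    problems = []
    if status.get("status") != "ok":
        problems.append(f"unexpected status: {status.get('status')!r}")
    if not status.get("ollama_disponible", False):
        problems.append("Ollama is not available")
    if status.get("documentos_cargados", 0) == 0:
        problems.append("no documents loaded yet")
    return problems


print(status_problems(json.loads(SAMPLE)))
```

On a fresh install the only reported problem is the empty document index, which the next section resolves.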

Converting Documents

1. Prepare Your Documents

Place your documents in the appropriate directories.
For institutional documents (Word/Excel pairs):
mkdir -p /opt/siaa/instructivos/"Juzgado Civil Municipal"
# Copy .docx and .xlsx files to this folder
For PDF documents:
cp your-document.pdf /opt/siaa/pdfs_origen/
Each subfolder in /opt/siaa/instructivos should contain one Word document (.doc or .docx) and one Excel file (.xls or .xlsx).
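Before running the converter, it can help to confirm that every subfolder follows this layout. The following is a sketch only: `check_instructivos` is a hypothetical helper, and it assumes the rule is exactly one Word file and exactly one Excel file per subfolder, which may be stricter than what the converter itself enforces.

```python
from pathlib import Path

WORD_EXTS = {".doc", ".docx"}
EXCEL_EXTS = {".xls", ".xlsx"}


def check_instructivos(root: Path) -> dict[str, str]:
    """Map each non-conforming subfolder name to a short description."""
    problems = {}
    for folder in sorted(p for p in root.iterdir() if p.is_dir()):
        suffixes = [f.suffix.lower() for f in folder.iterdir() if f.is_file()]
        words = sum(s in WORD_EXTS for s in suffixes)
        excels = sum(s in EXCEL_EXTS for s in suffixes)
        if words != 1 or excels != 1:
            problems[folder.name] = (
                f"expected 1 Word and 1 Excel file, found {words} and {excels}"
            )
    return problems
```

Calling `check_instructivos(Path("/opt/siaa/instructivos"))` returns an empty dictionary when every subfolder is well formed.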
2. Convert Institutional Documents

Run the main converter for Word/Excel documents:
cd ~/siaa
python3 convertidor.py
You’ll see output showing the conversion progress:
==========================================================
  SIAA — Convertidor Completo
  Origen : /opt/siaa/instructivos
  MD     : /opt/siaa/fuentes
  SQLite : /opt/siaa/institucional.db
==========================================================

  Procesando 1 carpeta(s):

  → Juzgado Civil Municipal (.docx) ... ✅ (1.2s)

=== Markdown generados ===
  juzgado_civil_municipal.md

=== Tablas SQLite ===
  juzgado_civil_municipal: 45 filas

=== Resumen ===
  Carpetas procesadas : 1
  Word/PDF → Markdown : 1 ✅  0 ❌
  Excel   → SQLite    : 1 ✅  0 ❌
  Carpetas con errores: 0
The converter automatically reloads the SIAA document index after successful conversion.
3. Convert PDF Documents

For PDF documents, use the specialized PDF converter:
python3 convertidor_pdf.py
The converter automatically detects if a PDF has native text or requires OCR:
=======================================================
  SIAA Convertidor PDF v2.0
  PDFs: 3 | Modo: Auto
  Salida: /opt/siaa/fuentes/normativa
=======================================================

  📂 acuerdo_psaa16-10476.pdf
    ✅ acuerdo_psaa16-10476.md → 24,679 chars [pymupdf4llm]

  📂 resolucion_escaneada.pdf
    ⚠ pymupdf extrajo 45 chars → OCR...
    📷 Convirtiendo a imágenes (DPI=300)...
    📄 5 página(s)
    🔍 OCR página 5/5...
    ✅ resolucion_escaneada.md → 8,234 chars [OCR Tesseract]
If pymupdf extracts fewer than 200 characters, the system automatically falls back to OCR using Tesseract.
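The fallback rule amounts to a length check on the natively extracted text. The sketch below illustrates that decision only; `needs_ocr` is a hypothetical name, and whether the real converter strips whitespace before counting is an assumption.

```python
OCR_THRESHOLD_CHARS = 200  # threshold stated in this guide


def needs_ocr(extracted_text: str, threshold: int = OCR_THRESHOLD_CHARS) -> bool:
    """Return True when native extraction produced too little text."""
    # Assumption: surrounding whitespace does not count toward the total.
    return len(extracted_text.strip()) < threshold
```

In the sample output above, the scanned PDF yielded 45 characters, so it would take the OCR path, while the 24,679-character document would not.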
4. Reload the Document Index

After adding documents, reload the SIAA index:
curl http://localhost:5000/siaa/recargar
Response:
{
  "status": "ok",
  "total_docs": 15,
  "colecciones": ["general", "normativa"]
}
Verify documents are loaded:
curl http://localhost:5000/siaa/status

Using the Web Interface

1. Open the Web Interface

Open your browser and navigate to:
http://localhost:5000
Or if accessing from another machine:
http://YOUR_SERVER_IP:5000
The web interface is located at Web/index.html and is served automatically by the Flask proxy.
2. Make Your First Query

Try asking a question about your documents. Example queries:
  • “¿Qué es el SIERJU?”
  • “¿Cuándo debo reportar la información?”
  • “¿Quién es responsable de cargar los datos?”
  • “¿Qué sanciones hay por incumplimiento?”
The system will:
  1. Classify your question (conversational or document-based)
  2. Find relevant documents using the TF-IDF index
  3. Extract relevant chunks with overlap
  4. Generate a response using the Qwen2.5:3b model
  5. Provide source citations
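To make the retrieval step concrete, here is a minimal TF-IDF ranking sketch in pure Python. It is an illustration of the general technique, not SIAA's actual index: the tokenizer, the smoothed IDF formula, and the names `tokenize` and `rank_documents` are all assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def rank_documents(query: str, docs: dict[str, str]) -> list[tuple[str, float]]:
    """Rank documents by the summed TF-IDF weight of the query terms."""
    term_counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    n_docs = len(docs)
    query_terms = set(tokenize(query))
    scores = {}
    for name, counts in term_counts.items():
        total = sum(counts.values()) or 1
        score = 0.0
        for term in query_terms:
            df = sum(1 for c in term_counts.values() if term in c)
            if df == 0:
                continue  # term appears in no document
            idf = math.log((1 + n_docs) / (1 + df)) + 1  # smoothed IDF
            score += (counts[term] / total) * idf
        scores[name] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Documents that share no term with the question score zero, so they fall to the bottom of the ranking and can be skipped when extracting chunks.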

Making API Queries

You can also query SIAA via the REST API:
curl -X POST http://localhost:5000/siaa/consulta \
  -H "Content-Type: application/json" \
  -d '{"pregunta": "¿Qué es el SIERJU?"}'
The API streams its response as Server-Sent Events (SSE), so output arrives in real time. Each chunk is prefixed with "data: ".
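A client has to strip that prefix from each line it receives. The sketch below handles only the `data:` framing and yields the raw payload strings; the payload format (plain text or JSON) is not specified in this guide, so no further parsing is attempted. `iter_sse_data` is a hypothetical helper name.

```python
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the payload of each `data:` line in an SSE stream."""
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.startswith("data:"):
            continue  # blank separators, comments, and other fields
        payload = line[len("data:"):]
        # In the SSE format, one space after the colon is optional and dropped.
        yield payload[1:] if payload.startswith(" ") else payload
```

Any iterable of decoded lines works as input, for example the lines of a streaming HTTP response body. The sketch treats every `data:` line as its own chunk and does not join multi-line events.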

Monitoring System Health

Check System Status

curl http://localhost:5000/siaa/status
Key metrics:
  • documentos_cargados: Number of indexed documents
  • cache.hit_rate: Percentage of queries served from cache
  • ollama_disponible: Ollama service availability

View Quality Logs

SIAA logs all queries to /opt/siaa/logs/calidad.jsonl:
# View last 10 queries
tail -n 10 /opt/siaa/logs/calidad.jsonl | jq .

# Get cache hit queries
grep 'CACHE_HIT' /opt/siaa/logs/calidad.jsonl

# Find slow queries (>30 seconds)
jq 'select(.tiempo_s > 30)' /opt/siaa/logs/calidad.jsonl
The log file is in JSONL format (one JSON object per line) for easy analysis with jq, grep, or Python.
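For analysis in Python, the file can be read line by line. This sketch mirrors the slow-query filter above and assumes `tiempo_s` is numeric; `slow_queries` is a hypothetical helper, and any fields other than `tiempo_s` are not relied on.

```python
import json
from typing import Iterable


def slow_queries(lines: Iterable[str], threshold_s: float = 30.0) -> list[dict]:
    """Return log entries whose `tiempo_s` exceeds the threshold."""
    slow = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partially written or corrupt lines
        if isinstance(entry, dict) and entry.get("tiempo_s", 0) > threshold_s:
            slow.append(entry)
    return slow
```

Pass it an open file handle, for example `slow_queries(open("/opt/siaa/logs/calidad.jsonl", encoding="utf-8"))`.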

Access Logs via API

# Last 50 entries (default)
curl http://localhost:5000/siaa/log

# Last 100 entries
curl "http://localhost:5000/siaa/log?n=100"

# Only errors
curl "http://localhost:5000/siaa/log?tipo=ERROR"
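When calling the log endpoint from a script, building the query string with `urllib.parse` avoids quoting mistakes. The sketch uses only the `n` and `tipo` parameters shown above; `log_url` is a hypothetical helper, and combining both parameters in one request is an assumption this guide does not confirm.

```python
from typing import Optional
from urllib.parse import urlencode


def log_url(
    base: str = "http://localhost:5000",
    n: Optional[int] = None,
    tipo: Optional[str] = None,
) -> str:
    """Build a /siaa/log URL with optional `n` and `tipo` parameters."""
    params = {k: v for k, v in (("n", n), ("tipo", tipo)) if v is not None}
    query = f"?{urlencode(params)}" if params else ""
    return f"{base}/siaa/log{query}"
```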

Performance Tips

Cache Efficiency

The LRU cache stores up to 200 queries for 1 hour. Repeated queries return in ~5 ms instead of 44 s.
Monitor the cache hit rate:
curl http://localhost:5000/siaa/status | jq .cache.hit_rate
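The caching behavior described here (least-recently-used eviction plus a time-to-live) can be sketched with an `OrderedDict`. This is an illustrative model, not the code in siaa_proxy.py; the class name `TTLLRUCache` and the injectable `clock` parameter are assumptions made to keep the example testable.

```python
import time
from collections import OrderedDict


class TTLLRUCache:
    """LRU cache whose entries also expire after a fixed time-to-live."""

    def __init__(self, max_entries=200, ttl_seconds=3600, clock=time.monotonic):
        self.max_entries = max_entries
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._data = OrderedDict()  # key -> (stored_at, value)
        self.hits = 0
        self.misses = 0

    def get(self, key):
        item = self._data.get(key)
        if item is None or self.clock() - item[0] > self.ttl_seconds:
            self._data.pop(key, None)  # drop the expired entry, if any
            self.misses += 1
            return None
        self._data.move_to_end(key)  # mark as most recently used
        self.hits += 1
        return item[1]

    def put(self, key, value):
        self._data[key] = (self.clock(), value)
        self._data.move_to_end(key)
        while len(self._data) > self.max_entries:
            self._data.popitem(last=False)  # evict the least recently used

    def hit_rate(self) -> str:
        total = self.hits + self.misses
        return f"{100 * self.hits / total:.1f}%" if total else "0.0%"
```

The `hit_rate` string follows the format of the `hit_rate` field in the status response, so an empty cache reports "0.0%".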

Document Chunking

Documents are split into 800-character chunks with a 300-character overlap, which preserves context across chunk boundaries.
Configured in siaa_proxy.py:294-296.
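As an illustration of overlapping chunking with these sizes, the sketch below slides an 800-character window forward by 500 characters at a time. `chunk_text` is a hypothetical function; the real splitter in siaa_proxy.py may also respect sentence or paragraph boundaries, which this version does not.

```python
def chunk_text(text: str, size: int = 800, overlap: int = 300) -> list[str]:
    """Split text into fixed-size chunks that share `overlap` characters."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap  # distance between consecutive chunk starts
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks
```

With the default sizes, the last 300 characters of each chunk are repeated at the start of the next one, so a sentence cut at a boundary still appears whole in at least one chunk when it is shorter than the overlap.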

Concurrent Queries

The system supports up to 2 concurrent Ollama queries with 16 server threads.
Configured in siaa_proxy.py:277-278.

Model Warm-up

The Qwen2.5:3b model is preloaded into RAM on startup so that the first query is faster.
See the warm-up logic in siaa_proxy.py:509-529.

Advanced Configuration

Convert Specific Folders Only

python3 convertidor.py --only-folder "Juzgado Civil Municipal"

Force OCR for All PDFs

python3 convertidor_pdf.py --forzar-ocr

Reconvert Only Empty/Failed Documents

python3 convertidor_pdf.py --reconvertir

Custom Paths

python3 convertidor.py \
  --origen /custom/input \
  --dest-md /custom/output \
  --db /custom/database.db \
  --log /custom/logs/errors.log

Troubleshooting

Check if documents were converted correctly:
ls -la /opt/siaa/fuentes/
Reload the index:
curl http://localhost:5000/siaa/recargar
Check Ollama status:
curl http://localhost:11434/api/tags
Verify the model is installed:
ollama list | grep qwen2.5:3b
Monitor system resources:
htop
Qwen2.5:3b requires approximately 2-4 GB of RAM.
Check conversion logs:
cat /opt/siaa/logs/conversion_errores.log
Common issues:
  • LibreOffice not installed: which libreoffice
  • Permission errors: sudo chown -R "$USER" /opt/siaa
  • Corrupted documents: Try manual conversion
Verify cache configuration in siaa_proxy.py:61-63:
CACHE_MAX_ENTRADAS = 200
CACHE_TTL_SEGUNDOS = 3600
CACHE_SOLO_DOC     = True
Check cache stats:
curl http://localhost:5000/siaa/status | jq .cache

Next Steps

API Reference

Explore all available API endpoints

Document Management

Learn about document organization and indexing

System Architecture

Understand how SIAA works internally

Administration

Configure and monitor your SIAA instance
