Quickstart

Overview

This guide walks you through starting SIAA, converting your first documents, and running your first query against the system.

Make sure you’ve completed the installation guide before proceeding.

Starting SIAA

Start Ollama Service

Start the Ollama service that powers the AI model:

ollama serve

This command runs in the foreground. Open a new terminal for the next steps, or run it in the background:

nohup ollama serve > /tmp/ollama.log 2>&1 &

Verify the model is available locally:

curl http://localhost:11434/api/tags

You should see qwen2.5:3b in the response.
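If you prefer to script this check, the following is a minimal sketch in Python. It assumes the /api/tags response is a JSON object with a models list whose entries carry a name field, which is how Ollama documents it; the helper names model_available and fetch_tags are illustrative and are not part of SIAA.

```python
import json
from urllib.request import urlopen


def model_available(tags_payload: dict, model: str) -> bool:
    """Return True if `model` is listed in an Ollama /api/tags payload."""
    names = [entry.get("name", "") for entry in tags_payload.get("models", [])]
    return model in names


def fetch_tags(base_url: str = "http://localhost:11434") -> dict:
    """Fetch the list of locally available models (needs a running Ollama)."""
    with urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return json.load(resp)


# Offline example with a hand-written payload of the assumed shape.
sample = {"models": [{"name": "qwen2.5:3b"}, {"name": "llama3:8b"}]}
print(model_available(sample, "qwen2.5:3b"))
```

With Ollama running, model_available(fetch_tags(), "qwen2.5:3b") performs the same check against the live service.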

Start the Flask Proxy Server

Navigate to your SIAA directory and start the proxy server:

cd ~/siaa
python3 siaa_proxy.py

The server will start on port 5000 and you’ll see output like:

================================================================
  SIAA v2.1.25 — Sistema Inteligente de Apoyo Administrativo
  Seccional Bucaramanga — Rama Judicial
================================================================
  Ollama URL    : http://localhost:11434
  Modelo        : qwen2.5:3b
  Puerto        : 5000
  Hilos         : 16
================================================================

The proxy server must remain running. Consider using systemd, screen, or tmux for production deployments.

Verify System Status

Check the system status:

curl http://localhost:5000/siaa/status

Expected response:

{
  "status": "ok",
  "version": "2.1.25",
  "model": "qwen2.5:3b",
  "ollama_disponible": true,
  "documentos_cargados": 0,
  "cache": {
    "entradas": 0,
    "max": 200,
    "hits": 0,
    "misses": 0,
    "hit_rate": "0.0%",
    "ttl_seg": 3600
  }
}

Initially, documentos_cargados will be 0. You’ll add documents in the next step.
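A status payload like the one above can also be checked programmatically. The sketch below only reads the three fields shown in the expected response (status, ollama_disponible, documentos_cargados); the function name readiness_problems is hypothetical and the rules it applies are an assumption about what "ready" means for a quickstart.

```python
from typing import List


def readiness_problems(status: dict) -> List[str]:
    """Return human-readable problems found in a /siaa/status payload."""
    problems = []
    if status.get("status") != "ok":
        problems.append(f"status is {status.get('status')!r}, expected 'ok'")
    if not status.get("ollama_disponible", False):
        problems.append("Ollama is not reachable")
    if status.get("documentos_cargados", 0) == 0:
        problems.append("no documents indexed yet")
    return problems


# The fresh-install payload from this guide: healthy, but nothing indexed.
fresh_install = {"status": "ok", "ollama_disponible": True, "documentos_cargados": 0}
print(readiness_problems(fresh_install))
```

An empty list means the proxy, Ollama, and the document index all look usable.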

Converting Documents

Prepare Your Documents

Place your documents in the appropriate directories.

For institutional documents (Word/Excel pairs):

mkdir -p /opt/siaa/instructivos/"Juzgado Civil Municipal"
# Copy .docx and .xlsx files to this folder

For PDF documents:

cp your-document.pdf /opt/siaa/pdfs_origen/

Each subfolder in /opt/siaa/instructivos should contain one Word document (.doc or .docx) and one Excel file (.xls or .xlsx).
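Before running the converter, it can help to confirm that every subfolder follows the one-Word-plus-one-Excel rule. The following is a rough pre-flight sketch, not part of SIAA; check_folder and check_tree are hypothetical helpers, and the converter itself may apply different or looser rules.

```python
from pathlib import Path
from typing import Dict, List

WORD_EXTS = {".doc", ".docx"}
EXCEL_EXTS = {".xls", ".xlsx"}


def check_folder(folder: Path) -> List[str]:
    """List deviations from the one-Word, one-Excel layout for one folder."""
    files = [p for p in folder.iterdir() if p.is_file()]
    words = [p for p in files if p.suffix.lower() in WORD_EXTS]
    excels = [p for p in files if p.suffix.lower() in EXCEL_EXTS]
    problems = []
    if len(words) != 1:
        problems.append(f"expected 1 Word file, found {len(words)}")
    if len(excels) != 1:
        problems.append(f"expected 1 Excel file, found {len(excels)}")
    return problems


def check_tree(root) -> Dict[str, List[str]]:
    """Map each subfolder name under `root` to its list of problems."""
    return {d.name: check_folder(d)
            for d in sorted(Path(root).iterdir()) if d.is_dir()}
```

Calling check_tree("/opt/siaa/instructivos") returns a dictionary in which a folder with an empty list is ready for conversion.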

Convert Institutional Documents

Run the main converter for Word/Excel documents:

cd ~/siaa
python3 convertidor.py

You’ll see output showing the conversion progress:

==========================================================
  SIAA — Convertidor Completo
  Origen : /opt/siaa/instructivos
  MD     : /opt/siaa/fuentes
  SQLite : /opt/siaa/institucional.db
==========================================================

  Procesando 1 carpeta(s):

  → Juzgado Civil Municipal (.docx) ... ✅ (1.2s)

=== Markdown generados ===
  juzgado_civil_municipal.md

=== Tablas SQLite ===
  juzgado_civil_municipal: 45 filas

=== Resumen ===
  Carpetas procesadas : 1
  Word/PDF → Markdown : 1 ✅  0 ❌
  Excel   → SQLite    : 1 ✅  0 ❌
  Carpetas con errores: 0

The converter automatically reloads the SIAA document index after successful conversion.

Convert PDF Documents

For PDF documents, use the specialized PDF converter:

python3 convertidor_pdf.py

The converter automatically detects if a PDF has native text or requires OCR:

=======================================================
  SIAA Convertidor PDF v2.0
  PDFs: 3 | Modo: Auto
  Salida: /opt/siaa/fuentes/normativa
=======================================================

  📂 acuerdo_psaa16-10476.pdf
    ✅ acuerdo_psaa16-10476.md → 24,679 chars [pymupdf4llm]

  📂 resolucion_escaneada.pdf
    ⚠ pymupdf extrajo 45 chars → OCR...
    📷 Convirtiendo a imágenes (DPI=300)...
    📄 5 página(s)
    🔍 OCR página 5/5...
    ✅ resolucion_escaneada.md → 8,234 chars [OCR Tesseract]

If pymupdf extracts fewer than 200 characters, the system automatically falls back to OCR using Tesseract.
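The fallback rule can be pictured as a simple length check on the extracted text. This is a sketch of the decision only, under the assumption that whitespace is ignored when counting; the name needs_ocr and the force flag (mirroring --forzar-ocr) are illustrative and do not come from convertidor_pdf.py.

```python
OCR_THRESHOLD_CHARS = 200  # threshold stated in this guide


def needs_ocr(extracted_text: str, force: bool = False,
              threshold: int = OCR_THRESHOLD_CHARS) -> bool:
    """Decide whether a PDF should go through OCR.

    `force` models a "force OCR" switch; otherwise OCR is chosen only when
    native extraction produced fewer than `threshold` characters.
    """
    if force:
        return True
    return len(extracted_text.strip()) < threshold


# Mirrors the two PDFs in the sample output above.
print(needs_ocr("x" * 24679))  # native text PDF
print(needs_ocr("x" * 45))     # scanned PDF with almost no text layer
```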

Reload the Document Index

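After adding documents, reload the SIAA index. convertidor.py triggers this automatically after a successful conversion; use the command below to trigger it manually: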

curl http://localhost:5000/siaa/recargar

Response:

{
  "status": "ok",
  "total_docs": 15,
  "colecciones": ["general", "normativa"]
}

Verify documents are loaded:

curl http://localhost:5000/siaa/status

Using the Web Interface

Open the Web Interface

Open your browser and navigate to:

http://localhost:5000

Or if accessing from another machine:

http://YOUR_SERVER_IP:5000

The web interface is located at Web/index.html and is served automatically by the Flask proxy.

Make Your First Query

Try asking a question about your documents. Example queries:

“¿Qué es el SIERJU?”
“¿Cuándo debo reportar la información?”
“¿Quién es responsable de cargar los datos?”
“¿Qué sanciones hay por incumplimiento?”

The system will:

Classify your question (conversational or document-based)
Find relevant documents using the TF-IDF index
Extract relevant chunks with overlap
Generate a response using the Qwen2.5:3b model
Provide source citations
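To make the retrieval step more concrete, here is a toy TF-IDF ranking in plain Python. It is a sketch of the general technique only: the tokenizer, the smoothed IDF formula, and the names tokenize and rank are assumptions, and SIAA's actual index may weight and normalize terms differently.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase and split on non-word characters (keeps accented letters)."""
    return re.findall(r"\w+", text.lower())


def rank(query: str, docs: Dict[str, str]) -> List[Tuple[str, float]]:
    """Score each document against the query with TF-IDF, best first."""
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    doc_freq = Counter()
    for term_counts in counts.values():
        doc_freq.update(term_counts.keys())
    n_docs = len(docs)
    query_terms = set(tokenize(query))
    scores = {}
    for name, term_counts in counts.items():
        total = sum(term_counts.values()) or 1
        score = 0.0
        for term in query_terms:
            if term in term_counts:
                tf = term_counts[term] / total
                idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1
                score += tf * idf
        scores[name] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


docs = {
    "sierju": "el sierju es el sistema de reporte estadístico",
    "permisos": "los permisos se solicitan al juez",
}
print(rank("¿Qué es el SIERJU?", docs)[0][0])
```

The top-ranked documents would then be split into chunks and passed to the model as context.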

Making API Queries

You can also query SIAA via the REST API:

curl -X POST http://localhost:5000/siaa/consulta \
  -H "Content-Type: application/json" \
  -d '{"pregunta": "¿Qué es el SIERJU?"}'

The API streams responses using Server-Sent Events (SSE) for real-time output. Each chunk arrives on its own line, prefixed with data: followed by a space.
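A client has to strip that prefix from each line to recover the text. The sketch below shows one way to do it for an iterable of already-decoded lines; iter_sse_data is a hypothetical helper, and it makes no assumption about what each payload contains (plain text or JSON), since this guide does not specify it.

```python
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the payload of every `data:` line in an SSE stream.

    Blank lines (event separators) and comment lines starting with ':' are
    skipped. Per the SSE format, one optional space after the colon is removed.
    """
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):]
        if payload.startswith(" "):
            payload = payload[1:]
        yield payload


# Offline example with a hand-written stream.
stream = ["data: El SIERJU\n", "\n", ": keep-alive\n", "data: es un sistema\n"]
print(list(iter_sse_data(stream)))
```

Against a live server, the lines could come from iterating over the HTTP response body and decoding each line as UTF-8. Note that this sketch yields each data line separately rather than joining multi-line events.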

Monitoring System Health

Check System Status

curl http://localhost:5000/siaa/status

Key metrics:

documentos_cargados: Number of indexed documents
cache.hit_rate: Percentage of queries served from cache
ollama_disponible: Ollama service availability

View Quality Logs

SIAA logs all queries to /opt/siaa/logs/calidad.jsonl:

# View last 10 queries
tail -n 10 /opt/siaa/logs/calidad.jsonl | jq .

# Get cache hit queries
grep 'CACHE_HIT' /opt/siaa/logs/calidad.jsonl

# Find slow queries (>30 seconds)
jq 'select(.tiempo_s > 30)' /opt/siaa/logs/calidad.jsonl

The log file is in JSONL format (one JSON object per line) for easy analysis with jq, grep, or Python.
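For analysis in Python, each line can be parsed independently with the standard json module. The sketch below reproduces the slow-query filter from the jq example; it assumes only the tiempo_s field used there, and slow_queries is an illustrative name.

```python
import json
from typing import Iterable, Iterator


def slow_queries(lines: Iterable[str], threshold_s: float = 30.0) -> Iterator[dict]:
    """Yield log entries whose `tiempo_s` exceeds `threshold_s`.

    Blank or malformed lines are skipped so one bad record does not stop
    the scan.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(entry, dict):
            continue
        elapsed = entry.get("tiempo_s")
        if isinstance(elapsed, (int, float)) and elapsed > threshold_s:
            yield entry


# Offline example; with a real log, pass an open file object instead.
sample_lines = ['{"tiempo_s": 44.0}', '{"tiempo_s": 0.005}', 'not json']
print(list(slow_queries(sample_lines)))
```

Because a file object iterates line by line, slow_queries(open("/opt/siaa/logs/calidad.jsonl", encoding="utf-8")) scans the log without loading it into memory.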

Access Logs via API

# Last 50 entries (default)
curl "http://localhost:5000/siaa/log"

# Last 100 entries (quote the URL so the shell does not interpret the ?)
curl "http://localhost:5000/siaa/log?n=100"

# Only errors
curl "http://localhost:5000/siaa/log?tipo=ERROR"
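When calling the log endpoint from a script, building the query string with urllib.parse avoids quoting mistakes. This is a small sketch using the two parameters shown above (n and tipo); log_url is a hypothetical helper, and whether the endpoint accepts both parameters in the same request is an assumption this guide does not confirm.

```python
from typing import Optional
from urllib.parse import urlencode


def log_url(base: str = "http://localhost:5000",
            n: Optional[int] = None,
            tipo: Optional[str] = None) -> str:
    """Build a /siaa/log URL with optional `n` and `tipo` parameters."""
    params = {}
    if n is not None:
        params["n"] = int(n)
    if tipo:
        params["tipo"] = tipo
    query = urlencode(params)
    return f"{base}/siaa/log" + (f"?{query}" if query else "")


print(log_url(n=100))
print(log_url(tipo="ERROR"))
```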

Performance Tips

Cache Efficiency

The LRU cache stores up to 200 queries for 1 hour. A repeated query returns from the cache in about 5 ms, compared with about 44 s for an uncached one.

Monitor the cache hit rate:

curl http://localhost:5000/siaa/status | jq .cache.hit_rate
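The combination of a size limit and a time limit can be sketched with an OrderedDict. This is an illustration of the LRU-plus-TTL idea using the limits quoted above (200 entries, 3600 seconds); the class name TTLLRUCache and its methods are hypothetical and are not taken from siaa_proxy.py.

```python
import time
from collections import OrderedDict


class TTLLRUCache:
    """Least-recently-used cache whose entries also expire after a TTL."""

    def __init__(self, max_entries=200, ttl_seconds=3600, clock=time.monotonic):
        self.max_entries = max_entries
        self.ttl_seconds = ttl_seconds
        self.clock = clock          # injectable so expiry can be tested
        self._data = OrderedDict()  # key -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def get(self, key):
        item = self._data.get(key)
        if item is None or item[0] <= self.clock():
            self._data.pop(key, None)  # drop expired entry, if any
            self.misses += 1
            return None
        self._data.move_to_end(key)    # mark as most recently used
        self.hits += 1
        return item[1]

    def put(self, key, value):
        self._data[key] = (self.clock() + self.ttl_seconds, value)
        self._data.move_to_end(key)
        while len(self._data) > self.max_entries:
            self._data.popitem(last=False)  # evict least recently used

    def hit_rate(self) -> str:
        total = self.hits + self.misses
        rate = 100.0 * self.hits / total if total else 0.0
        return f"{rate:.1f}%"
```

A typical call pattern is to try get(question) first and, on a miss, run the full pipeline and store the answer with put(question, answer).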

Document Chunking

Documents are split into 800-character chunks with a 300-character overlap, so context that spans a chunk boundary is preserved. Configured in siaa_proxy.py:294-296.
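With these numbers, each chunk starts 500 characters after the previous one (800 minus 300). The sketch below shows fixed-size character chunking with overlap; chunk_text is an illustrative name, and the real splitter may also respect paragraph or heading boundaries, which this version does not.

```python
from typing import List


def chunk_text(text: str, size: int = 800, overlap: int = 300) -> List[str]:
    """Split `text` into `size`-character chunks that overlap by `overlap`."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


document = "".join(chr(97 + i % 26) for i in range(1800))
parts = chunk_text(document)
print(len(parts), [len(p) for p in parts])
```

The last 300 characters of one chunk are the first 300 of the next, so a sentence that straddles a boundary appears whole in at least one chunk, provided it is shorter than the overlap.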

Concurrent Queries

The system supports up to 2 concurrent Ollama queries with 16 server threads. Configured in siaa_proxy.py:277-278.
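One common way to let many server threads share a small number of model slots is a semaphore. The sketch below illustrates that pattern with the figures above (16 threads, 2 slots); ModelGate and serve_many are hypothetical names, and this guide does not state which mechanism siaa_proxy.py actually uses.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ModelGate:
    """Allow at most `max_concurrent` callers into the model at once."""

    def __init__(self, max_concurrent: int = 2):
        self._slots = threading.BoundedSemaphore(max_concurrent)
        self._lock = threading.Lock()
        self._active = 0
        self.peak = 0  # highest concurrency observed, for inspection

    def run(self, fn, *args):
        with self._slots:  # blocks while all slots are taken
            with self._lock:
                self._active += 1
                self.peak = max(self.peak, self._active)
            try:
                return fn(*args)
            finally:
                with self._lock:
                    self._active -= 1


def serve_many(gate: ModelGate, fn, prompts, threads: int = 16):
    """Handle prompts on `threads` workers, funnelling model calls via `gate`."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(lambda p: gate.run(fn, p), prompts))
```

Requests beyond the second wait for a free slot instead of overloading the model, while the remaining threads stay available for cached answers and status checks.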

Model Warm-up

The Qwen2.5:3b model is preloaded into RAM on startup so the first query is faster. See the warm-up logic in siaa_proxy.py:509-529.
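For reference, Ollama loads a model into memory when /api/generate receives a request that names the model but carries no prompt. The sketch below only builds such a request; build_warmup_request is a hypothetical helper, and SIAA's own warm-up code may send a different payload.

```python
import json
from urllib.request import Request


def build_warmup_request(model: str = "qwen2.5:3b",
                         base_url: str = "http://localhost:11434") -> Request:
    """Build a POST to /api/generate with no prompt, which preloads `model`."""
    body = json.dumps({"model": model}).encode("utf-8")
    return Request(
        f"{base_url}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_warmup_request()
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen sends it; that step needs a running Ollama service.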

Advanced Configuration

Convert Specific Folders Only

python3 convertidor.py --only-folder "Juzgado Civil Municipal"

Force OCR for All PDFs

python3 convertidor_pdf.py --forzar-ocr

Reconvert Only Empty/Failed Documents

python3 convertidor_pdf.py --reconvertir

Custom Paths

python3 convertidor.py \
  --origen /custom/input \
  --dest-md /custom/output \
  --db /custom/database.db \
  --log /custom/logs/errors.log

Troubleshooting

No documents loaded

Check if documents were converted correctly:

ls -la /opt/siaa/fuentes/

Reload the index:

curl http://localhost:5000/siaa/recargar

Slow query responses

Check Ollama status:

curl http://localhost:11434/api/tags

Verify the model is installed (ollama ps shows whether it is currently loaded in memory):

ollama list | grep qwen2.5:3b

Monitor system resources:

htop

Qwen2.5:3b requires approximately 2-4 GB of RAM.

Document conversion fails

Check conversion logs:

cat /opt/siaa/logs/conversion_errores.log

Common issues:

LibreOffice not installed: which libreoffice
Permission errors: sudo chown -R $USER /opt/siaa
Corrupted documents: Try manual conversion

Cache not working

Verify cache configuration in siaa_proxy.py:61-63:

CACHE_MAX_ENTRADAS = 200
CACHE_TTL_SEGUNDOS = 3600
CACHE_SOLO_DOC     = True

Check cache stats:

curl http://localhost:5000/siaa/status | jq .cache

Next Steps

API Reference

Explore all available API endpoints

Document Management

Learn about document organization and indexing

System Architecture

Understand how SIAA works internally

Administration

Configure and monitor your SIAA instance
