Módulo Horario exposes health check endpoints on every service and emits structured JSON logs suitable for ingestion by ELK Stack or CloudWatch. This page covers how to verify service health, configure centralized logging, add Prometheus monitoring, and troubleshoot the most common operational issues.
Health check endpoints
Every service exposes `GET /health` with no authentication required. Use these endpoints for load balancer health checks, uptime monitors, and post-deployment verification.
A healthy service responds with `200` and a JSON body:
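The exact payload varies by service; a typical body (field names illustrative) looks like:

```json
{
  "status": "ok",
  "service": "auth-service",
  "version": "1.0.0"
}
```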
To have Docker itself track container health, add a `HEALTHCHECK` instruction to each service's Dockerfile:
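A sketch of the instruction, assuming the service listens on port 8000 and `curl` is available in the image:

```dockerfile
# Probe /health every 30s; mark the container unhealthy after 3 consecutive failures
HEALTHCHECK --interval=30s --timeout=5s --retries=3 \
  CMD curl -fsS http://localhost:8000/health || exit 1
```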
Structured JSON logging
Configure each microservice to emit logs as JSON objects. This format is required for ELK Stack ingestion and makes log queries in Kibana or CloudWatch Logs Insights straightforward. Add the following setup to each service’s application entry point:

Log rotation
For file-based logging (useful when not shipping logs to a collector), use `RotatingFileHandler` to cap disk usage:
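A minimal sketch combining JSON formatting with rotation; the formatter fields, file name, and size limits below are illustrative, not the project's actual configuration:

```python
import json
import logging
from logging.handlers import RotatingFileHandler


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    def format(self, record):
        payload = {
            "timestamp": self.formatTime(record, "%Y-%m-%dT%H:%M:%S%z"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def configure_logging(service_name, log_file="service.log"):
    """Attach a rotating JSON file handler: 10 MB per file, 5 backups kept."""
    handler = RotatingFileHandler(
        log_file, maxBytes=10 * 1024 * 1024, backupCount=5
    )
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(service_name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger
```

Call `configure_logging("<service-name>")` once at the entry point; every record then lands in the log file as one JSON object per line, capped at 10 MB per file with five backups.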
Docker can also cap container log growth at the daemon level via the `json-file` driver. Add the following to any service in `docker-compose.yml`:
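A sketch of the Compose block (the service name is a placeholder; apply the same block to each service):

```yaml
services:
  auth-service:
    logging:
      driver: json-file
      options:
        max-size: "10m"   # rotate after 10 MB
        max-file: "5"     # keep at most 5 files
```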
ELK Stack integration
To centralize logs from all four services, add Elasticsearch and Kibana to your Docker Compose setup. Once both containers are up, open Kibana at http://localhost:5601. Create an index pattern matching `logstash-*` or configure Filebeat to ship the Docker JSON logs to Elasticsearch.
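A sketch of the additional Compose services (image versions are illustrative; pin them to your Elastic release, and note that disabling security is for local development only):

```yaml
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.13.4
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=false   # dev-only: no authentication
    ports:
      - "9200:9200"

  kibana:
    image: docker.elastic.co/kibana/kibana:8.13.4
    environment:
      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
    ports:
      - "5601:5601"
```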
Prometheus monitoring
Add a Prometheus container to scrape metrics from each service. Create `prometheus.yml` alongside `docker-compose.yml`:
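A sketch of the scrape configuration; the service hostnames and ports below are placeholders for the four Compose services, not the project's actual names:

```yaml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: modulo-horario
    metrics_path: /metrics
    static_configs:
      - targets:
          # Placeholder hostnames: replace with the real Compose service names
          - auth-service:8000
          - horarios-service:8000
          - profesores-service:8000
          - aulas-service:8000
```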
Then add a Prometheus service to `docker-compose.yml`:
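A minimal sketch, mounting the configuration file created above:

```yaml
services:
  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
    ports:
      - "9090:9090"
```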
Prometheus metrics scraping requires instrumenting each service with the `prometheus-fastapi-instrumentator` or similar library. The endpoints are not exposed by default. This is a roadmap item for Phase 3.

What to monitor
System metrics
CPU usage, memory consumption, and disk I/O per container. Alert when CPU exceeds 80% or memory exceeds 90% of the container limit.
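Assuming cAdvisor (or a similar exporter) supplies per-container metrics, the two thresholds above could be expressed as Prometheus alerting rules along these lines:

```yaml
groups:
  - name: container-resources
    rules:
      - alert: ContainerHighCpu
        # Fraction of one CPU core used over the last 5 minutes
        expr: rate(container_cpu_usage_seconds_total{name=~".+"}[5m]) > 0.8
        for: 5m
      - alert: ContainerHighMemory
        # Memory used relative to the container's limit
        expr: container_memory_usage_bytes / container_spec_memory_limit_bytes > 0.9
        for: 5m
```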
Request latency
P50, P95, and P99 response times for each service. The
`/horarios` endpoint performs upstream calls to three services; latency spikes there often indicate a dependency issue.

Failed login attempts
Count and rate of
`401` responses from `POST /auth/login`. A spike indicates a credential-stuffing or brute-force attempt. Rate limiting should engage at 5 req/min per IP.

Rate limit hits
Count of
`429` responses from Nginx. A sustained high rate indicates traffic above expected levels and may require scaling or IP-level blocking.

Troubleshooting
Service not responding / connection refused
1. Check whether all containers are running and healthy.
2. If a service shows as unhealthy or exited, inspect its logs.
3. Restart the affected service without restarting others.
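The three steps might look like this with Compose v2 (use `docker-compose` for v1; the service name is a placeholder):

```shell
# 1. List containers with their state and health
docker compose ps

# 2. Inspect the logs of the failing service
docker compose logs --tail=100 auth-service

# 3. Restart only that service
docker compose restart auth-service
```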
Database connection error
1. Verify the database container is healthy and accepting connections.
2. Check that migrations are current inside the service container.
3. Confirm `DATABASE_URL` points to the container hostname (e.g., `postgres` or `mysql`) rather than `localhost` when running inside Docker.
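As a sketch, assuming a Postgres container named `postgres`, an app service named `auth-service`, and Alembic migrations (swap in your actual service and migration tool):

```shell
# 1. Is the database container healthy and accepting connections?
docker compose ps postgres
docker compose exec postgres pg_isready

# 2. Are migrations current inside the service container?
docker compose exec auth-service alembic current

# 3. Does DATABASE_URL use the container hostname, not localhost?
docker compose exec auth-service printenv DATABASE_URL
```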
JWT invalid or 401 on all requests
This usually means `JWT_SECRET` is inconsistent between services or has been rotated without restarting all services. Regenerate a new secret and update it in your secrets store. Then restart all services to pick up the new value, and confirm the new secret is injected.
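A sketch of the rotation sequence (the secrets store and service names are deployment-specific):

```shell
# Generate a new 256-bit secret
openssl rand -hex 32

# Update JWT_SECRET in your secrets store / .env, then recreate all services
docker compose up -d --force-recreate

# Confirm the new value is injected into a running container
docker compose exec auth-service printenv JWT_SECRET
```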
Rate limiting blocking legitimate traffic
If `429` responses appear for non-abusive traffic, check the Nginx rate limit configuration. The login zone is set to 5 req/min and the API zone to 100 req/min. If your use case requires higher limits, increase the `rate=` value in `nginx.conf` and reload:
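After editing the `rate=` value (the zone directives look like `limit_req_zone ... zone=api:10m rate=100r/m;`; zone names here are illustrative), validate the configuration and reload without dropping connections, assuming Nginx runs as a Compose service named `nginx`:

```shell
# Validate the edited configuration, then reload the running Nginx
docker compose exec nginx nginx -t
docker compose exec nginx nginx -s reload
```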