CloudWatch Logs
All application logs are centralized in CloudWatch Logs with structured log groups.
Log Groups
The infrastructure creates two primary log groups. Log group names follow the pattern /ecs/{project_name}-{environment}-{service}, for example /ecs/adma-prod-backend. Replace prod with your environment name.
Log Retention
Logs are retained for a configurable period (default: 30 days) to balance cost and compliance. The setting is defined in infrastructure/terraform/modules/ecs/main.tf.
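CloudWatch Logs accepts only a fixed set of retention values, so a configured period has to land on one of them. The sketch below is a hypothetical Python helper (not part of the Terraform module) that rounds a requested period up to the next accepted value and builds the arguments for boto3's `put_retention_policy`. The function names are assumptions made for illustration.

```python
# Retention periods (in days) that CloudWatch Logs accepts.
ALLOWED_RETENTION_DAYS = (
    1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545, 731,
    1096, 1827, 2192, 2557, 2922, 3288, 3653,
)


def nearest_allowed_retention(requested_days: int) -> int:
    """Return the smallest accepted retention period >= requested_days."""
    if requested_days < 1:
        raise ValueError("retention must be at least 1 day")
    for days in ALLOWED_RETENTION_DAYS:
        if days >= requested_days:
            return days
    raise ValueError(f"{requested_days} days exceeds the maximum retention")


def retention_request(log_group: str, requested_days: int) -> dict:
    """Build keyword arguments for logs.put_retention_policy (boto3)."""
    return {
        "logGroupName": log_group,
        "retentionInDays": nearest_allowed_retention(requested_days),
    }
```

With boto3 installed and credentials configured, the result could be passed as `boto3.client("logs").put_retention_policy(**retention_request("/ecs/adma-prod-backend", 30))`.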
RDS PostgreSQL Logs
Database logs are automatically exported to CloudWatch. The export is configured in infrastructure/terraform/modules/rds/main.tf.
Viewing Logs
Using AWS Console
- Navigate to CloudWatch → Logs → Log groups
- Select the log group (e.g., /ecs/adma-prod-backend)
- Click on a log stream to view entries
- Use Filter events to search for specific patterns
Using AWS CLI
Log Stream Prefix
Each task creates a unique log stream. With the awslogs log driver, stream names take the form {prefix}/{container-name}/{task-id}.
Container Insights
Container Insights provides enhanced metrics for ECS clusters, services, and tasks.
Enable Container Insights
Container Insights is configured at the cluster level in infrastructure/terraform/modules/ecs/main.tf.
Available Metrics
Container Insights automatically collects:
- CPU utilization (cluster, service, task level)
- Memory utilization (cluster, service, task level)
- Network metrics (bytes in/out, packets)
- Task count (running, pending, desired)
- Disk I/O (read/write bytes)
Viewing Container Insights
- Navigate to CloudWatch → Container Insights
- Select your ECS cluster: adma-prod-ecs
- View metrics by:
  - Cluster performance
  - Service performance
  - Task performance
Health Checks
Application Load Balancer Health Checks
The ALB continuously monitors target health. The checks are configured in infrastructure/terraform/modules/ecs/main.tf.
Health Check Parameters
| Parameter | Value | Description |
|---|---|---|
| Path | / (frontend), /actuator/health (backend) | Endpoint to check |
| Interval | 30 seconds | Time between checks |
| Timeout | 6 seconds (frontend), 10 seconds (backend) | Max wait time |
| Healthy threshold | 2 consecutive successes | Mark as healthy |
| Unhealthy threshold | 3 consecutive failures | Mark as unhealthy |
| Matcher | 200-399 | Acceptable status codes |
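To see how the thresholds in the table interact, the following sketch models the consecutive-success and consecutive-failure counters in Python. It is a simplified illustration under the values listed above (2 successes, 3 failures, matcher 200-399), not a description of the ALB's internal implementation. The class name `TargetHealth` is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TargetHealth:
    """Simplified model of a target's health state under ALB-style thresholds."""

    healthy_threshold: int = 2
    unhealthy_threshold: int = 3
    state: str = "initial"
    _successes: int = 0
    _failures: int = 0

    def record(self, status_code: Optional[int]) -> str:
        """Record one check result; None stands for a timed-out check."""
        ok = status_code is not None and 200 <= status_code <= 399
        if ok:
            self._successes += 1
            self._failures = 0
            if self._successes >= self.healthy_threshold:
                self.state = "healthy"
        else:
            self._failures += 1
            self._successes = 0
            if self._failures >= self.unhealthy_threshold:
                self.state = "unhealthy"
        return self.state
```

In this model, a single failed check never changes the state of a healthy target; only three failures in a row do, which is why one slow response does not pull a task out of rotation.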
ECS Container Health Checks
Each task definition includes container-level health checks.
The startPeriod gives the container time to initialize before failed health checks count against it. The backend has a longer start period (60s) to allow for database migrations.
Spring Boot Actuator Endpoint
The backend exposes a health endpoint at /actuator/health.
Response Example
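By default, Spring Boot Actuator answers a health request with a small JSON document such as `{"status": "UP"}`; the exact fields this backend returns depend on its Actuator configuration, which is not shown here. The sketch below is a hypothetical Python probe that interprets such a response. The `base_url` argument and function names are assumptions.

```python
import json
from urllib.request import urlopen


def is_up(body: str) -> bool:
    """Return True if an Actuator-style health payload reports status UP."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    return isinstance(payload, dict) and payload.get("status") == "UP"


def check_backend(base_url: str, timeout: float = 10.0) -> bool:
    """Fetch {base_url}/actuator/health and report whether the service is UP.

    Network errors, timeouts, and non-2xx responses (urllib raises HTTPError,
    an OSError subclass, for those) are all treated as "not healthy".
    """
    try:
        with urlopen(f"{base_url}/actuator/health", timeout=timeout) as resp:
            return is_up(resp.read().decode("utf-8"))
    except OSError:
        return False
```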
Auto Scaling Metrics
ECS services automatically scale based on target tracking policies.
Frontend Scaling
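For intuition about target tracking, the sketch below uses a rough proportional model: scale the task count by the ratio of the observed metric to the target, then clamp to the service's minimum and maximum. This is an approximation for reasoning about capacity, not the algorithm AWS actually runs, and the numbers in the usage note (70% target, 1 to 4 tasks) are hypothetical rather than taken from the project's policies.

```python
import math


def estimate_desired_count(
    current_tasks: int,
    metric_value: float,
    target_value: float,
    min_tasks: int,
    max_tasks: int,
) -> int:
    """Estimate the task count that would bring the metric back to target."""
    if target_value <= 0:
        raise ValueError("target_value must be positive")
    raw = math.ceil(current_tasks * metric_value / target_value)
    # Clamp to the scaling limits configured for the service.
    return max(min_tasks, min(max_tasks, raw))
```

Under these assumptions, two tasks averaging 90% CPU against a 70% target suggest three tasks, while the same two tasks at 35% suggest scaling in to one.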
Backend Scaling
The backend uses the same policy types as the frontend, with its own target values.
Key Metrics to Monitor
Service-Level Metrics
CPU & Memory Utilization
- Namespace: AWS/ECS
- Metrics: CPUUtilization, MemoryUtilization
- Dimensions: ServiceName, ClusterName
- Recommended threshold: < 80% sustained
Query Example
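One way to query these metrics programmatically is boto3's `get_metric_statistics`. The sketch below only builds the request and evaluates the returned datapoints, so it runs without AWS access. The cluster name adma-prod-ecs comes from this guide; the service name used in the usage note is an assumption, as are the function names.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def ecs_utilization_query(
    cluster: str,
    service: str,
    metric: str = "CPUUtilization",
    minutes: int = 60,
    now: Optional[datetime] = None,
) -> dict:
    """Build keyword arguments for cloudwatch.get_metric_statistics (boto3)."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/ECS",
        "MetricName": metric,
        "Dimensions": [
            {"Name": "ClusterName", "Value": cluster},
            {"Name": "ServiceName", "Value": service},
        ],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": 300,  # 5-minute datapoints
        "Statistics": ["Average", "Maximum"],
    }


def sustained_above(datapoints: list, threshold: float = 80.0) -> bool:
    """True if every datapoint's Average exceeds the threshold."""
    return bool(datapoints) and all(dp["Average"] > threshold for dp in datapoints)
```

With credentials configured, `boto3.client("cloudwatch").get_metric_statistics(**ecs_utilization_query("adma-prod-ecs", "adma-prod-backend"))["Datapoints"]` would return the list that `sustained_above` expects.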
Target Health
- Namespace: AWS/ApplicationELB
- Metrics: HealthyHostCount, UnHealthyHostCount
- Dimensions: TargetGroup, LoadBalancer
- Alert if: UnHealthyHostCount > 0
Query Example
ALB Request Metrics
- Namespace: AWS/ApplicationELB
- Metrics: RequestCount, TargetResponseTime, HTTPCode_Target_5XX_Count
- Dimensions: LoadBalancer
- Alert if: 5XX errors > threshold OR response time > 3s
Query Example
RDS Performance
- Namespace: AWS/RDS
- Metrics: CPUUtilization, DatabaseConnections, FreeableMemory, ReadLatency, WriteLatency
- Dimensions: DBInstanceIdentifier
- Alert if: CPU > 80%, connections > 80% of max
Query Example
Creating CloudWatch Alarms
Set up alarms to receive notifications when metrics exceed thresholds.
Example: High Backend CPU
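A high-CPU alarm for the backend could be created with boto3's `put_metric_alarm`. The sketch below builds the request only; the alarm name, evaluation window (two 5-minute periods), service name, and SNS topic ARN are assumptions chosen for illustration, while the 80% threshold follows the recommendation in this guide.

```python
def backend_cpu_alarm(
    cluster: str,
    service: str,
    sns_topic_arn: str,
    threshold: float = 80.0,
) -> dict:
    """Build keyword arguments for cloudwatch.put_metric_alarm (boto3)."""
    return {
        "AlarmName": f"{service}-high-cpu",
        "Namespace": "AWS/ECS",
        "MetricName": "CPUUtilization",
        "Dimensions": [
            {"Name": "ClusterName", "Value": cluster},
            {"Name": "ServiceName", "Value": service},
        ],
        "Statistic": "Average",
        "Period": 300,
        "EvaluationPeriods": 2,  # two consecutive 5-minute periods
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],
    }
```

The dictionary can be passed as `boto3.client("cloudwatch").put_metric_alarm(**backend_cpu_alarm(...))`.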
Example: ALB 5XX Errors
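For 5XX errors, summing `HTTPCode_Target_5XX_Count` per minute is a common starting point. CloudWatch does not report this metric for periods with no errors, so the sketch treats missing data as not breaching. The error budget of 10 per minute and the five-period window are hypothetical values, and `error_rate` is an illustrative helper for judging the count against traffic.

```python
def alb_5xx_alarm(load_balancer: str, sns_topic_arn: str, max_errors: int = 10) -> dict:
    """Build keyword arguments for cloudwatch.put_metric_alarm (boto3).

    load_balancer is the LoadBalancer dimension value, which has the
    form "app/<name>/<id>".
    """
    return {
        "AlarmName": "alb-target-5xx-errors",
        "Namespace": "AWS/ApplicationELB",
        "MetricName": "HTTPCode_Target_5XX_Count",
        "Dimensions": [{"Name": "LoadBalancer", "Value": load_balancer}],
        "Statistic": "Sum",
        "Period": 60,
        "EvaluationPeriods": 5,
        "Threshold": float(max_errors),
        "ComparisonOperator": "GreaterThanThreshold",
        # No datapoint means no 5XX responses in that period.
        "TreatMissingData": "notBreaching",
        "AlarmActions": [sns_topic_arn],
    }


def error_rate(count_5xx: int, request_count: int) -> float:
    """Fraction of requests that failed with a 5XX status."""
    return 0.0 if request_count == 0 else count_5xx / request_count
```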
Example: RDS Connection Saturation
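The "80% of max" rule needs a concrete number, which depends on the instance's PostgreSQL `max_connections` setting. The sketch below derives the threshold from a value you supply and builds a `put_metric_alarm` request for `DatabaseConnections`. The instance identifier, alarm name, and evaluation window are assumptions.

```python
def connection_threshold(max_connections: int, percent: int = 80) -> int:
    """Connection count at the given percentage of max_connections."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be between 1 and 100")
    # Integer arithmetic avoids floating-point rounding surprises.
    return max_connections * percent // 100


def rds_connection_alarm(db_instance_id: str, max_connections: int, sns_topic_arn: str) -> dict:
    """Build keyword arguments for cloudwatch.put_metric_alarm (boto3)."""
    return {
        "AlarmName": f"{db_instance_id}-connection-saturation",
        "Namespace": "AWS/RDS",
        "MetricName": "DatabaseConnections",
        "Dimensions": [{"Name": "DBInstanceIdentifier", "Value": db_instance_id}],
        "Statistic": "Average",
        "Period": 300,
        "EvaluationPeriods": 2,
        "Threshold": float(connection_threshold(max_connections)),
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],
    }
```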
Application-Level Metrics
The URL shortener exposes custom metrics through the /api/stats endpoint.
Public Statistics Endpoint
Response
The response values are derived from:
- ShortUrlRepository.countByLinkStatus(LinkStatus.ACTIVE)
- ShortUrlRepository.sumAllClickCounts()
- Welford’s algorithm for rolling average latency
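Welford's algorithm, referenced above for the average latency, updates a mean one sample at a time without storing the samples. The backend's own implementation is not shown in this guide; the following is a generic Python sketch of the technique, with the class name `RunningStats` chosen for illustration.

```python
class RunningStats:
    """Running mean and variance using Welford's online algorithm."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the current mean

    def add(self, value: float) -> None:
        """Fold one observation (e.g. a request latency in ms) into the stats."""
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        # Uses the updated mean, which keeps the update numerically stable.
        self._m2 += delta * (value - self.mean)

    @property
    def variance(self) -> float:
        """Sample variance; 0.0 until at least two observations exist."""
        return self._m2 / (self.count - 1) if self.count > 1 else 0.0
```

Each update is constant time and constant memory, which suits a statistic that is touched on every request.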
For production monitoring, consider exporting these metrics to CloudWatch using the CloudWatch Agent or custom metric API calls.
Best Practices
Set Appropriate Retention Periods
Balance cost and compliance:
- Development: 7 days
- Staging: 14 days
- Production: 30-90 days
Configure this with ecs_log_retention_days in your Terraform variables.
Use Metric Filters for Custom Alerts
Create metric filters to track specific log patterns:
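As one possible shape for such a filter, the sketch below builds a request for boto3's `put_metric_filter` that counts log events containing the term ERROR. The filter name, metric name, and the Custom/Application namespace are assumptions; adjust the pattern to match the application's actual log format.

```python
def error_metric_filter(log_group: str, namespace: str = "Custom/Application") -> dict:
    """Build keyword arguments for logs.put_metric_filter (boto3)."""
    return {
        "logGroupName": log_group,
        "filterName": "application-errors",
        # A bare term matches any log event containing it (case-sensitive).
        "filterPattern": "ERROR",
        "metricTransformations": [
            {
                "metricName": "ApplicationErrorCount",
                "metricNamespace": namespace,
                "metricValue": "1",  # each matching event adds 1
                "defaultValue": 0.0,  # emit 0 when nothing matches
            }
        ],
    }
```

The resulting metric can then back a CloudWatch alarm in the same way as the built-in metrics.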
Enable X-Ray for Distributed Tracing
For advanced tracing of requests across services, integrate AWS X-Ray by:
- Adding X-Ray SDK to the Spring Boot backend
- Enabling X-Ray in the task definition
- Updating IAM task role permissions
Monitor Scheduled Task Execution
The backend runs a cleanup job every 15 minutes. Monitor its execution:
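One way to check the job is to collect the timestamps of its log entries and look for gaps longer than the schedule allows. The sketch below assumes you have already extracted those timestamps (for example, from log events matching whatever message the job writes, which is not specified here); the two-minute tolerance and the function name are hypothetical.

```python
from datetime import datetime, timedelta


def missed_runs(
    run_times: list,
    interval: timedelta = timedelta(minutes=15),
    tolerance: timedelta = timedelta(minutes=2),
) -> list:
    """Return (previous, next) run pairs whose gap exceeds interval + tolerance."""
    ordered = sorted(run_times)
    return [
        (earlier, later)
        for earlier, later in zip(ordered, ordered[1:])
        if later - earlier > interval + tolerance
    ]
```

An empty result means every consecutive pair of runs was at most 17 minutes apart under these defaults; it does not detect a job that has stopped entirely, so compare the latest timestamp with the current time as well.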
Next Steps
- Set up CI/CD pipelines for automated deployments
- Learn troubleshooting techniques for common issues
- Review security groups and secrets management for production environments