Finding errors during an outage
Start by pulling the most recent errors and fatals to see what the system was reporting at the time. Run SHOW COUNT first to understand scale. If you’re seeing thousands of errors, SHOW LAST 50 narrows it to the tail end of the incident.
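Conceptually, this is a count-then-tail workflow. A minimal Python sketch of the idea, assuming log entries parsed into dicts with `level` and `message` fields (the field names and record shape are assumptions, not Zeal's actual data model):

```python
# Count matching entries first to gauge scale, then look at the tail.
logs = [
    {"level": "info",  "message": "request served"},
    {"level": "error", "message": "upstream timeout"},
    {"level": "fatal", "message": "worker crashed"},
    {"level": "error", "message": "upstream timeout"},
]

# Scale first: how many errors and fatals are we dealing with?
errors = [e for e in logs if e["level"] in ("error", "fatal")]
print(len(errors))

# Then the tail end of the incident -- the most recent 50 entries.
tail = errors[-50:]
print(tail[-1]["message"])
```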
Temporal correlation — were there warnings before errors?
The most common root-cause pattern: a warning fires, goes unnoticed, and errors follow seconds later. WITHIN...OF surfaces this connection directly. As a starting point, use a window of 5s for fast services and 2m for slower background jobs.
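The mechanics behind WITHIN...OF can be sketched as a windowed join: for each error, find any warning that fired within the window before it. Epoch-second timestamps and the `ts`/`level` field names here are illustrative assumptions:

```python
# For each error, look for a warning within `window` seconds before it.
window = 5  # seconds; try 2 * 60 for slower background jobs

logs = [
    {"ts": 100.0, "level": "warning", "message": "connection pool 90% full"},
    {"ts": 103.5, "level": "error",   "message": "connection refused"},
    {"ts": 200.0, "level": "error",   "message": "unrelated failure"},
]

warnings = [e for e in logs if e["level"] == "warning"]
correlated = [
    (w, e)
    for e in logs if e["level"] == "error"
    for w in warnings
    if 0 <= e["ts"] - w["ts"] <= window
]
for w, e in correlated:
    print(f'{w["message"]!r} preceded {e["message"]!r} by {e["ts"] - w["ts"]:.1f}s')
```

Only the 103.5s error correlates; the one at 200.0s is outside any warning's window, which is exactly the separation you want during triage.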
Tracing a single request
Once you have a suspect request_id from an error entry, pull every log line associated with it to reconstruct the full request lifecycle.
Swap request_id for whatever correlation field your service uses — trace_id, correlation_id, job_id, and so on.
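The underlying operation is a filter on the correlation field plus a sort by time. A sketch, with the record shape assumed for illustration:

```python
# Reconstruct one request's lifecycle by filtering on a correlation
# field and ordering by timestamp. Swap "request_id" for trace_id,
# correlation_id, job_id, etc. as your service requires.
logs = [
    {"request_id": "a1", "ts": 1, "message": "request received"},
    {"request_id": "b2", "ts": 2, "message": "request received"},
    {"request_id": "a1", "ts": 3, "message": "db query started"},
    {"request_id": "a1", "ts": 4, "message": "500 returned"},
]

def trace(logs, correlation_field, value):
    """Every line for one request, in time order."""
    return sorted(
        (e for e in logs if e.get(correlation_field) == value),
        key=lambda e: e["ts"],
    )

lifecycle = trace(logs, "request_id", "a1")
print([e["message"] for e in lifecycle])
# → ['request received', 'db query started', '500 returned']
```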
Find all 5xx errors grouped by endpoint
When errors are widespread, grouping by endpoint tells you which paths are failing. Each group shows the path value alongside its matching entries, making it straightforward to identify the most-affected endpoints.
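The grouping step amounts to filtering for 5xx statuses and counting by path. A sketch, assuming `status` and `path` fields in your schema:

```python
# Group 5xx entries by path to surface the most-affected endpoints.
from collections import Counter

logs = [
    {"status": 500, "path": "/checkout"},
    {"status": 200, "path": "/health"},
    {"status": 503, "path": "/checkout"},
    {"status": 502, "path": "/search"},
]

by_endpoint = Counter(
    e["path"] for e in logs if 500 <= e["status"] <= 599
)
for path, count in by_endpoint.most_common():
    print(path, count)
# /checkout sorts first: two 5xx responses versus one for /search
```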
Database errors near high latency
Database problems often manifest as latency spikes before explicit error messages appear. Use temporal correlation to connect the two signals. A reasonable starting filter is latency_ms >= 1000; adjust the threshold and window to match your SLOs.
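The same windowed-join idea applies here, with a latency threshold on one side and database errors on the other. The 1000ms threshold and 30s window below are starting points, not rules, and the field names are assumptions:

```python
# Connect latency spikes (latency_ms >= 1000) to database errors
# that follow within the window.
window = 30  # seconds; widen or narrow to match your SLOs

logs = [
    {"ts": 10.0, "latency_ms": 1450, "message": "slow query"},
    {"ts": 25.0, "latency_ms": 80,   "message": "ok"},
    {"ts": 32.0, "latency_ms": None, "message": "db connection error"},
]

spikes = [e for e in logs if (e.get("latency_ms") or 0) >= 1000]
db_errors = [e for e in logs if "db" in e["message"]]
pairs = [
    (s, err)
    for err in db_errors
    for s in spikes
    if 0 <= err["ts"] - s["ts"] <= window
]
print(len(pairs))  # 1: the slow query preceded the db error by 22s
```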
Deployment-related issues
If errors spiked after a deploy, confirm the timing with a temporal query. Note that temporal correlation requires a parseable timestamp field; Zeal recognises timestamp, ts, time, and @timestamp automatically.
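One way this kind of automatic field detection can work is to try each recognised name in order until one is present. This is a conceptual sketch, not Zeal's implementation, and it assumes ISO-8601 timestamp strings:

```python
# Try each recognised timestamp field name in order; parse the
# first one found. Detection order and parsing are assumptions.
from datetime import datetime

RECOGNISED = ("timestamp", "ts", "time", "@timestamp")

def parse_ts(entry):
    """Return a datetime from the first recognised field, or None."""
    for field in RECOGNISED:
        if field in entry:
            return datetime.fromisoformat(entry[field])
    return None

print(parse_ts({"@timestamp": "2024-05-01T12:00:00", "message": "deploy finished"}))
```

An entry with none of the recognised fields yields None, which is why logs without a parseable timestamp can't participate in temporal queries.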