When something goes wrong, the most effective first step is to check three sources in order: the service logs (Celery worker stdout, Streamlit stderr), the connector health endpoints (which surface data source reachability independently of a full analysis run), and the Celery task state stored in Redis. Most failures fall into one of three categories — startup problems, analysis-time failures, or connectivity issues — and the sections below address each with targeted diagnostic commands and configuration fixes.
The most common cause of a Celery worker failing to start is a missing or malformed REDIS_URL environment variable. The worker process exits immediately if it cannot connect to the Redis broker during startup.
Diagnosis
First, confirm that Redis is reachable from the worker container or host:
redis-cli -u "$REDIS_URL" ping
# Expected output: PONG
Then verify the worker can register itself with the broker:
celery -A app.worker inspect ping
# Expected output: {"celery@<hostname>": {"ok": "pong"}}
If inspect ping times out with no response, the worker process has not started or cannot reach the broker.
Common causes and fixes
  • REDIS_URL is not set — add it to your .env file: REDIS_URL=redis://localhost:6379/0
  • Redis is not running — start it with docker compose up redis or redis-server
  • A firewall or network policy is blocking port 6379 between the worker and Redis containers: allow traffic on that port between the two
  • CELERY_BROKER_URL is set to a different value than REDIS_URL — ensure both point to the same Redis instance
Never omit the database index suffix (e.g., /0) from REDIS_URL. Some Redis client libraries treat a bare redis://host:port URL differently from redis://host:port/0, which can cause the broker and result backend to use different databases and break task result retrieval.
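The two URL pitfalls above (a missing database index, and a CELERY_BROKER_URL that differs from REDIS_URL) can be caught before the worker starts. The following is a minimal sketch using only the Python standard library. The function name check_redis_urls is hypothetical and not part of the agent, and the sketch assumes the variables are named exactly as described above.

```python
from urllib.parse import urlparse


def check_redis_urls(env):
    """Return a list of human-readable problems found in the Redis settings.

    `env` is any mapping of environment variable names to values,
    for example os.environ. Illustrative helper, not part of the agent.
    """
    redis_url = env.get("REDIS_URL", "")
    if not redis_url:
        return ["REDIS_URL is not set"]

    problems = []
    parsed = urlparse(redis_url)
    if parsed.scheme not in ("redis", "rediss"):
        problems.append(f"unexpected scheme {parsed.scheme!r} in REDIS_URL")

    # The URL path carries the database index, e.g. "/0".
    db_index = parsed.path.lstrip("/")
    if not db_index.isdigit():
        problems.append("REDIS_URL has no explicit database index (e.g. /0)")

    # Only compare when CELERY_BROKER_URL is actually defined.
    broker_url = env.get("CELERY_BROKER_URL")
    if broker_url is not None and broker_url != redis_url:
        problems.append("CELERY_BROKER_URL differs from REDIS_URL")

    return problems
```

Calling check_redis_urls(os.environ) in the worker's environment should return an empty list when the settings are consistent. The check is purely syntactic and does not contact Redis, so it complements the redis-cli ping test rather than replacing it.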
A blank page at http://localhost:8501 usually means Streamlit started but encountered an import or runtime error before rendering, or the browser is connecting before the server is ready.
Diagnosis
Check the terminal output where streamlit run was launched for Python tracebacks. Streamlit prints errors to stderr before the server begins accepting connections:
streamlit run app/ui.py --server.port 8501 2>&1 | head -50
Verify that port 8501 is not already bound by another process:
lsof -i :8501
# Should return no output if the port is free
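Where lsof is not available (for example on Windows), a similar check can be approximated from Python. This is a rough sketch, and the helper name port_in_use is hypothetical. It tests whether anything accepts TCP connections on the port, which is slightly narrower than what lsof reports.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if something is accepting TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    print("port 8501 in use:", port_in_use(8501))
```

If this reports True while Streamlit is stopped, another process is listening on the port, and the fixes below apply.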
Fixes
  • If a traceback is present, resolve the underlying Python error (see the ImportError section below).
  • If the port is in use, stop the conflicting process or change the port: streamlit run app/ui.py --server.port 8502
  • Clear Streamlit’s on-disk cache if stale cached data is causing unexpected state:
streamlit cache clear
  • On some Linux hosts, Streamlit’s file watcher uses inotify. If you see “inotify watch limit reached”, increase the system limit:
echo fs.inotify.max_user_watches=524288 | sudo tee -a /etc/sysctl.conf
sudo sysctl -p
An ImportError means Python cannot locate one or more packages that the agent depends on. It most often occurs when the virtual environment is not activated or dependencies were never installed.
Diagnosis
# Confirm the active Python environment
which python
pip list | grep -E "celery|streamlit|openai|redis"
If the grep returns no output, the packages are missing from the current environment.
Fixes
Activate the virtual environment and install dependencies:
source .venv/bin/activate          # Linux / macOS
# .venv\Scripts\activate           # Windows

pip install -r requirements.txt
If you are using Docker and see this error inside the container, the image may have been built before requirements.txt was last updated. Rebuild without cache:
docker compose build --no-cache worker app
Always pin your dependency versions in requirements.txt (e.g., celery==5.3.6) to avoid silent breakage when upstream packages release incompatible updates.
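The pip list check from the diagnosis step can also be run from inside the interpreter, which confirms that the Python actually launching the agent can see the packages. A minimal sketch follows. The helper name missing_packages is hypothetical, and the package list simply mirrors the grep pattern used earlier.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Same packages as the grep pattern in the diagnosis step.
    print(missing_packages(["celery", "streamlit", "openai", "redis"]))
```

An empty list means all four modules are discoverable in the current environment. Any names printed point to packages that still need to be installed there.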

Enabling Debug Logging

Detailed debug output is the fastest way to understand exactly what the agent is doing at each pipeline stage. Enable it for the Celery worker and the Streamlit UI independently.
Celery worker with debug logging:
LOG_LEVEL=DEBUG celery -A app.worker worker --loglevel=debug
Streamlit UI with debug logging:
LOG_LEVEL=DEBUG streamlit run app/ui.py
At DEBUG level, the worker logs the full signal payload before it is sent to the LLM, the rendered prompt template, and the raw LLM JSON response. This makes it straightforward to verify that the correct signals are being fetched and that the LLM prompt is populated as expected.
Pipe debug output to a file for easier analysis:
LOG_LEVEL=DEBUG celery -A app.worker worker --loglevel=debug 2>&1 | tee worker-debug.log
You can then grep for specific task IDs or connector names without scrolling through the live terminal.
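As an alternative to grep, the saved log can be filtered from Python, which is convenient when the matches need further processing. The sketch below makes no assumption about the worker's log line format and does a plain substring match. The helper name lines_matching is hypothetical.

```python
def lines_matching(log_lines, needle):
    """Yield (line_number, line) pairs for lines that contain `needle`."""
    for number, line in enumerate(log_lines, start=1):
        if needle in line:
            yield number, line.rstrip("\n")


if __name__ == "__main__":
    import sys

    # Usage: python filter_log.py worker-debug.log <task-id-or-connector-name>
    if len(sys.argv) == 3:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
            for number, line in lines_matching(handle, sys.argv[2]):
                print(f"{number}: {line}")
```

Because the function accepts any iterable of lines, it works equally on an open file or on a list of strings already in memory.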
If none of the steps above resolve your issue, open a GitHub issue in the repository and attach the debug log output (redact any API keys or secrets). Include the analysis ID, the list of enabled connectors, and the environment variable names (not values) that are set in your .env file.
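To list the variable names from a .env file without exposing their values, something like the following sketch may help. It assumes simple NAME=value lines, with optional comments and an optional export prefix. The helper name env_var_names is hypothetical.

```python
def env_var_names(dotenv_text):
    """Return variable names defined in .env-style text, without their values."""
    names = []
    for raw in dotenv_text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name = line.split("=", 1)[0].strip()
        # Tolerate the optional "export " prefix some .env files use.
        if name.startswith("export "):
            name = name[len("export "):].strip()
        names.append(name)
    return names


if __name__ == "__main__":
    with open(".env", encoding="utf-8") as handle:
        print("\n".join(env_var_names(handle.read())))
```

Only the text to the left of the first equals sign is kept, so values never reach the output. It is still worth reading the result before pasting it into an issue.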
