Get Started with the DevOps Root Cause Analysis Agent

This guide walks you through standing up a fully functional instance of the RCA Agent on your local machine. By the end you will have a running Celery worker, a Redis broker, and a Streamlit UI ready to accept your first incident analysis. The entire process takes under 10 minutes on a machine with Python 3.9+ and Docker available.

Confirm Prerequisites

Before you begin, make sure the following are available on your machine:

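Python 3.9 or later: the agent uses modern type hints. If the codebase also uses match statements, you need Python 3.10 or later, because structural pattern matching was added in 3.10.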
Docker — the quickest way to run Redis without a local installation. Alternatively, install Redis 7+ natively.
An LLM API key — the agent’s hypothesis generation stage requires access to a large language model. OpenAI (gpt-4o or gpt-4-turbo) is the default provider; see Environment Configuration for alternative providers.
Git — to clone the repository.

Verify your Python version:

python --version
# Python 3.9.x or later
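
If you would rather check every prerequisite in one pass, the list above can be sketched as a short script. This is a minimal sketch and not part of the repository: the function name check_prerequisites is hypothetical, and it only confirms that the tools are on your PATH, not that Docker is running or that your API key is valid.

```python
import shutil
import sys

MIN_PYTHON = (3, 9)  # minimum version stated in this guide


def check_prerequisites(required_tools=("git", "docker")):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        found = ".".join(map(str, sys.version_info[:3]))
        problems.append(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, found {found}")
    for tool in required_tools:
        # shutil.which returns None when the executable is not on PATH
        if shutil.which(tool) is None:
            problems.append(f"'{tool}' not found on PATH")
    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    print("\n".join(issues) if issues else "All prerequisites found.")
```

If you plan to run Redis natively instead of through Docker, pass required_tools=("git", "redis-server") so the Docker check is skipped.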

Clone the Repository

Clone the project from GitHub and move into the project directory:

git clone https://github.com/vrashmanyu605-eng/devops-root-cause-analysis-agent.git
cd devops-root-cause-analysis-agent

The repository root contains the app/ package (agent core, Celery worker, and Streamlit UI), a requirements.txt, and an .env.example template you will use in a later step.

Create and Activate a Virtual Environment

Isolate the project’s dependencies in a dedicated virtual environment to avoid conflicts with other Python projects on your machine:

python -m venv .venv
source .venv/bin/activate

On Windows (PowerShell):

python -m venv .venv
.venv\Scripts\activate

Your shell prompt should now be prefixed with (.venv), confirming the environment is active.

Install Dependencies

Install all required Python packages listed in requirements.txt:

pip install -r requirements.txt

This installs the core agent libraries along with streamlit, celery, redis, the OpenAI client, and all supporting packages. Depending on your network speed this typically takes 60–90 seconds.

Configure Environment Variables

The agent is configured entirely through environment variables. Copy the provided example file and fill in your values:

cp .env.example .env

Open .env in your editor and set the required variables:

# .env

# LLM provider — required for hypothesis generation
OPENAI_API_KEY=sk-...

# Redis connection — must match where Redis is running
REDIS_URL=redis://localhost:6379/0

# Celery broker and result backend (both point to Redis by default)
CELERY_BROKER_URL=redis://localhost:6379/0
CELERY_RESULT_BACKEND=redis://localhost:6379/1

# Optional: tune the LLM model used for root cause analysis
RCA_LLM_MODEL=gpt-4o

# Optional: maximum number of log lines passed to the LLM context window
RCA_MAX_LOG_LINES=500

Never commit your .env file to version control. It is already listed in .gitignore, but double-check before pushing to a shared repository.
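
Before starting any services, it can help to sanity-check the file you just edited. The following is a minimal sketch that uses only the standard library. It assumes the four variables shown without an "Optional" comment are the required ones, and the helper names parse_env and validate_env are hypothetical rather than part of the agent. The parser handles plain KEY=VALUE lines only, not inline comments or multi-line values.

```python
from pathlib import Path
from urllib.parse import urlparse

# Assumption: these are the variables the agent cannot start without.
REQUIRED_KEYS = ("OPENAI_API_KEY", "REDIS_URL", "CELERY_BROKER_URL", "CELERY_RESULT_BACKEND")


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def validate_env(values):
    """Return a list of problems found in the parsed settings."""
    problems = [f"{key} is missing or empty" for key in REQUIRED_KEYS if not values.get(key)]
    for key in ("REDIS_URL", "CELERY_BROKER_URL", "CELERY_RESULT_BACKEND"):
        url = values.get(key)
        if url and urlparse(url).scheme not in ("redis", "rediss"):
            problems.append(f"{key} should be a redis:// URL, got {url!r}")
    if values.get("OPENAI_API_KEY") == "sk-...":
        problems.append("OPENAI_API_KEY still holds the placeholder value")
    limit = values.get("RCA_MAX_LOG_LINES")
    if limit and not limit.isdigit():
        problems.append("RCA_MAX_LOG_LINES must be a whole number")
    return problems


if __name__ == "__main__":
    env_path = Path(".env")
    if env_path.exists():
        for message in validate_env(parse_env(env_path.read_text())) or ["No problems found."]:
            print(message)
    else:
        print(".env not found; copy .env.example first")
```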

Start Redis

The Celery worker requires Redis as its message broker and result backend. The fastest way to start Redis locally is with Docker:

docker run -d -p 6379:6379 redis:7-alpine

Confirm Redis is accepting connections:

docker exec $(docker ps -qf "ancestor=redis:7-alpine") redis-cli ping
# PONG

If you prefer a native Redis installation, start the server with redis-server and confirm it is listening on port 6379.
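
Whichever way you start Redis, you can also confirm it is reachable from Python, which is where the worker will connect from. The sketch below uses a raw socket so it works before the redis package is installed. It relies on Redis accepting PING as an inline command and replying +PONG. The function name redis_ping is hypothetical, and the check reports False for servers that require authentication.

```python
import socket


def redis_ping(host="localhost", port=6379, timeout=2.0):
    """Return True if a server at host:port answers PING with +PONG."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(b"PING\r\n")  # inline command form, no RESP framing needed
            reply = conn.recv(64)
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False
    return reply.startswith(b"+PONG")


if __name__ == "__main__":
    print("Redis reachable" if redis_ping() else "Redis not reachable on localhost:6379")
```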

Start the Celery Worker

Open a new terminal window, activate the virtual environment, and start the Celery worker process. The worker picks up analysis tasks submitted by the Streamlit UI and executes each pipeline stage:

source .venv/bin/activate  # or .venv\Scripts\activate on Windows
celery -A app.worker worker --loglevel=info

You should see output similar to:

[config]
.> app:         rca_agent
.> transport:   redis://localhost:6379/0
.> results:     redis://localhost:6379/1
.> concurrency: 4 (prefork)

[queues]
.> celery    exchange=celery(direct) key=celery

[tasks]
. app.worker.tasks.ingest_signals
. app.worker.tasks.preprocess_signals
. app.worker.tasks.generate_hypotheses
. app.worker.tasks.rank_and_format

Ready.

Leave this terminal running throughout your session.

Launch the Streamlit UI

In a third terminal window, activate the virtual environment and launch the Streamlit application:

source .venv/bin/activate  # or .venv\Scripts\activate on Windows
streamlit run app/ui.py

Streamlit will print the local URL:

You can now view your Streamlit app in your browser.

  Local URL: http://localhost:8501
  Network URL: http://192.168.x.x:8501

Trigger Your First Analysis

Open http://localhost:8501 in your browser. You will see the RCA Agent dashboard with an incident configuration form. Fill in the incident context fields:

Service name — the affected service or component (e.g., payment-api)
Time window — the start and end timestamps covering the incident (e.g., the 30 minutes around when the alert fired)
Alert description — a brief description of the observed symptom (e.g., P99 latency exceeded 2000ms, error rate spiked to 12%)
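
To make the shape of these three inputs concrete, here is a minimal sketch of how they could be modelled and validated. The IncidentContext class, its field names, and the around_alert helper are all hypothetical and are not taken from the agent's code. The timestamp in the usage lines is an arbitrary example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class IncidentContext:
    """Hypothetical container mirroring the three form fields."""
    service_name: str
    window_start: datetime
    window_end: datetime
    alert_description: str

    def __post_init__(self):
        if not self.service_name.strip():
            raise ValueError("service_name must not be empty")
        if self.window_end <= self.window_start:
            raise ValueError("window_end must be after window_start")

    @classmethod
    def around_alert(cls, service_name, alert_time, alert_description, minutes=30):
        """Build a window of `minutes` total, centred on the alert time."""
        half = timedelta(minutes=minutes / 2)
        return cls(service_name, alert_time - half, alert_time + half, alert_description)


if __name__ == "__main__":
    ctx = IncidentContext.around_alert(
        "payment-api",
        datetime(2024, 1, 15, 14, 30),  # arbitrary example alert time
        "P99 latency exceeded 2000ms, error rate spiked to 12%",
    )
    print(ctx.window_start, "to", ctx.window_end)
```

Centring the window on the alert time keeps some pre-incident baseline in view, which matches the "30 minutes around when the alert fired" suggestion above.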

Click Run Analysis. The UI will display a progress indicator as the Celery worker ingests signals, preprocesses them, and runs the LLM reasoning chain. When the pipeline completes, the Ranked Hypotheses panel populates with candidate root causes, each showing a confidence score and expandable evidence excerpts.
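
The flow described above, where each stage consumes the previous stage's output and the final list is ordered by confidence, can be sketched in plain Python. This is only an illustration. The four function names match the task names the worker registers, but their bodies here are stand-ins with hard-coded sample data. The Hypothesis class and the 0.0 to 1.0 confidence scale are assumptions, and in the real agent the stages run as Celery tasks rather than direct function calls.

```python
from dataclasses import dataclass, field


@dataclass
class Hypothesis:
    cause: str
    confidence: float  # assumed scale: 0.0 to 1.0
    evidence: list = field(default_factory=list)


def ingest_signals(context):
    # Stand-in: a real stage would pull signals for the incident window.
    return ["ERROR db timeout", "INFO request ok", "ERROR db timeout", "WARN pool exhausted"]


def preprocess_signals(signals):
    # Stand-in: drop INFO lines and exact duplicates, preserving order.
    return list(dict.fromkeys(s for s in signals if not s.startswith("INFO")))


def generate_hypotheses(signals):
    # Stand-in for the LLM call: fixed candidates that cite matching signals.
    return [
        Hypothesis("Connection pool exhausted", 0.55, [s for s in signals if "pool" in s]),
        Hypothesis("Database timeout", 0.80, [s for s in signals if "timeout" in s]),
    ]


def rank_and_format(hypotheses):
    # Highest confidence first.
    return sorted(hypotheses, key=lambda h: h.confidence, reverse=True)


def run_pipeline(context):
    """Run the four stages in order, feeding each result into the next stage."""
    result = context
    for stage in (ingest_signals, preprocess_signals, generate_hypotheses, rank_and_format):
        result = stage(result)
    return result


if __name__ == "__main__":
    for rank, h in enumerate(run_pipeline({"service": "payment-api"}), start=1):
        print(f"{rank}. {h.cause} ({h.confidence:.0%}) evidence={h.evidence}")
```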

On the first run with no connected data sources, the agent operates in demo mode and uses bundled sample signals so you can explore the output format before wiring up live observability backends.

For a deeper walkthrough — including how to interpret confidence scores, export findings for a postmortem, and connect live observability data sources — see the Running an Analysis guide.
