Quickstart: run AgentForge locally in five minutes

This guide walks you through setting up AgentForge on your machine and generating your first AI-powered image description. By the end, you will have the Streamlit app running locally, understand how to interact with the UI, and know how to call the core run_agentforge function directly from Python.

AgentForge requires a Groq API key to power its language model. Create a free account at console.groq.com to obtain one before you begin.

Set up AgentForge

Prerequisites

Make sure Python is installed on your system. Python 3.9 or later is recommended.

python --version
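The same check can be made from inside Python, for example at the top of a setup script. This is a minimal sketch: the helper name `meets_minimum` is illustrative and not part of AgentForge, and 3.9 is the recommended floor stated above rather than a limit the project is known to enforce.

```python
import sys

# Recommended floor taken from this guide (an assumption, not an enforced limit).
RECOMMENDED = (3, 9)


def meets_minimum(version=None, minimum=RECOMMENDED):
    """Return True if `version` (major, minor, ...) is at least `minimum`."""
    if version is None:
        version = sys.version_info
    # Compare only major and minor so patch releases do not matter.
    return tuple(version[:2]) >= tuple(minimum)


if __name__ == "__main__":
    found = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if meets_minimum() else "older than recommended"
    print(f"Python {found}: {status}")
```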

Open the project

Clone or download the AgentForge repository, then open a terminal in the project root directory — the folder that contains requirements.txt and the AgentForge/ subdirectory.

Install dependencies

Install all required packages with pip:

pip install -r requirements.txt

This installs langchain, langgraph, streamlit, groq, pillow, edge-tts, transformers, torch, pydub, and python-dotenv.
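To confirm the install worked before launching the app, you can ask Python whether each package is importable. The sketch below uses only the standard library. The helper name `missing_modules` is hypothetical, and the module list assumes the usual import names for the packages above, three of which differ from their pip names.

```python
import importlib.util

# Import names for the packages listed above. Some differ from the pip names:
# pillow -> PIL, edge-tts -> edge_tts, python-dotenv -> dotenv.
EXPECTED_MODULES = [
    "langchain", "langgraph", "streamlit", "groq", "PIL",
    "edge_tts", "transformers", "torch", "pydub", "dotenv",
]


def missing_modules(names=EXPECTED_MODULES):
    """Return the names that cannot be located for import (nothing is imported)."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All expected packages are importable.")
```

Because `find_spec` only locates a module without executing it, the check stays fast even for heavy packages such as torch.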

Configure the .env file

Open the .env file located in the project directory and add your Groq API key:

GROQ_API_KEY=gsk_xxxxxxxxxxxxxxxxx

Replace gsk_xxxxxxxxxxxxxxxxx with your actual key from the Groq console.
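A quick way to verify that the key is visible to Python is sketched below. It assumes the app reads the key from the `GROQ_API_KEY` environment variable via python-dotenv, which this guide implies but does not show. The helper name `groq_key_status` is hypothetical, and the script deliberately never prints the key itself.

```python
import os

PLACEHOLDER = "gsk_xxxxxxxxxxxxxxxxx"  # the sample value shown above


def groq_key_status(env):
    """Classify the GROQ_API_KEY entry of a mapping such as os.environ."""
    key = (env.get("GROQ_API_KEY") or "").strip()
    if not key:
        return "missing"
    if key == PLACEHOLDER:
        return "placeholder"
    return "set"


if __name__ == "__main__":
    try:
        # python-dotenv is listed in requirements.txt. load_dotenv() copies
        # entries from a .env file into os.environ and, by default, does not
        # override variables that are already set.
        from dotenv import load_dotenv
        load_dotenv()
    except ImportError:
        pass  # fall back to whatever is already in the environment
    print("GROQ_API_KEY:", groq_key_status(os.environ))
```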

Run the app

Start the Streamlit frontend from the project root:

streamlit run frontend/app.py

Open in browser

Once the server starts, open your browser and navigate to:

http://localhost:8501

The AgentForge UI will load automatically.
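If the page does not load, it can help to check whether anything is answering on that address at all. The following is a rough, standard-library sketch: `app_is_up` is a hypothetical helper, and it only tells you that some HTTP server responded successfully at the URL, not that the server is AgentForge.

```python
import http.client
import urllib.request


def app_is_up(url="http://localhost:8501", timeout=2.0):
    """Return True if an HTTP server answers `url` successfully within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (OSError, http.client.HTTPException):
        # Connection refused, timeouts, and HTTP error statuses all end up here.
        return False


if __name__ == "__main__":
    print("Streamlit reachable:", app_is_up())
```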

Use the UI

The AgentForge interface is a single-page Streamlit app titled Glasovni opisivač za slijepe i slabovidne osobe (Voice describer for the blind and visually impaired).

Upload an image: click the Upload button and select a .png, .jpg, or .jpeg file.
Toggle detailed mode: check the Detaljan opis (Detailed description) checkbox if you want a longer, more detailed description. Leave it unchecked for a concise summary.
Generate a description: click the Generiraj opis (Generate description) button to run the agent pipeline.

The app returns:

A text description displayed on screen.
An audio file you can play directly in the browser.

Enable Detaljan opis (detailed mode) when you need a thorough scene analysis — for example, to capture fine details in medical or instructional images.

The bottom of the page shows the last five descriptions generated during your current session so you can review or replay previous results.
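The "last five" behaviour is a bounded history. How the app actually stores it is not shown in this guide. The sketch below is one common way to get the same effect, using `collections.deque` with `maxlen`; the class and method names are hypothetical.

```python
from collections import deque


class RecentDescriptions:
    """Keep only the most recent `limit` results (illustrative, not AgentForge code)."""

    def __init__(self, limit=5):
        # A deque with maxlen drops its oldest entry automatically when full.
        self._items = deque(maxlen=limit)

    def add(self, description, audio_path=None):
        self._items.append({"description": description, "audio_path": audio_path})

    def newest_first(self):
        """Return the stored results as a list, most recent first."""
        return list(reversed(self._items))


if __name__ == "__main__":
    history = RecentDescriptions()
    for number in range(7):
        history.add(f"description {number}")
    for item in history.newest_first():
        print(item["description"])
```

In a Streamlit app, an object like this would typically be kept in `st.session_state` so that it survives reruns within the same browser session.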

Call `run_agentforge` directly

You can invoke the core pipeline from Python without the Streamlit UI. The function signature from backend/main.py is:

def run_agentforge(image_path, session_id="default", detailed=False):

Parameter	Type	Default	Description
`image_path`	`str`	—	Path to the image file on disk
`session_id`	`str`	`"default"`	Unique identifier for the session; used to scope history
`detailed`	`bool`	`False`	Set to `True` to request a detailed description

Example:

from backend.main import run_agentforge

result = run_agentforge(
    image_path="data/my-session/photo.jpg",
    session_id="my-session",
    detailed=True,
)

print(result["description"])  # text description
print(result["audio_path"])   # path to generated audio file

The function returns the full LangGraph workflow state, which includes description, audio_path, valid_image, and error keys.
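Since the state also carries `valid_image` and `error`, it is worth checking those before using the description. The sketch below shows one way to do that. It assumes the state behaves like a dict (as the indexing in the example suggests), that `error` is empty or `None` on success, and that `valid_image` is falsy when the image was rejected; none of these details are confirmed by this guide, and `summarize_state` is a hypothetical helper.

```python
def summarize_state(state):
    """Turn a returned workflow state into an (ok, message) pair.

    Assumptions: `error` is empty or None on success, and `valid_image`
    is falsy when the image was rejected.
    """
    if state.get("error"):
        return False, f"Pipeline error: {state['error']}"
    if not state.get("valid_image", True):
        return False, "The image was not accepted as valid."
    description = state.get("description")
    if not description:
        return False, "No description was produced."
    return True, description


if __name__ == "__main__":
    # Hand-written sample state for illustration, not real pipeline output.
    sample = {"description": "A dog on a beach.", "audio_path": "out.mp3",
              "valid_image": True, "error": None}
    ok, message = summarize_state(sample)
    print("OK" if ok else "FAILED", "-", message)
```

With the real pipeline, the call would look like `ok, message = summarize_state(run_agentforge(image_path))`.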
