This guide walks you through setting up AgentForge on your machine and generating your first AI-powered image description. By the end, you will have the Streamlit app running locally, understand how to interact with the UI, and know how to call the core run_agentforge function directly from Python.
AgentForge requires a Groq API key to power its language model. Create a free account at console.groq.com to obtain one before you begin.

Set up AgentForge

1. Prerequisites

Make sure Python is installed on your system. Python 3.9 or later is recommended.
python --version
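To check the version from inside Python, or to make a script fail early on an older interpreter, here is a minimal standard-library sketch. The 3.9 threshold mirrors the recommendation above. The helper name `meets_minimum` is hypothetical and not part of AgentForge.

```python
import sys

# Recommended minimum from the prerequisites (a recommendation, not a hard limit).
MINIMUM = (3, 9)

def meets_minimum(version_info=sys.version_info, minimum=MINIMUM):
    """Return True if the (major, minor) part of version_info is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum

if __name__ == "__main__":
    status = "OK" if meets_minimum() else f"older than {MINIMUM[0]}.{MINIMUM[1]}"
    print(f"Python {sys.version.split()[0]}: {status}")
```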
2. Open the project

Clone or download the AgentForge repository, then open a terminal in the project root directory — the folder that contains requirements.txt and the AgentForge/ subdirectory.
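Before running the later commands, you can confirm that the terminal is in the right folder. This sketch only checks for the two entries named above, `requirements.txt` and the `AgentForge/` subdirectory. The function name is hypothetical.

```python
from pathlib import Path

def looks_like_project_root(path="."):
    """Return True if `path` contains requirements.txt and an AgentForge/ directory."""
    root = Path(path)
    return (root / "requirements.txt").is_file() and (root / "AgentForge").is_dir()

if __name__ == "__main__":
    print("In project root:", looks_like_project_root())
```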
3. Install dependencies

Install all required packages with pip:
pip install -r requirements.txt
This installs langchain, langgraph, streamlit, groq, pillow, edge-tts, transformers, torch, pydub, and python-dotenv.
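To confirm the installation, the sketch below checks that each package's import name can be found without importing it. This avoids the slow import of torch and transformers. The mapping from pip name to module name is an assumption based on each package's usual import name: pillow is imported as `PIL` and python-dotenv as `dotenv`. The helper name is hypothetical.

```python
import importlib.util

# pip package name -> top-level import name (assumed usual import names).
REQUIRED = {
    "langchain": "langchain",
    "langgraph": "langgraph",
    "streamlit": "streamlit",
    "groq": "groq",
    "pillow": "PIL",
    "edge-tts": "edge_tts",
    "transformers": "transformers",
    "torch": "torch",
    "pydub": "pydub",
    "python-dotenv": "dotenv",
}

def missing_packages(required=REQUIRED):
    """Return pip names whose top-level module cannot be found (nothing is imported)."""
    return [pip_name for pip_name, module in required.items()
            if importlib.util.find_spec(module) is None]

if __name__ == "__main__":
    missing = missing_packages()
    print("Missing: " + ", ".join(missing) if missing else "All dependencies found.")
```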
4. Configure the .env file

Open the .env file in the project directory (create it if it does not already exist) and add your Groq API key:
GROQ_API_KEY=gsk_xxxxxxxxxxxxxxxxx
Replace gsk_xxxxxxxxxxxxxxxxx with your actual key from the Groq console.
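To check that the key is picked up, the following sketch loads `.env` with python-dotenv (already listed in requirements.txt). It reports whether `GROQ_API_KEY` is missing, still set to the placeholder, or set. It does not call the Groq API, so a key reported as set may still be invalid. The helper name is hypothetical, and AgentForge may load the file differently.

```python
import os

# The placeholder value shown in the .env example.
PLACEHOLDER = "gsk_xxxxxxxxxxxxxxxxx"

def groq_key_status(env=None):
    """Classify GROQ_API_KEY in `env` (default: os.environ) as missing, placeholder, or set."""
    env = os.environ if env is None else env
    key = env.get("GROQ_API_KEY", "").strip()
    if not key:
        return "missing"
    if key == PLACEHOLDER:
        return "placeholder"
    return "set"

if __name__ == "__main__":
    try:
        from dotenv import load_dotenv  # provided by python-dotenv
        load_dotenv(".env")  # read .env from the current working directory
    except ImportError:
        pass  # python-dotenv not installed: rely on variables already exported
    print("GROQ_API_KEY:", groq_key_status())
```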
5. Run the app

Start the Streamlit frontend from the project root:
streamlit run frontend/app.py
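If you want to launch the app from a script, for example to pin the port, one option is to run Streamlit as a module with the current interpreter. This avoids picking up a different `streamlit` executable on PATH. The sketch is not part of AgentForge. `--server.port` is a standard Streamlit option, and the helper name is hypothetical.

```python
import sys

def streamlit_command(app_path="frontend/app.py", port=8501):
    """Build a `python -m streamlit run` command for the given app path and port."""
    return [sys.executable, "-m", "streamlit", "run", app_path,
            "--server.port", str(port)]
```

From the project root, after `import subprocess`, `subprocess.run(streamlit_command(), check=True)` starts the server and blocks until you stop it.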
6. Open in browser

Once the server starts, open your browser and navigate to:
http://localhost:8501
The AgentForge UI will load automatically.
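If the page does not load, you can use the standard library to check whether anything is answering on the default port. This is a sketch. The URL matches the address above, and the helper name is hypothetical.

```python
import urllib.request

def server_is_up(url="http://localhost:8501", timeout=2.0):
    """Return True if an HTTP server at `url` responds with a non-error status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except OSError:  # covers URLError, HTTPError, refused connections, and timeouts
        return False

if __name__ == "__main__":
    print("Streamlit reachable:", server_is_up())
```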

Use the UI

The AgentForge interface is a single-page Streamlit app titled Glasovni opisivač za slijepe i slabovidne osobe (Voice describer for the blind and visually impaired).
  1. Upload an image: click the Upload button and select a .png, .jpg, or .jpeg file.
  2. Toggle detailed mode: check the Detaljan opis (Detailed description) checkbox if you want a longer, more detailed description. Leave it unchecked for a concise summary.
  3. Generate a description: click the Generiraj opis (Generate description) button to run the agent pipeline.
The app returns:
  • A text description displayed on screen.
  • An audio file you can play directly in the browser.
Enable Detaljan opis (detailed mode) when you need a thorough scene analysis — for example, to capture fine details in medical or instructional images.
The bottom of the page shows the last five descriptions generated during your current session so you can review or replay previous results.
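The "last five" behaviour can be modelled with a bounded queue that drops the oldest entry once it is full. The sketch below is an assumption about how such a history could work, not the app's actual code. A Streamlit app would typically keep this in `st.session_state`. The class and field names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class HistoryEntry:
    description: str
    audio_path: str

class SessionHistory:
    """Keep only the most recent `limit` results, newest first."""

    def __init__(self, limit=5):
        # When full, appendleft() discards the item at the right end (the oldest).
        self._entries = deque(maxlen=limit)

    def add(self, description, audio_path):
        self._entries.appendleft(HistoryEntry(description, audio_path))

    def entries(self):
        return list(self._entries)
```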

Call run_agentforge directly

You can invoke the core pipeline from Python without the Streamlit UI. The function signature from backend/main.py is:
def run_agentforge(image_path, session_id="default", detailed=False):
| Parameter | Type | Default | Description |
|---|---|---|---|
| image_path | str | (required) | Path to the image file on disk |
| session_id | str | "default" | Unique identifier for the session; used to scope history |
| detailed | bool | False | Set to True to request a detailed description |
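Because `image_path` must point to an existing file on disk, a pre-flight check can give a clearer error than a failure deep inside the pipeline. This sketch reuses the .png/.jpg/.jpeg types that the UI accepts. The source does not say whether `run_agentforge` itself accepts other formats. The helper name is hypothetical.

```python
from pathlib import Path

# File types accepted by the Streamlit uploader (assumed to suit direct calls too).
ALLOWED_SUFFIXES = {".png", ".jpg", ".jpeg"}

def check_image_path(image_path):
    """Return a Path if image_path is an existing .png/.jpg/.jpeg file, else raise."""
    path = Path(image_path)
    if path.suffix.lower() not in ALLOWED_SUFFIXES:
        raise ValueError(f"Unsupported file type: {path.suffix or '(none)'}")
    if not path.is_file():
        raise FileNotFoundError(f"No such image file: {path}")
    return path
```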
Example:
from backend.main import run_agentforge

result = run_agentforge(
    image_path="data/my-session/photo.jpg",
    session_id="my-session",
    detailed=True,
)

print(result["description"])  # text description
print(result["audio_path"])   # path to generated audio file
The function returns the full LangGraph workflow state, which includes description, audio_path, valid_image, and error keys.
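The return value is the whole workflow state, so it can help to check `valid_image` and `error` before using `description` and `audio_path`. The sketch below assumes the state behaves like a dict, that `error` is empty or None on success, and that `valid_image` is a boolean. These semantics are assumptions, so check backend/main.py for the actual values. The helper name is hypothetical.

```python
def unpack_result(state):
    """Return (description, audio_path) from an AgentForge state, or raise on failure.

    Assumes a dict-like state where `error` is falsy on success and
    `valid_image` is False when the image was rejected.
    """
    if state.get("error"):
        raise RuntimeError(f"AgentForge failed: {state['error']}")
    if state.get("valid_image") is False:
        raise ValueError("The image was rejected as invalid.")
    return state.get("description"), state.get("audio_path")
```

For example: `description, audio_path = unpack_result(run_agentforge("data/my-session/photo.jpg", session_id="my-session"))`.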
