AgentForge uses LangGraph’s StateGraph to define a deterministic pipeline where each node is a Python function that receives the current AgentState and returns an updated copy. The graph is compiled once at startup and invoked for every image processing request, giving you a reusable, inspectable execution graph with built-in conditional branching.
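The node contract described above (receive the state, return an updated copy) can be sketched without LangGraph. The field names below are the ones this page mentions. Their types, the use of a TypedDict, and the example_node function are assumptions made for illustration, not the project's actual definitions.

```python
from typing import Optional, TypedDict


class AgentState(TypedDict, total=False):
    # Field names come from this page; the types are assumptions.
    image_path: str
    session_id: str
    detailed: bool
    valid_image: Optional[bool]
    description: str
    audio_path: str
    error: str


def example_node(state: AgentState) -> AgentState:
    """Hypothetical node: return a new state dict with one field set."""
    # Build a new dict instead of mutating the incoming state.
    return {**state, "description": "placeholder description"}


initial: AgentState = {
    "image_path": "photo.jpg",
    "session_id": "abc",
    "detailed": False,
}
result = example_node(initial)
```

Returning a new dictionary leaves the caller's state untouched, which makes each node easier to test in isolation.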

Complete workflow definition

The entire graph is defined in workflow.py. It registers three nodes, sets up one unconditional starting edge, one conditional branch from the orchestrator, and two unconditional edges completing the happy path.
workflow.py
```python
from langgraph.graph import StateGraph, START, END
from backend.graph.state import AgentState

from backend.agents.orchestrator_agent import orchestrator_agent
from backend.agents.visual_agent import visual_analysis_agent
from backend.agents.speech_agent import speech_agent
from backend.tools.mcp_tools import describe_image_tool

# The graph carries a single AgentState through every node.
builder = StateGraph(AgentState)

builder.add_node("orchestrator", orchestrator_agent)


def vision_node(state):
    # Run the visual analysis agent through the MCP-style tool wrapper.
    return describe_image_tool(visual_analysis_agent, state)


builder.add_node("vision", vision_node)
builder.add_node("speech", speech_agent)

# Every run starts at the orchestrator.
builder.add_edge(START, "orchestrator")


def route(state):
    # Stop only when the orchestrator explicitly marked the image invalid.
    if state.get("valid_image") is False:
        return END
    return "vision"


builder.add_conditional_edges("orchestrator", route)

# Happy path: vision, then speech, then finish.
builder.add_edge("vision", "speech")
builder.add_edge("speech", END)

# Compile once; the compiled graph is reused for every request.
workflow = builder.compile()
```

Graph nodes

orchestrator: runs orchestrator_agent. This node validates the image file, generates its SHA-256 hash, and checks the session cache. It sets valid_image to True or False and, on a cache hit, may also populate description and audio_path. A cache hit is meant to skip the vision and speech nodes. Note that route as shown sends execution to END only when valid_image is False, so workflow.py on its own does not show how a cache hit bypasses those nodes.

vision: runs vision_node, which delegates to describe_image_tool(visual_analysis_agent, state). The MCP-style wrapper calls the BLIP captioning model and then the Groq LLM to produce a Croatian text description. The result is written to state["description"].

speech: runs speech_agent. It reads state["description"], calls Edge TTS with the hr-HR-GabrijelaNeural voice, and writes the MP3 output path to state["audio_path"].
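The hashing and cache lookup attributed to the orchestrator could look roughly like the sketch below. It uses only the standard library. The function names, the in-memory dictionary, and the choice of (session_id, image_hash) as the cache key are all assumptions, since this page does not show how orchestrator_agent implements them.

```python
import hashlib

# Hypothetical in-memory session cache keyed by (session_id, image_hash).
_session_cache = {}


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cache_lookup(session_id, image_hash):
    """Return the cached result dict for this session and image, or None."""
    return _session_cache.get((session_id, image_hash))


def cache_store(session_id, image_hash, description, audio_path):
    """Remember the description and audio path produced for this image."""
    _session_cache[(session_id, image_hash)] = {
        "description": description,
        "audio_path": audio_path,
    }
```

Reading the file in chunks keeps memory use flat for large images, and keying on the content hash means a renamed copy of the same image still hits the cache.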

Conditional routing

After the orchestrator node completes, LangGraph evaluates the route function to decide the next node.
workflow.py
```python
def route(state):
    if state.get("valid_image") is False:
        return END
    return "vision"
```
The function returns either the string "vision" or the END sentinel. Because add_conditional_edges is called without a path map, LangGraph uses the returned value directly as the name of the next node. The check is an identity comparison against the boolean False, so any other value of valid_image, including None, True, or a missing key, sends the graph on to "vision".
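The identity check is easy to verify in isolation. The sketch below copies route from this page and replaces the LangGraph import with a plain string stand-in for END, so it runs without LangGraph installed. The stand-in value and the sample states are assumptions for illustration.

```python
# Stand-in for langgraph.graph.END so the sketch has no dependencies.
END = "__end__"


def route(state):
    if state.get("valid_image") is False:
        return END
    return "vision"


# Only the boolean False stops the graph; every other value continues.
decisions = {
    "explicit False": route({"valid_image": False}),
    "explicit True": route({"valid_image": True}),
    "None": route({"valid_image": None}),
    "missing key": route({}),
    "falsy but not False": route({"valid_image": 0}),
}
```

Here only the "explicit False" entry maps to END. A falsy value such as 0 still continues to "vision", because `is False` compares identity rather than truthiness.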

Edge map

| From | Condition | To |
| --- | --- | --- |
| START | | orchestrator |
| orchestrator | valid_image is False | END |
| orchestrator | otherwise | vision |
| vision | | speech |
| speech | | END |
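The edge map can be walked by hand to see which nodes a given run visits. The sketch below is a plain-Python model of the table, not LangGraph itself: it does not execute any node functions, and the trace helper, the EDGES dictionary, and the END stand-in are hypothetical names introduced for illustration.

```python
# Stand-in for langgraph.graph.END so the sketch has no dependencies.
END = "__end__"

# Unconditional edges from the table.
EDGES = {"vision": "speech", "speech": END}


def route(state):
    if state.get("valid_image") is False:
        return END
    return "vision"


def trace(state_after_orchestrator):
    """Return the ordered list of nodes visited, given the state the
    orchestrator would have returned."""
    path = ["orchestrator"]          # START always leads here
    current = route(state_after_orchestrator)
    while current != END:
        path.append(current)
        current = EDGES[current]
    return path


valid_path = trace({"valid_image": True})     # ['orchestrator', 'vision', 'speech']
invalid_path = trace({"valid_image": False})  # ['orchestrator']
```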

Step-by-step execution

1. Graph receives initial state

workflow.invoke(state) is called from main.py with a populated AgentState dictionary containing image_path, session_id, detailed, and default values for all optional fields. LangGraph passes this state to the first node.

2. Orchestrator node runs

orchestrator_agent validates the image, computes its hash, and queries the session cache. It returns an updated state with valid_image set to True or False. On a cache hit it also sets description and audio_path.

3. Route function decides the next node

LangGraph calls route(state) with the state returned by the orchestrator. If valid_image is False, execution jumps directly to END and the final state is returned to the caller with the error field populated. Otherwise, the graph continues to the vision node.

4. Vision node generates a description

vision_node calls describe_image_tool(visual_analysis_agent, state). The agent runs BLIP to caption the image in English, then calls the Groq LLM to produce a Croatian description according to the detailed flag. The result is merged into state as description.

5. Speech node produces audio

speech_agent reads state["description"], synthesises speech with Edge TTS, saves the MP3 to disk, and returns the updated state with audio_path pointing to the file. LangGraph then routes to END and returns the completed state to the caller.

When the image fails validation, valid_image is set to False and error is set to "Invalid image". The route function immediately returns END, so neither the vision nor the speech node runs. The returned state will have empty description and audio_path fields. Always check state["valid_image"] before attempting to read these fields in calling code.
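Calling code might guard the final state along these lines. This is a minimal sketch: read_result is a hypothetical helper, and the sample dictionaries stand in for what workflow.invoke(state) would return, using the field names and the "Invalid image" message described on this page.

```python
def read_result(final_state):
    """Hypothetical helper: return (description, audio_path), or raise
    if the orchestrator rejected the image."""
    if final_state.get("valid_image") is False:
        # Fall back to a generic message if the error field is empty.
        raise ValueError(final_state.get("error") or "Invalid image")
    return final_state["description"], final_state["audio_path"]


# Sample states standing in for the return value of workflow.invoke(...).
ok_state = {
    "valid_image": True,
    "description": "Opis slike",
    "audio_path": "output.mp3",
}
bad_state = {
    "valid_image": False,
    "error": "Invalid image",
    "description": "",
    "audio_path": "",
}

description, audio_path = read_result(ok_state)

try:
    read_result(bad_state)
except ValueError as exc:
    failure_message = str(exc)
```

Raising on the invalid case keeps empty description and audio_path values from leaking into downstream code that expects real content.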