Production agent architecture: the full stack

A production agent is more than a prompt and a model call. It is a system of interconnected components that must work reliably under real load, with real users, over time. This page maps the full stack and explains what each layer does, why it matters, and which tutorials cover it.

The architecture diagram in the repository (assets/repos_images/ai_architecture_diagram.svg) shows how these layers connect into a single production workflow. Each tutorial in the repository targets one or more of these layers.

The production agent stack

A complete production agent stack has six layers. Each layer is independent enough to be learned separately, but all six must be present before you can call a system production-ready.

Orchestration

Controls how the agent reasons, routes between steps, and manages multi-turn state. This is the core logic layer.

Memory

Gives the agent access to past context — short-term session state, long-term user preferences, and semantic knowledge retrieval.

Tools

Connects the agent to external services, APIs, and data sources it can act on — web search, databases, productivity apps, and more.

Security

Protects the agent and its users from prompt injection, data leakage, unauthorized tool access, and adversarial inputs.

Observability

Makes agent behavior visible and debuggable — traces, evaluation metrics, and automated quality checks.

Deployment

Packages and scales the agent as a service — containers, cloud infrastructure, GPU environments, and API endpoints.

Layer 1: Orchestration

Orchestration is how you define what the agent does, in what order, and under what conditions. Without explicit orchestration, agents become unpredictable as complexity grows. Production orchestration means:

Stateful graphs — the agent carries context across steps rather than starting fresh on each turn
Conditional routing — different inputs trigger different processing paths
Composable nodes — each capability is an isolated, testable unit
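The three properties above can be illustrated without any framework. The following is a minimal sketch in plain Python, not the LangGraph API: the State fields, the node names (classify, answer, summarize), and the routing rule are all hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, Optional, TypedDict


class State(TypedDict):
    """Context carried across steps (hypothetical fields)."""
    text: str
    route: str
    result: str


# Composable nodes: each one is an isolated function, State in and State out.
def classify(state: State) -> State:
    route = "question" if state["text"].strip().endswith("?") else "statement"
    return {**state, "route": route}


def answer(state: State) -> State:
    return {**state, "result": f"answering: {state['text']}"}


def summarize(state: State) -> State:
    return {**state, "result": f"summarizing: {state['text']}"}


NODES: Dict[str, Callable[[State], State]] = {
    "classify": classify,
    "answer": answer,
    "summarize": summarize,
}


def next_node(current: str, state: State) -> Optional[str]:
    """Conditional routing: choose the next node from the current state."""
    if current == "classify":
        return "answer" if state["route"] == "question" else "summarize"
    return None  # answer and summarize are terminal nodes


def run(text: str) -> State:
    """Stateful execution: the same state object flows through every step."""
    state: State = {"text": text, "route": "", "result": ""}
    node: Optional[str] = "classify"
    while node is not None:
        state = NODES[node](state)
        node = next_node(node, state)
    return state
```

Because each node is a plain function, it can be unit-tested on its own, and the routing rule can be tested without running any node.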

LangGraph workflows

Build a stateful, multi-step text analysis pipeline using directed graph architecture.

FastAPI agent endpoints

Expose your orchestration layer as a REST API with both synchronous and streaming responses.

Model Context Protocol (MCP)

Standardize how your agent connects to and calls external tools and APIs.

Kotlin agents with Koog

Build and orchestrate agents on the JVM using JetBrains’ Koog framework.

Layer 2: Memory

An agent without memory repeats itself, forgets context, and cannot learn from interactions. Memory in production takes three forms:

Short-term memory
Long-term memory
Semantic retrieval (RAG)

Short-term memory is within-session state: what the agent has said and done in the current conversation. In LangGraph, this is the typed State object that flows through each node in the graph.
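As a rough illustration of the difference between within-session state and memory that outlives a session, here is a small plain-Python sketch. It is not taken from any of the tutorials: SessionState, MAX_MESSAGES, and the dict-based long_term store are hypothetical stand-ins, and a real system would use a persistent store rather than a module-level dict.

```python
from typing import Dict, List, TypedDict


class Message(TypedDict):
    role: str
    content: str


class SessionState(TypedDict):
    """Short-term memory: exists only for the current conversation."""
    user_id: str
    messages: List[Message]


MAX_MESSAGES = 4  # hypothetical window size for the session history


def add_message(state: SessionState, role: str, content: str) -> SessionState:
    """Return a new state with the message appended and old messages trimmed."""
    messages = state["messages"] + [{"role": role, "content": content}]
    return {"user_id": state["user_id"], "messages": messages[-MAX_MESSAGES:]}


# Long-term memory: survives across sessions. A dict stands in for a real store.
long_term: Dict[str, Dict[str, str]] = {}


def remember_preference(user_id: str, key: str, value: str) -> None:
    """Record a user preference that later sessions can read back."""
    long_term.setdefault(user_id, {})[key] = value


state: SessionState = {"user_id": "u1", "messages": []}
for i in range(6):
    state = add_message(state, "user", f"message {i}")
remember_preference("u1", "language", "en")
```

After the loop, the session state holds only the four most recent messages, while the stored preference remains available to any later session for the same user.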

Dual-memory with Redis

Implement short-term and long-term memory with semantic search and persistent vector storage.

Self-improving memory with Mem0

Build hybrid vector-and-graph memory that automatically resolves conflicts and evolves over time.

Knowledge graphs with Cognee

Transform unstructured data into structured knowledge graphs the agent can reason over.

Enterprise RAG with Contextual AI

Deploy a managed RAG pipeline with intelligent indexing, agent integration, and LMUnit evaluation.

Layer 3: Tools

Tools are how agents take action. A tool-equipped agent can search the web, read a database, send a message, call an API, or scrape a website — not just generate text. Production tool use requires more than a function call. You need:

Authentication — tools often act on behalf of specific users with scoped permissions
Human-in-the-loop controls — some tool calls should require approval before execution
Data quality — external data must be clean, structured, and up to date
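The first two requirements can be sketched as a gate that every tool call passes through. This is an illustrative plain-Python sketch under stated assumptions, not how Arcade or any other product implements it: the Tool dataclass, the scope strings, the ToolDenied exception, and the approve callback are all hypothetical. Data quality is out of scope for this example.

```python
from dataclasses import dataclass
from typing import Callable, Set


class ToolDenied(Exception):
    """Raised when a call is blocked by a scope check or by a reviewer."""


@dataclass(frozen=True)
class Tool:
    name: str
    required_scope: str   # permission the calling user must hold
    needs_approval: bool  # True for calls that have side effects
    func: Callable[[str], str]


# An approver receives the tool name and argument and returns True to allow.
Approver = Callable[[str, str], bool]


def call_tool(tool: Tool, arg: str, user_scopes: Set[str], approve: Approver) -> str:
    """Run a tool only if the user holds the scope and any approval is given."""
    if tool.required_scope not in user_scopes:
        raise ToolDenied(f"{tool.name}: missing scope {tool.required_scope}")
    if tool.needs_approval and not approve(tool.name, arg):
        raise ToolDenied(f"{tool.name}: approval refused")
    return tool.func(arg)


# Hypothetical tools: a read-only search and a message sender with side effects.
SEARCH = Tool("search", "search:read", False, lambda q: f"results for {q}")
SEND = Tool("send_message", "chat:write", True, lambda m: f"sent: {m}")
```

In this sketch the read-only search runs as soon as the scope check passes, while send_message also waits on the approve callback, which in a real deployment would surface the pending call to a human.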

Secure tool calling with Arcade

Integrate Gmail, Slack, and Notion with OAuth2 authentication and user-level approval workflows.

Real-time web search with Tavily

Give the agent live web access for research, monitoring, and current-events queries.

Large-scale web data with Bright Data

Collect structured data from complex websites at scale using enterprise proxy infrastructure.

Layer 4: Security

Security is not an afterthought in production agents. Agents that can call tools, read databases, and generate content on behalf of users are high-value attack targets.

Security tools require ethical use with proper authorization. Always test security capabilities in isolated environments and never run attack simulations against systems you do not own.

The primary threats in production agent systems are:

Prompt injection

Malicious content in user input or retrieved documents that hijacks the agent’s instructions. Defense requires input sanitization, output filtering, and structural separation between data and instructions.

Tool misuse and privilege escalation

An agent that can call external APIs may be manipulated into making unauthorized calls. Scoped OAuth2 permissions and human-in-the-loop approval gates limit blast radius.

Data leakage through outputs

Sensitive information retrieved from memory or tool calls can leak into responses. Output guardrails scan for PII, credentials, and other sensitive content before returning responses to users.

Behavior misalignment

Agents can generate responses that are unsafe, off-topic, or violate policy even without explicit attacks. Behavior alignment layers monitor outputs against defined policies.
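To make the output-guardrail idea concrete, here is a minimal sketch of a redaction pass that runs on a response before it is returned. It is an assumption-laden illustration, not LlamaFirewall's API: the two regular expressions are deliberately simple, and the "sk-" key format is a hypothetical example of a credential pattern. A production guardrail would need far broader coverage.

```python
import re
from typing import List, Tuple

# Hypothetical, intentionally narrow patterns for illustration only.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "api_key": re.compile(r"\bsk-[A-Za-z0-9]{16,}\b"),
}


def redact(text: str) -> Tuple[str, List[str]]:
    """Replace sensitive matches and report which pattern labels fired."""
    found: List[str] = []
    for label, pattern in PATTERNS.items():
        if pattern.search(text):
            found.append(label)
            text = pattern.sub(f"[REDACTED {label.upper()}]", text)
    return text, found


clean, hits = redact("Contact alice@example.com with key sk-abcdefghijklmnop1234")
# clean is "Contact [REDACTED EMAIL] with key [REDACTED API_KEY]"
# hits is ["email", "api_key"]
```

Returning the list of labels alongside the cleaned text lets the caller log the event or block the response entirely instead of only masking it.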

LlamaFirewall guardrails

Apply comprehensive input, output, and tool-access security guardrails to a production agent.

Security testing with Apex

Run prompt injection attacks and automated security testing to find and fix vulnerabilities.

Layer 5: Observability

You cannot improve what you cannot measure. Observability in agent systems means capturing enough structured data to understand why the agent made each decision — and to catch regressions before users do. Observability covers three distinct concerns:

Tracing

Capturing the full execution path of each agent run — which nodes executed, what inputs they received, what outputs they produced, and how long each step took.

Evaluation

Automatically scoring agent outputs against expected behavior, using behavioral analysis and performance metrics to track quality over time.

Fine-tuning

Adapting the underlying model to your specific domain using evaluation data, improving accuracy and reducing inference cost for specialized tasks.
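The tracing concern described above can be sketched with a decorator that records one entry per node call. This is a plain-Python illustration under stated assumptions, not the LangSmith API: the traced decorator, the in-memory TRACE list, and the two example nodes are hypothetical, and a real tracer would export spans to a backend rather than keep them in a list.

```python
import functools
import time
from typing import Any, Callable, Dict, List

# In-memory trace for illustration; one dict per executed node.
TRACE: List[Dict[str, Any]] = []


def traced(node_name: str) -> Callable:
    """Record inputs, output, and duration for each call to the wrapped node."""
    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            start = time.perf_counter()
            output = func(*args, **kwargs)
            # Note: a call that raises is not recorded in this simple sketch.
            TRACE.append({
                "node": node_name,
                "inputs": {"args": args, "kwargs": kwargs},
                "output": output,
                "duration_s": time.perf_counter() - start,
            })
            return output
        return wrapper
    return decorator


@traced("classify")
def classify(text: str) -> str:
    return "question" if text.endswith("?") else "statement"


@traced("respond")
def respond(label: str) -> str:
    return f"handled a {label}"


final = respond(classify("What is MCP?"))
```

After this run, TRACE contains the classify entry followed by the respond entry, which is enough to reconstruct the execution path and per-step timing for a single agent run.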

Tracing with LangSmith

Add comprehensive observability to capture traces, decision points, and timing across every agent run.

Evaluation with IntellAgent

Automate behavioral analysis and generate actionable insights to improve agent quality continuously.

Fine-tuning for domain expertise

Fine-tune a language model for specialized agent behavior with data preparation, training, and evaluation.

Multi-agent coordination (A2A)

Simulate collaborative multi-agent workflows using open communication protocols for interoperability.

Layer 6: Deployment

An agent that only runs on your laptop is not in production. Deployment turns your agent into a service — packaged, scalable, and accessible to users.

Docker containerization

Package your agent as a container for consistent, reproducible deployments across any environment.

AWS Bedrock AgentCore

Deploy agents as managed services on AWS with automatic infrastructure and request tracking.

On-premises LLMs with Ollama

Replace cloud API calls with local models for data privacy, cost control, and, on suitable hardware, lower latency.

GPU cloud with RunPod

Scale compute-intensive agent workloads on cost-effective GPU infrastructure.

Putting the stack together

No single tutorial covers the entire stack, but by working through tutorials across all six layers you will have hands-on experience with each component. A typical production agent combines:

LangGraph for stateful orchestration
Redis or Mem0 for persistent memory
Arcade or MCP for secure tool access
LlamaFirewall for input/output guardrails
LangSmith for tracing and debugging
Docker or AWS Bedrock AgentCore for deployment

Start with the Quickstart to run the LangGraph tutorial, then follow the layer that is most relevant to your current project.

Quickstart

Run your first tutorial in under ten minutes.

All tutorials

Browse all 22 tutorials and choose where to start.
