A production agent is more than a prompt and a model call. It is a system of interconnected components that must work reliably under real load, with real users, over time. This page maps the full stack and explains what each layer does, why it matters, and which tutorials cover it.
The architecture diagram in the repository (assets/repos_images/ai_architecture_diagram.svg) shows how these layers connect into a single production workflow. Each tutorial in the repository targets one or more of these layers.

The production agent stack

A complete production agent stack has six layers. Each layer is independent enough to be learned separately, but all six must be present before you can call a system production-ready.

Orchestration

Controls how the agent reasons, routes between steps, and manages multi-turn state. This is the core logic layer.

Memory

Gives the agent access to past context — short-term session state, long-term user preferences, and semantic knowledge retrieval.

Tools

Connects the agent to external services, APIs, and data sources it can act on — web search, databases, productivity apps, and more.

Security

Protects the agent and its users from prompt injection, data leakage, unauthorized tool access, and adversarial inputs.

Observability

Makes agent behavior visible and debuggable — traces, evaluation metrics, and automated quality checks.

Deployment

Packages and scales the agent as a service — containers, cloud infrastructure, GPU environments, and API endpoints.

Layer 1: Orchestration

Orchestration is how you define what the agent does, in what order, and under what conditions. Without explicit orchestration, agents become unpredictable as complexity grows. Production orchestration means:
  • Stateful graphs — the agent carries context across steps rather than starting fresh on each turn
  • Conditional routing — different inputs trigger different processing paths
  • Composable nodes — each capability is an isolated, testable unit
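
The three properties above can be sketched without any framework. The example below is a minimal illustration in plain Python, not the LangGraph API: the `State` fields, the node names, and the `route` and `run` helpers are all hypothetical and chosen only for this sketch.

```python
from typing import Callable, Dict, List, Optional, TypedDict


class State(TypedDict):
    """Context carried across every step of a single run (hypothetical fields)."""
    text: str
    category: str
    history: List[str]


# Composable nodes: each one is an isolated function that takes the state
# and returns an updated copy, so it can be unit-tested on its own.
def classify(state: State) -> State:
    category = "question" if state["text"].strip().endswith("?") else "statement"
    return {**state, "category": category, "history": state["history"] + ["classify"]}


def answer(state: State) -> State:
    return {**state, "history": state["history"] + ["answer"]}


def acknowledge(state: State) -> State:
    return {**state, "history": state["history"] + ["acknowledge"]}


NODES: Dict[str, Callable[[State], State]] = {
    "classify": classify,
    "answer": answer,
    "acknowledge": acknowledge,
}


def route(current: str, state: State) -> Optional[str]:
    """Conditional routing: choose the next node from the current state."""
    if current == "classify":
        return "answer" if state["category"] == "question" else "acknowledge"
    return None  # every other node is terminal in this sketch


def run(text: str, entry: str = "classify") -> State:
    """Stateful execution: the same state object flows through every node."""
    state: State = {"text": text, "category": "", "history": []}
    node: Optional[str] = entry
    while node is not None:
        state = NODES[node](state)
        node = route(node, state)
    return state
```

Calling `run("What is MCP?")` visits `classify` and then `answer`, while a statement is routed to `acknowledge` instead. A graph framework adds persistence, streaming, and visualization on top of this same shape.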

LangGraph workflows

Build a stateful, multi-step text analysis pipeline using directed graph architecture.

FastAPI agent endpoints

Expose your orchestration layer as a REST API with both synchronous and streaming responses.

Model Context Protocol (MCP)

Standardize how your agent connects to and calls external tools and APIs.

Kotlin agents with Koog

Build and orchestrate agents on the JVM using JetBrains’ Koog framework.

Layer 2: Memory

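An agent without memory repeats itself, forgets context, and cannot learn from interactions. Memory in production takes three forms:
  • Within-session state: what the agent has said and done in the current conversation. In LangGraph, this is the typed State object that flows through each node in the graph.
  • Long-term user preferences: information about a user that persists across sessions.
  • Semantic knowledge retrieval: relevant knowledge looked up by meaning rather than by exact match.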
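
As a rough illustration of how session state and longer-lived memory can sit side by side, here is a minimal in-process sketch. The class names, the `max_turns` cap, and the `build_context` helper are assumptions made for this example; the tutorials below use Redis, Mem0, and other stores for the persistent part.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SessionState:
    """Short-term memory: the turns of the current conversation."""
    messages: List[Dict[str, str]] = field(default_factory=list)
    max_turns: int = 20  # hypothetical cap that bounds prompt size

    def add(self, role: str, content: str) -> None:
        self.messages.append({"role": role, "content": content})
        # Keep only the most recent turns.
        self.messages = self.messages[-self.max_turns:]


@dataclass
class LongTermStore:
    """Long-term memory: facts that outlive a session, keyed by user."""
    preferences: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def remember(self, user_id: str, key: str, value: str) -> None:
        self.preferences.setdefault(user_id, {})[key] = value

    def recall(self, user_id: str) -> Dict[str, str]:
        return dict(self.preferences.get(user_id, {}))


def build_context(session: SessionState, store: LongTermStore, user_id: str) -> str:
    """Merge both memories into one text block that could precede a prompt."""
    pref_lines = [f"{k}: {v}" for k, v in sorted(store.recall(user_id).items())]
    turn_lines = [f"{m['role']}: {m['content']}" for m in session.messages]
    return "\n".join(["[preferences]"] + pref_lines + ["[conversation]"] + turn_lines)
```

In a real deployment the `LongTermStore` would be backed by a database rather than a dictionary, so that preferences survive process restarts.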

Dual-memory with Redis

Implement short-term and long-term memory with semantic search and persistent vector storage.

Self-improving memory with Mem0

Build hybrid vector-and-graph memory that automatically resolves conflicts and evolves over time.

Knowledge graphs with Cognee

Transform unstructured data into structured knowledge graphs the agent can reason over.

Enterprise RAG with Contextual AI

Deploy a managed RAG pipeline with intelligent indexing, agent integration, and LMUnit evaluation.

Layer 3: Tools

Tools are how agents take action. A tool-equipped agent can search the web, read a database, send a message, call an API, or scrape a website — not just generate text. Production tool use requires more than a function call. You need:
  • Authentication — tools often act on behalf of specific users with scoped permissions
  • Human-in-the-loop controls — some tool calls should require approval before execution
  • Data quality — external data must be clean, structured, and up to date
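
The authentication and approval requirements above can be sketched as a thin wrapper around each tool call. This is an illustrative sketch only: the `Tool` dataclass, the scope strings, and the `approve` callback are hypothetical and do not reflect the Arcade API or any other library.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Optional


@dataclass(frozen=True)
class Tool:
    name: str
    func: Callable[..., str]
    required_scope: str           # permission the calling user must hold
    needs_approval: bool = False  # True for calls with side effects


class ToolDenied(Exception):
    """Raised when a call fails the scope check or the approval gate."""


def call_tool(
    tool: Tool,
    user_scopes: FrozenSet[str],
    approve: Optional[Callable[[str, dict], bool]] = None,
    **kwargs,
) -> str:
    # Authentication: the tool acts on behalf of a user with scoped permissions.
    if tool.required_scope not in user_scopes:
        raise ToolDenied(f"missing scope {tool.required_scope!r} for {tool.name}")
    # Human-in-the-loop: sensitive calls run only after explicit approval.
    if tool.needs_approval and (approve is None or not approve(tool.name, kwargs)):
        raise ToolDenied(f"{tool.name} was not approved")
    return tool.func(**kwargs)


# Two stand-in tools for the sketch; real tools would call external services.
def search_docs(query: str) -> str:
    return f"results for {query}"


def send_message(to: str, body: str) -> str:
    return f"sent to {to}"


SEARCH = Tool("search_docs", search_docs, required_scope="docs:read")
SEND = Tool("send_message", send_message, required_scope="messages:write", needs_approval=True)
```

A read-only call such as `call_tool(SEARCH, frozenset({"docs:read"}), query="mcp")` goes straight through, while `SEND` raises `ToolDenied` unless the user holds the write scope and the `approve` callback returns `True`.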

Secure tool calling with Arcade

Integrate Gmail, Slack, and Notion with OAuth2 authentication and user-level approval workflows.

Real-time web search with Tavily

Give the agent live web access for research, monitoring, and current-events queries.

Large-scale web data with Bright Data

Collect structured data from complex websites at scale using enterprise proxy infrastructure.

Layer 4: Security

Security is not an afterthought in production agents. Agents that can call tools, read databases, and generate content on behalf of users are high-value attack targets.
Security tools require ethical use with proper authorization. Always test security capabilities in isolated environments and never run attack simulations against systems you do not own.
The primary threats in production agent systems are:
  • Prompt injection: malicious content in user input or retrieved documents hijacks the agent’s instructions. Defense requires input sanitization, output filtering, and structural separation between data and instructions.
  • Unauthorized tool access: an agent that can call external APIs may be manipulated into making unauthorized calls. Scoped OAuth2 permissions and human-in-the-loop approval gates limit the blast radius.
  • Data leakage: sensitive information retrieved from memory or tool calls can leak into responses. Output guardrails scan for PII, credentials, and other sensitive content before responses reach users.
  • Unsafe or off-policy output: agents can generate responses that are unsafe, off-topic, or in violation of policy even without an explicit attack. Behavior alignment layers monitor outputs against defined policies.
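
Two of the defenses named above, output scanning and structural separation of data from instructions, can be sketched in a few lines. This is a simplified illustration and not how LlamaFirewall works: the regular expressions cover only obvious email addresses and one made-up `sk-` key format, and the `<untrusted_document>` tag name is an assumption chosen for the example.

```python
import re
from typing import List, Tuple

# Illustrative patterns only; a production guardrail needs far broader coverage.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "api_key": re.compile(r"\bsk-[A-Za-z0-9]{16,}\b"),
}


def redact(text: str) -> Tuple[str, List[str]]:
    """Output guardrail: mask sensitive matches and report which kinds were found."""
    findings: List[str] = []
    for label, pattern in PATTERNS.items():
        if pattern.search(text):
            findings.append(label)
            text = pattern.sub(f"[REDACTED {label}]", text)
    return text, findings


def wrap_untrusted(document: str) -> str:
    """Structural separation: mark retrieved content as data, not instructions."""
    # Remove any closing tag inside the document so it cannot end the block early.
    cleaned = document.replace("</untrusted_document>", "")
    return "<untrusted_document>\n" + cleaned + "\n</untrusted_document>"
```

Delimiting untrusted content does not stop prompt injection on its own; it only gives the system prompt a stable boundary to refer to, which is why the text above pairs it with input sanitization and output filtering.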

LlamaFirewall guardrails

Apply comprehensive input, output, and tool-access security guardrails to a production agent.

Security testing with Apex

Run prompt injection attacks and automated security testing to find and fix vulnerabilities.

Layer 5: Observability

You cannot improve what you cannot measure. Observability in agent systems means capturing enough structured data to understand why the agent made each decision — and to catch regressions before users do. Observability covers three distinct concerns:

Tracing

Capturing the full execution path of each agent run — which nodes executed, what inputs they received, what outputs they produced, and how long each step took.

Evaluation

Automatically scoring agent outputs against expected behavior, using behavioral analysis and performance metrics to track quality over time.

Fine-tuning

Adapting the underlying model to your specific domain using evaluation data, improving accuracy and reducing inference cost for specialized tasks.
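
Of the three concerns above, tracing is the simplest to sketch. The decorator below records, for each node call, the node name, its inputs, its output or error, and how long it took. It is a minimal in-memory illustration with hypothetical names (`TRACE`, `traced`), not the LangSmith API.

```python
import functools
import time
from typing import Any, Callable, Dict, List

# In-memory trace for the sketch; a real system would export spans to a backend.
TRACE: List[Dict[str, Any]] = []


def traced(node_name: str) -> Callable:
    """Wrap a node so that every call appends one span to TRACE."""
    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            span: Dict[str, Any] = {
                "node": node_name,
                "inputs": {"args": args, "kwargs": kwargs},
            }
            try:
                result = func(*args, **kwargs)
                span["output"] = result
                return result
            except Exception as exc:
                span["error"] = repr(exc)
                raise
            finally:
                # Runs on success and on failure, so every call leaves a span.
                span["duration_s"] = time.perf_counter() - start
                TRACE.append(span)
        return wrapper
    return decorator


@traced("summarize")
def summarize(text: str) -> str:
    return text[:20]
```

After a run, `TRACE` holds one entry per node call in execution order, which is enough to reconstruct the path the agent took and to spot slow or failing steps.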

Tracing with LangSmith

Add comprehensive observability to capture traces, decision points, and timing across every agent run.

Evaluation with IntellAgent

Automate behavioral analysis and generate actionable insights to improve agent quality continuously.

Fine-tuning for domain expertise

Fine-tune a language model for specialized agent behavior with data preparation, training, and evaluation.

Multi-agent coordination (A2A)

Simulate collaborative multi-agent workflows using open communication protocols for interoperability.

Layer 6: Deployment

An agent that only runs on your laptop is not in production. Deployment turns your agent into a service — packaged, scalable, and accessible to users.
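
To make "agent as a service" concrete, here is a minimal sketch that wraps an agent function in an HTTP endpoint using only the Python standard library. The `/invoke` and `/health` paths, the JSON shape, and the `run_agent` placeholder are assumptions for this example; the tutorials use FastAPI, Docker, and managed platforms for the same job.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def run_agent(message: str) -> str:
    """Placeholder for the real agent call (hypothetical)."""
    return f"echo: {message}"


class AgentHandler(BaseHTTPRequestHandler):
    def _send_json(self, status: int, payload: dict) -> None:
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self) -> None:
        # Health check for load balancers and container orchestrators.
        if self.path == "/health":
            self._send_json(200, {"status": "ok"})
        else:
            self._send_json(404, {"error": "not found"})

    def do_POST(self) -> None:
        if self.path != "/invoke":
            self._send_json(404, {"error": "not found"})
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            message = json.loads(self.rfile.read(length))["message"]
        except (ValueError, KeyError, TypeError):
            self._send_json(400, {"error": "expected a JSON body with a 'message' field"})
            return
        self._send_json(200, {"reply": run_agent(message)})

    def log_message(self, format: str, *args) -> None:
        pass  # silence per-request logging in this sketch


def make_server(host: str = "127.0.0.1", port: int = 8000) -> ThreadingHTTPServer:
    return ThreadingHTTPServer((host, port), AgentHandler)
```

Calling `make_server().serve_forever()` starts the service on port 8000. This server has no authentication, rate limiting, or TLS, so it is a starting point for local experiments and not something to expose publicly as written.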

Docker containerization

Package your agent as a container for consistent, reproducible deployments across any environment.

AWS Bedrock AgentCore

Deploy agents as managed services on AWS with automatic infrastructure and request tracking.

On-premises LLMs with Ollama

Replace cloud API calls with local models for data privacy, cost control, and lower latency.

GPU cloud with RunPod

Scale compute-intensive agent workloads on cost-effective GPU infrastructure.

Putting the stack together

No single tutorial covers the entire stack, but working through tutorials across all six layers gives you hands-on experience with each component. A typical production agent combines:
  1. LangGraph for stateful orchestration
  2. Redis or Mem0 for persistent memory
  3. Arcade or MCP for secure tool access
  4. LlamaFirewall for input/output guardrails
  5. LangSmith for tracing and debugging
  6. Docker or AWS Bedrock AgentCore for deployment
Start with the Quickstart to run the LangGraph tutorial, then follow the layer that is most relevant to your current project.

Quickstart

Run your first tutorial in under ten minutes.

All tutorials

Browse all 22 tutorials and choose where to start.
