A production agent is more than a prompt and a model call. It is a system of interconnected components that must work reliably under real load, with real users, over time. This page maps the full stack and explains what each layer does, why it matters, and which tutorials cover it.Documentation Index
Fetch the complete documentation index at: https://mintlify.com/NirDiamant/agents-towards-production/llms.txt
Use this file to discover all available pages before exploring further.
The architecture diagram in the repository (
assets/repos_images/ai_architecture_diagram.svg) shows how these layers connect into a single production workflow. Each tutorial in the repository targets one or more of these layers.The production agent stack
A complete production agent stack has six layers. Each layer is independent enough to be learned separately, but all six must be present before you can call a system production-ready.Orchestration
Controls how the agent reasons, routes between steps, and manages multi-turn state. This is the core logic layer.
Memory
Gives the agent access to past context — short-term session state, long-term user preferences, and semantic knowledge retrieval.
Tools
Connects the agent to external services, APIs, and data sources it can act on — web search, databases, productivity apps, and more.
Security
Protects the agent and its users from prompt injection, data leakage, unauthorized tool access, and adversarial inputs.
Observability
Makes agent behavior visible and debuggable — traces, evaluation metrics, and automated quality checks.
Deployment
Packages and scales the agent as a service — containers, cloud infrastructure, GPU environments, and API endpoints.
Layer 1: Orchestration
Orchestration is how you define what the agent does, in what order, and under what conditions. Without explicit orchestration, agents become unpredictable as complexity grows. Production orchestration means:- Stateful graphs — the agent carries context across steps rather than starting fresh on each turn
- Conditional routing — different inputs trigger different processing paths
- Composable nodes — each capability is an isolated, testable unit
LangGraph workflows
Build a stateful, multi-step text analysis pipeline using directed graph architecture.
FastAPI agent endpoints
Expose your orchestration layer as a REST API with both synchronous and streaming responses.
Model Context Protocol (MCP)
Standardize how your agent connects to and calls external tools and APIs.
Kotlin agents with Koog
Build and orchestrate agents on the JVM using JetBrains’ Koog framework.
Layer 2: Memory
An agent without memory repeats itself, forgets context, and cannot learn from interactions. Memory in production takes three forms:- Short-term memory
- Long-term memory
- Semantic retrieval (RAG)
Within-session state — what the agent has said and done in the current conversation. In LangGraph, this is the typed
State object that flows through each node in the graph.Dual-memory with Redis
Implement short-term and long-term memory with semantic search and persistent vector storage.
Self-improving memory with Mem0
Build hybrid vector-and-graph memory that automatically resolves conflicts and evolves over time.
Knowledge graphs with Cognee
Transform unstructured data into structured knowledge graphs the agent can reason over.
Enterprise RAG with Contextual AI
Deploy a managed RAG pipeline with intelligent indexing, agent integration, and LMUnit evaluation.
Layer 3: Tools
Tools are how agents take action. A tool-equipped agent can search the web, read a database, send a message, call an API, or scrape a website — not just generate text. Production tool use requires more than a function call. You need:- Authentication — tools often act on behalf of specific users with scoped permissions
- Human-in-the-loop controls — some tool calls should require approval before execution
- Data quality — external data must be clean, structured, and up to date
Secure tool calling with Arcade
Integrate Gmail, Slack, and Notion with OAuth2 authentication and user-level approval workflows.
Real-time web search with Tavily
Give the agent live web access for research, monitoring, and current-events queries.
Large-scale web data with Bright Data
Collect structured data from complex websites at scale using enterprise proxy infrastructure.
Layer 4: Security
Security is not an afterthought in production agents. Agents that can call tools, read databases, and generate content on behalf of users are high-value attack targets. The primary threats in production agent systems are:Prompt injection
Prompt injection
Malicious content in user input or retrieved documents that hijacks the agent’s instructions. Defense requires input sanitization, output filtering, and structural separation between data and instructions.
Tool misuse and privilege escalation
Tool misuse and privilege escalation
An agent that can call external APIs may be manipulated into making unauthorized calls. Scoped OAuth2 permissions and human-in-the-loop approval gates limit blast radius.
Data leakage through outputs
Data leakage through outputs
Sensitive information retrieved from memory or tool calls can leak into responses. Output guardrails scan for PII, credentials, and other sensitive content before returning responses to users.
Behavior misalignment
Behavior misalignment
Agents can generate responses that are unsafe, off-topic, or violate policy even without explicit attacks. Behavior alignment layers monitor outputs against defined policies.
LlamaFirewall guardrails
Apply comprehensive input, output, and tool-access security guardrails to a production agent.
Security testing with Apex
Run prompt injection attacks and automated security testing to find and fix vulnerabilities.
Layer 5: Observability
You cannot improve what you cannot measure. Observability in agent systems means capturing enough structured data to understand why the agent made each decision — and to catch regressions before users do. Observability covers three distinct concerns:Tracing
Capturing the full execution path of each agent run — which nodes executed, what inputs they received, what outputs they produced, and how long each step took.
Evaluation
Automatically scoring agent outputs against expected behavior, using behavioral analysis and performance metrics to track quality over time.
Fine-tuning
Adapting the underlying model to your specific domain using evaluation data, improving accuracy and reducing inference cost for specialized tasks.
Tracing with LangSmith
Add comprehensive observability to capture traces, decision points, and timing across every agent run.
Evaluation with IntellAgent
Automate behavioral analysis and generate actionable insights to improve agent quality continuously.
Fine-tuning for domain expertise
Fine-tune a language model for specialized agent behavior with data preparation, training, and evaluation.
Multi-agent coordination (A2A)
Simulate collaborative multi-agent workflows using open communication protocols for interoperability.
Layer 6: Deployment
An agent that only runs on your laptop is not in production. Deployment turns your agent into a service — packaged, scalable, and accessible to users.Docker containerization
Package your agent as a container for consistent, reproducible deployments across any environment.
AWS Bedrock AgentCore
Deploy agents as managed services on AWS with automatic infrastructure and request tracking.
On-premises LLMs with Ollama
Replace cloud API calls with local models for data privacy, cost control, and lower latency.
GPU cloud with RunPod
Scale compute-intensive agent workloads on cost-effective GPU infrastructure.
Putting the stack together
No single tutorial covers the entire stack, but by working through tutorials across all six layers you will have hands-on experience with each component. A typical production agent combines:- LangGraph for stateful orchestration
- Redis or Mem0 for persistent memory
- Arcade or MCP for secure tool access
- LlamaFirewall for input/output guardrails
- LangSmith for tracing and debugging
- Docker or AWS Bedrock for deployment
Quickstart
Run your first tutorial in under ten minutes.
All tutorials
Browse all 22 tutorials and choose where to start.