Testing an agent manually against a handful of hand-crafted examples misses the edge cases that matter most. IntellAgent solves this by automatically generating diverse scenarios from your agent’s own policy document, simulating realistic multi-turn conversations, and producing a detailed behavioral report — all without writing a single test case by hand.
IntellAgent runs a three-stage pipeline against your agent:
Stage 1: Scenario generation
Reads your agent’s system prompt and automatically creates realistic, policy-challenging test scenarios — including edge cases you might not anticipate.
Stage 2: Dynamic simulation
Simulates multi-turn conversations between a virtual user and your agent, adapting interaction patterns based on how the agent responds.
Stage 3: Fine-grained analysis
Identifies policy violations and performance gaps, and provides actionable recommendations, all accessible through an interactive Streamlit dashboard.
IntellAgent supports all major LLM providers through LangChain. The following snippet writes your credentials to a YAML config file:
import os
import yaml

OPENAI_API_KEY = "your-api-key-here"  # Replace with your actual API key

llm_config = {
    "openai": {
        "OPENAI_API_KEY": OPENAI_API_KEY
    }
}

os.makedirs("config", exist_ok=True)
with open("config/llm_env.yml", "w") as f:
    yaml.dump(llm_config, f)

print("LLM API credentials configured successfully.")
IntellAgent supports OpenAI, Anthropic, Azure, Google, Bedrock, and NVIDIA providers. Add multiple providers to the same llm_env.yml file.
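As a sketch of the multi-provider setup, the helper below collects keys from environment variables into one mapping instead of hard-coding them. Only the `openai` section is taken from the example above. The `anthropic` section name and its `ANTHROPIC_API_KEY` key are assumptions that follow the same naming pattern, so check them against the template that ships with IntellAgent. `build_llm_env` and `write_llm_env` are hypothetical helper names.

```python
import os

# Section name -> key name inside that section of llm_env.yml.
# Only the "openai" entry comes from the example above; "anthropic" is an
# assumed entry that follows the same naming pattern.
PROVIDER_KEYS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}


def build_llm_env(environ=None):
    """Return {provider: {KEY_NAME: value}} for every key set in the environment."""
    environ = os.environ if environ is None else environ
    llm_env = {}
    for provider, key_name in PROVIDER_KEYS.items():
        value = environ.get(key_name)
        if value:  # skip providers that have no key configured
            llm_env[provider] = {key_name: value}
    return llm_env


def write_llm_env(path="config/llm_env.yml", environ=None):
    """Write the credentials mapping to a YAML file (requires PyYAML)."""
    import yaml  # imported lazily so build_llm_env works without PyYAML

    llm_env = build_llm_env(environ)
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "w") as f:
        yaml.dump(llm_env, f)
    return llm_env
```

Calling `write_llm_env()` after exporting the relevant variables in your shell produces a single file with one section per configured provider, and keeps the keys out of your source files.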
IntellAgent evaluates your agent against the policies you write in its system prompt. The more explicit your policies, the more targeted the generated test scenarios will be.
os.makedirs("examples/my_education_agent/input", exist_ok=True)

education_prompt = """# Educational Assistant Guidelines

You are an educational assistant designed to help students with their learning needs.

## Core Responsibilities:
- Provide clear, accurate information on educational topics
- Explain complex concepts in simple terms
- Help with homework questions by guiding the student through the solution process
- Recommend learning resources when appropriate

## Policies:
1. **Do not solve problems directly** - Instead, provide guidance and hints
2. **Use age-appropriate language** - Adjust explanations based on the student's level
3. **Encourage critical thinking** - Ask follow-up questions that promote deeper understanding
4. **Be patient and supportive** - Create a positive learning environment
5. **Verify understanding** - Check if the student has understood the explanation

## Subject Areas:
- Mathematics (Basic arithmetic to advanced calculus)
- Science (Physics, Chemistry, Biology)
- Language Arts (Grammar, Writing, Literature)
- Social Studies (History, Geography, Civics)
"""

with open("examples/my_education_agent/input/wiki.md", "w") as f:
    f.write(education_prompt)
Name your policies explicitly (for example, “Policy 1: Do not solve problems directly”). IntellAgent uses these labels in its violation reports so you can trace failures back to specific policy statements.
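If your prompt already uses a numbered list with bold titles, the labels can be added mechanically. The following is a minimal sketch, not part of IntellAgent: `label_policies` is a hypothetical helper, and it assumes every numbered bold item in the text is a policy, so run it only on the policies section if other numbered lists exist.

```python
import re

# Matches numbered markdown items such as "1. **Do not solve problems directly**".
# The negative lookahead skips items that already carry a "Policy N:" label.
_NUMBERED_POLICY = re.compile(
    r"^(\d+)\.\s+\*\*(?!Policy \d+:)(.+?)\*\*", re.MULTILINE
)


def label_policies(prompt: str) -> str:
    """Rewrite '1. **Title**' items as '1. **Policy 1: Title**'."""
    return _NUMBERED_POLICY.sub(r"\1. **Policy \1: \2**", prompt)


if __name__ == "__main__":
    sample = (
        "## Policies:\n"
        "1. **Do not solve problems directly** - Instead, provide guidance and hints\n"
        "2. **Use age-appropriate language** - Adjust explanations based on the student's level\n"
    )
    print(label_policies(sample))
```

Because already-labeled items are skipped, the helper can be re-run safely each time you edit the prompt.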
The configuration file controls which models run the evaluation framework versus which model plays the role of your agent. You can mix and match providers.
config = {
    "environment": {
        "prompt_path": "examples/my_education_agent/input/wiki.md",
    },
    "llm_intellagent": {
        "type": "openai",  # Model driving IntellAgent's evaluation logic
        "name": "gpt-4o"
    },
    "llm_chat": {
        "type": "openai",  # Model acting as your agent under test
        "name": "gpt-4o-mini"
    },
    "dataset": {
        "num_samples": 10  # Number of test scenarios to generate
    }
}

with open("config/my_education_config.yml", "w") as f:
    yaml.dump(config, f, default_flow_style=False)

print(f"Will generate {config['dataset']['num_samples']} test scenarios")
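To compare several agent models against the same evaluator, it can help to generate the config from a small function rather than editing the dictionary by hand. This is a sketch under assumptions: `make_eval_config` is a hypothetical helper, it reproduces only the keys shown above, and any provider `type` string other than `"openai"` is something you would need to confirm against IntellAgent's own documentation.

```python
def make_eval_config(
    prompt_path,
    agent_model,
    evaluator_model=("openai", "gpt-4o"),
    num_samples=10,
):
    """Build a config dict with the same keys as the example above.

    agent_model and evaluator_model are (provider_type, model_name) pairs.
    """
    if num_samples < 1:
        raise ValueError("num_samples must be at least 1")
    agent_type, agent_name = agent_model
    eval_type, eval_name = evaluator_model
    return {
        "environment": {"prompt_path": prompt_path},
        # Model driving the evaluation logic
        "llm_intellagent": {"type": eval_type, "name": eval_name},
        # Model acting as the agent under test
        "llm_chat": {"type": agent_type, "name": agent_name},
        "dataset": {"num_samples": num_samples},
    }


if __name__ == "__main__":
    cfg = make_eval_config(
        "examples/my_education_agent/input/wiki.md",
        agent_model=("openai", "gpt-4o-mini"),
    )
    print(cfg["llm_chat"])
```

The returned dictionary can be passed to `yaml.dump` exactly as in the example above. Keeping the evaluator fixed while varying `agent_model` makes results from different runs easier to compare.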
import nest_asyncio
import warnings

warnings.filterwarnings(
    "ignore",
    message="API key must be provided when using hosted LangSmith API"
)
nest_asyncio.apply()

from simulator.utils.file_reading import override_config
from simulator.simulator_executor import SimulatorExecutor

base_output_path = './results/education'
config = override_config('config/my_education_config.yml')
executor = SimulatorExecutor(config, base_output_path)

print(f"Results will be saved to: {base_output_path}")
Try IntellAgent against agents that use databases and tools — see the airline example in the IntellAgent docs.
Customize evaluation criteria
Define domain-specific success metrics and integrate real-world data. See the customization guide.
Scale up scenarios
Increase num_samples to 50–100 for production readiness validation. More scenarios surface rarer edge cases.
Iterate on the prompt
Use violation reports to refine your agent’s system prompt, then re-run evaluation to measure the improvement.
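One way to measure the improvement between two prompt versions is to compare per-policy violation counts. The sketch below assumes you have already tallied those counts from each run's report into plain dictionaries. That input format and the `violation_deltas` helper are hypothetical, since the report layout is not described here.

```python
def violation_deltas(before, after):
    """Return {policy: after_count - before_count} for every policy in either run.

    Negative values mean fewer violations after the prompt change.
    """
    policies = sorted(set(before) | set(after))
    return {p: after.get(p, 0) - before.get(p, 0) for p in policies}


if __name__ == "__main__":
    # Hand-tallied example counts, not real results.
    run_1 = {"Policy 1": 4, "Policy 2": 1}
    run_2 = {"Policy 1": 1, "Policy 5": 2}
    for policy, delta in violation_deltas(run_1, run_2).items():
        print(f"{policy}: {delta:+d}")
```

Including policies that appear in only one run matters here: a prompt change that fixes one policy can introduce violations of another, and those would be hidden if only the first run's policies were compared.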
IntellAgent uses LLM calls to generate scenarios and simulate users. Running 10 scenarios against GPT-4o typically costs under $0.50, but costs scale with num_samples and model choice.
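For budgeting a larger run, the figure above can be extrapolated. This is a rough sketch that assumes cost grows linearly with `num_samples` and that the models stay the same as in the 10-scenario baseline. Neither assumption is guaranteed, so treat the result as a planning number, not a quote. `estimate_cost_ceiling` is a hypothetical helper.

```python
BASELINE_SAMPLES = 10
BASELINE_COST_USD = 0.50  # "under $0.50" for 10 scenarios, used as an upper bound


def estimate_cost_ceiling(
    num_samples,
    baseline_cost=BASELINE_COST_USD,
    baseline_samples=BASELINE_SAMPLES,
):
    """Linearly extrapolate an approximate cost ceiling in USD."""
    if num_samples < 0:
        raise ValueError("num_samples must be non-negative")
    return baseline_cost * num_samples / baseline_samples


if __name__ == "__main__":
    for n in (10, 50, 100):
        print(f"{n} scenarios: roughly up to ${estimate_cost_ceiling(n):.2f}")
```

Under these assumptions, a 50-scenario run comes out at roughly $2.50 and a 100-scenario run at roughly $5.00. Switching either model to a more expensive one invalidates the baseline, so re-measure with a small run first.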