The MCPEnv integration connects to MCP (Model Context Protocol) servers and exposes their tools to language models in Verifiers environments. MCP is an open protocol that standardizes how AI models connect to external data sources and tools.

Features

  • Multiple MCP servers - Connect to multiple servers simultaneously
  • Automatic tool discovery - Tools from servers are automatically exposed to models
  • stdio transport - Communicates via standard input/output
  • Type-safe - Preserves tool schemas and parameter types
  • Built on ToolEnv - Inherits all ToolEnv features

Installation

MCP support is included in core Verifiers:
uv add verifiers
The MCP SDK is automatically installed as a dependency.

Quick Start

1. Create an environment

Create a basic MCP environment:
import verifiers as vf
from datasets import Dataset
from verifiers.envs.experimental.mcp_env import MCPEnv

def load_environment():
    # Configure MCP servers
    mcp_servers = [
        {
            "name": "fetch",
            "command": "uvx",
            "args": ["mcp-server-fetch"],
            "description": "Fetch web content"
        },
    ]

    # Create dataset
    dataset = Dataset.from_dict({
        "question": [
            "What is the latest news on OpenAI's website?",
        ],
        "answer": ["Recent updates about GPT models"]
    })

    # Create rubric
    rubric = vf.JudgeRubric(judge_model="gpt-4.1-mini")
    
    async def judge_reward(judge, prompt, completion, answer, state):
        # Pass state through to the judge, matching the Full Example below
        response = await judge(prompt, completion, answer, state)
        return 1.0 if "yes" in response.lower() else 0.0
    
    rubric.add_reward_func(judge_reward)

    # Create environment
    return MCPEnv(
        mcp_servers=mcp_servers,
        dataset=dataset,
        rubric=rubric,
        max_turns=10,
    )

2. Evaluate

Run an evaluation:
prime eval run my-mcp-env -m openai/gpt-4.1-mini -n 5

MCP Server Configuration

Configure MCP servers using the MCPServerConfig format:
mcp_servers = [
    {
        "name": "fetch",
        "command": "uvx",
        "args": ["mcp-server-fetch"],
        "description": "Fetch web content",
    },
    {
        "name": "filesystem",
        "command": "npx",
        "args": ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"],
        "description": "File system access",
    },
]
Configuration fields:
  • name - Identifier for the server
  • command - Command to launch the server
  • args - List of command arguments
  • env - Environment variables (optional)
  • description - Human-readable description (optional)
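
The field list above can be sketched as a typed dictionary plus a small check to run before handing entries to MCPEnv. This is an illustrative sketch, not the library's own definition: `ServerConfig` and `check_server_config` are hypothetical names, and the required/optional split is assumed from the "(optional)" labels above.

```python
from typing import Dict, List, TypedDict


class _RequiredFields(TypedDict):
    name: str
    command: str
    args: List[str]


class ServerConfig(_RequiredFields, total=False):
    # Assumed optional, following the "(optional)" labels in the field list.
    env: Dict[str, str]
    description: str


def check_server_config(config: dict) -> List[str]:
    """Return a list of problems found in one server entry (empty if none)."""
    problems = []
    for field in ("name", "command"):
        if not isinstance(config.get(field), str) or not config[field]:
            problems.append(f"'{field}' must be a non-empty string")
    # Assumption: a missing "args" is treated as an empty argument list.
    args = config.get("args", [])
    if not isinstance(args, list) or not all(isinstance(a, str) for a in args):
        problems.append("'args' must be a list of strings")
    env = config.get("env", {})
    if not isinstance(env, dict) or not all(
        isinstance(k, str) and isinstance(v, str) for k, v in env.items()
    ):
        problems.append("'env' must map strings to strings")
    return problems
```

Running such a check over every entry in `mcp_servers` surfaces typos such as a string passed for `args` before any server process is launched.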

With Environment Variables

For servers requiring API keys:
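import os
import verifiers as vf
from datasets import Dataset
from verifiers.envs.experimental.mcp_env import MCPEnv

def load_environment():
    vf.ensure_keys(["EXA_API_KEY"])  # Fail fast if the key is missing

    mcp_servers = [
        {
            "name": "exa",
            "command": "npx",
            "args": ["-y", "exa-mcp-server"],
            # Pass the key through to the server process
            "env": {"EXA_API_KEY": os.environ["EXA_API_KEY"]},
            "description": "Exa search",
        },
    ]

    # Dataset and rubric as in the Quick Start (add reward functions the same way)
    dataset = Dataset.from_dict({
        "question": ["Find the latest Prime Intellect announcement"],
        "answer": ["Information about recent announcements"],
    })
    rubric = vf.JudgeRubric(judge_model="gpt-4.1-mini")

    return MCPEnv(
        mcp_servers=mcp_servers,
        dataset=dataset,
        rubric=rubric,
    )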

Available MCP Servers

Common MCP servers you can use:

Fetch - Retrieve web content
{
    "name": "fetch",
    "command": "uvx",
    "args": ["mcp-server-fetch"],
}
Exa - AI-powered search
{
    "name": "exa",
    "command": "npx",
    "args": ["-y", "exa-mcp-server"],
    "env": {"EXA_API_KEY": os.environ["EXA_API_KEY"]},
}
Brave Search - Web search
{
    "name": "brave",
    "command": "npx",
    "args": ["-y", "@modelcontextprotocol/server-brave-search"],
    "env": {"BRAVE_API_KEY": os.environ["BRAVE_API_KEY"]},
}

File System

Filesystem - Read/write files
{
    "name": "filesystem",
    "command": "npx",
    "args": ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/dir"],
}

Databases

PostgreSQL - Query databases
{
    "name": "postgres",
    "command": "npx",
    # The reference Postgres server takes the connection URL as an argument
    "args": ["-y", "@modelcontextprotocol/server-postgres", os.environ["POSTGRES_URL"]],
}
SQLite - Local database access
{
    "name": "sqlite",
    # The reference SQLite server is a Python package, launched with uvx
    "command": "uvx",
    "args": ["mcp-server-sqlite", "--db-path", "database.db"],
}

Development Tools

Git - Repository operations
{
    "name": "git",
    # The reference Git server is a Python package, launched with uvx
    "command": "uvx",
    "args": ["mcp-server-git"],
}
GitHub - GitHub API access
{
    "name": "github",
    "command": "npx",
    "args": ["-y", "@modelcontextprotocol/server-github"],
    # The reference GitHub server reads GITHUB_PERSONAL_ACCESS_TOKEN
    "env": {"GITHUB_PERSONAL_ACCESS_TOKEN": os.environ["GITHUB_PERSONAL_ACCESS_TOKEN"]},
}
See the MCP servers directory for more servers.

Full Example

Here’s a complete example using multiple MCP servers:
import os
from datasets import Dataset
import verifiers as vf
from verifiers.envs.experimental.mcp_env import MCPEnv

def load_environment(
    mcp_servers: list | None = None,
    dataset=None,
    **kwargs
) -> vf.Environment:
    # Validate API keys
    vf.ensure_keys(["EXA_API_KEY"])
    
    # Configure MCP servers
    if mcp_servers is None:
        mcp_servers = [
            {
                "name": "exa",
                "command": "npx",
                "args": ["-y", "exa-mcp-server"],
                "env": {"EXA_API_KEY": os.environ["EXA_API_KEY"]},
                "description": "Exa AI search",
            },
            {
                "name": "fetch",
                "command": "uvx",
                "args": ["mcp-server-fetch"],
                "description": "Fetch web content",
            },
        ]
    
    # Create dataset
    if dataset is None:
        dataset = Dataset.from_dict({
            "question": [
                "Find the latest Prime Intellect announcement",
                "What is the current weather in San Francisco?",
            ],
            "answer": [
                "Information about recent announcements",
                "Current weather conditions",
            ]
        })
    
    # Create rubric with judge
    rubric = vf.JudgeRubric(judge_model="gpt-4.1-mini")
    
    async def judge_reward(judge, prompt, completion, answer, state):
        verdict = await judge(prompt, completion, answer, state)
        return 1.0 if "yes" in verdict.lower() else 0.0
    
    rubric.add_reward_func(judge_reward, weight=1.0)
    
    # Create MCP environment
    return MCPEnv(
        mcp_servers=mcp_servers,
        dataset=dataset,
        rubric=rubric,
        max_turns=10,
        **kwargs,
    )

Error Handling

Configure error handling behavior:
def custom_error_formatter(error: Exception) -> str:
    """Format errors for the model."""
    return f"Tool error: {str(error)[:100]}"

env = MCPEnv(
    mcp_servers=mcp_servers,
    dataset=dataset,
    rubric=rubric,
    error_formatter=custom_error_formatter,
)
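
A formatter can also include the exception type and keep the message on one line, which gives the model a little more to act on. The sketch below assumes, as in the example above, that `error_formatter` accepts any callable taking an exception and returning a string; `describe_error` and its `limit` parameter are hypothetical names.

```python
def describe_error(error: Exception, limit: int = 200) -> str:
    """Summarize an exception as one short line for the model."""
    # Collapse newlines and repeated spaces so the message stays on one line.
    message = " ".join(str(error).split()) or "no details"
    if len(message) > limit:
        message = message[: limit - 3] + "..."
    return f"Tool error ({type(error).__name__}): {message}"
```

Truncating keeps long stack-trace-like messages from consuming the model's context, at the cost of occasionally cutting off a useful detail.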

Architecture Notes

MCPEnv is designed for globally available, read-only MCP servers where the same toolset can be shared across all rollouts. For servers requiring per-rollout state or mutable task-specific data, consider implementing a custom StatefulToolEnv subclass.

Connection Management

MCP servers are connected once during environment initialization and shared across all rollouts:
  1. The environment starts a background event loop
  2. It connects to all configured MCP servers
  3. It discovers available tools via tools/list
  4. It exposes those tools to rollouts
  5. It cleans up connections on environment shutdown
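
The lifecycle above can be sketched with the standard library alone. This is a simplified illustration of the pattern, not MCPEnv's actual implementation: `FakeServer` stands in for a real MCP client session, and `SharedConnections` is a hypothetical name.

```python
import asyncio
import threading


class FakeServer:
    """Stand-in for an MCP server connection (hypothetical, for illustration)."""

    def __init__(self, name, tools):
        self.name = name
        self._tools = tools
        self.connected = False

    async def connect(self):
        await asyncio.sleep(0)  # a real client would spawn the server process here
        self.connected = True

    async def list_tools(self):
        return list(self._tools)  # a real client would send a tools/list request

    async def close(self):
        self.connected = False


class SharedConnections:
    """Owns one background event loop and the connections that live on it."""

    def __init__(self, servers):
        self.servers = servers
        self.tools = {}  # tool name -> server that provides it
        self.loop = asyncio.new_event_loop()
        self.thread = threading.Thread(target=self.loop.run_forever, daemon=True)

    def _run(self, coro):
        # Submit a coroutine to the background loop and block for its result.
        return asyncio.run_coroutine_threadsafe(coro, self.loop).result(timeout=10)

    def start(self):
        self.thread.start()                              # 1. start background loop
        for server in self.servers:
            self._run(server.connect())                  # 2. connect
            for tool in self._run(server.list_tools()):  # 3. discover tools
                self.tools[tool] = server                # 4. expose to rollouts

    def shutdown(self):
        for server in self.servers:                      # 5. cleanup
            self._run(server.close())
        self.loop.call_soon_threadsafe(self.loop.stop)
        self.thread.join(timeout=10)
        self.loop.close()
```

Keeping the connections on a single long-lived loop is what lets every rollout reuse them; the trade-off is that all rollouts see the same server state.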

Tool Execution

When a model calls an MCP tool:
  1. Tool call is intercepted by MCPEnv
  2. Request is sent to appropriate MCP server
  3. Response is returned as tool message
  4. Errors are formatted via error_formatter
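
The four steps above amount to a small dispatch routine. The sketch below illustrates the flow under stated assumptions: `TOOLS` is a hypothetical registry whose async callables stand in for requests to MCP servers, and the returned dictionary follows the common chat-completions tool-message shape rather than anything confirmed about MCPEnv's internals.

```python
import asyncio


async def echo_tool(text: str) -> str:
    return text


async def failing_tool() -> str:
    raise RuntimeError("upstream service unavailable")


# Hypothetical registry: tool name -> async callable standing in for a server request.
TOOLS = {"echo": echo_tool, "broken": failing_tool}


def format_error(error: Exception) -> str:
    """Same shape as the custom_error_formatter shown under Error Handling."""
    return f"Tool error: {str(error)[:100]}"


async def call_tool(name: str, arguments: dict, tool_call_id: str) -> dict:
    """Route one tool call and wrap the outcome as a tool message."""
    try:
        if name not in TOOLS:
            raise ValueError(f"unknown tool: {name}")
        content = await TOOLS[name](**arguments)  # forward to the owning server
    except Exception as error:
        content = format_error(error)             # errors become message text
    return {"role": "tool", "tool_call_id": tool_call_id, "content": content}
```

Returning the formatted error as ordinary tool output, instead of raising, lets the model see what went wrong and retry with different arguments.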

Best Practices

  • Validate API keys - Use vf.ensure_keys() to fail fast if keys are missing
  • Document requirements - List required environment variables in README
  • Test servers locally - Verify MCP servers work before using in environments
  • Handle errors gracefully - Provide clear error messages via error_formatter
  • Limit turns - Set a reasonable max_turns so a rollout cannot loop on tool calls indefinitely
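
Some of these checks can be automated before an environment is created. The helper below is a minimal sketch using only the standard library; `preflight` is a hypothetical name, and it complements rather than replaces `vf.ensure_keys()`.

```python
import os
import shutil
from typing import List


def preflight(servers: List[dict], required_env: List[str]) -> List[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    problems = []
    for key in required_env:
        if not os.environ.get(key):
            problems.append(f"missing environment variable: {key}")
    for server in servers:
        # shutil.which returns None when the launcher is not on PATH.
        if shutil.which(server["command"]) is None:
            problems.append(
                f"server '{server['name']}': command not found: {server['command']}"
            )
    return problems
```

Checking that launchers such as `uvx` or `npx` are on `PATH` catches a common setup failure early, although it does not prove the server itself starts correctly.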

Limitations

  • MCP servers must support stdio transport
  • Servers are started once per environment, not per rollout
  • No support for resources or prompts (tools only)
  • Best suited to read-only operations: servers are shared, so there is no per-rollout state and any changes one rollout makes are visible to the others

Examples

See the mcp-search-env example in the Verifiers repository for a complete implementation.
