MCP Integration

The MCPEnv integration connects Verifiers environments to MCP (Model Context Protocol) servers and exposes the servers' tools to language models. MCP is an open protocol that standardizes how AI applications connect to external data sources and tools.

Features

Multiple MCP servers - Connect to multiple servers simultaneously
Automatic tool discovery - Tools from servers are automatically exposed to models
stdio transport - Launches each server as a subprocess and communicates over standard input/output
Type-safe - Preserves tool schemas and parameter types
Built on ToolEnv - Inherits all ToolEnv features
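Here is what the stdio transport carries. The MCP specification frames messages as JSON-RPC 2.0, and over stdio each message is one line of JSON written to the server's stdin. The sketch below only builds request lines for the standard tools/list and tools/call methods. It omits the initialize handshake that a real client performs first. The helper jsonrpc_request and the fetch arguments are illustrative assumptions, not part of MCPEnv.

```python
import json
from typing import Optional

def jsonrpc_request(request_id: int, method: str, params: Optional[dict] = None) -> str:
    """Serialize one JSON-RPC 2.0 request as a single newline-terminated line."""
    message = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        message["params"] = params
    # stdio framing: one JSON object per line; json.dumps emits no raw newlines by default
    return json.dumps(message) + "\n"

# Ask the server which tools it offers (a real client sends this after initialize)
list_tools = jsonrpc_request(1, "tools/list")

# Invoke one tool by name with JSON arguments (illustrative tool name and arguments)
call_fetch = jsonrpc_request(
    2, "tools/call", {"name": "fetch", "arguments": {"url": "https://example.com"}}
)

print(list_tools, end="")
print(call_fetch, end="")
```

MCPEnv and the MCP SDK handle this framing for you. The sketch only shows why a server must read and write line-delimited JSON on stdio to be usable here.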

Installation

MCP support is included in core Verifiers:

uv add verifiers

The MCP SDK is automatically installed as a dependency.

Quick Start

Create an environment

Create a basic MCP environment:

import verifiers as vf
from verifiers.envs.experimental.mcp_env import MCPEnv
from datasets import Dataset

def load_environment():
    # Configure MCP servers
    mcp_servers = [
        {
            "name": "fetch",
            "command": "uvx",
            "args": ["mcp-server-fetch"],
            "description": "Fetch web content"
        },
    ]

    # Create dataset
    dataset = Dataset.from_dict({
        "question": [
            "What is the latest news on OpenAI's website?",
        ],
        "answer": ["Recent updates about GPT models"]
    })

    # Create rubric
    rubric = vf.JudgeRubric(judge_model="gpt-4.1-mini")
    
    async def judge_reward(judge, prompt, completion, answer):
        response = await judge(prompt, completion, answer)
        return 1.0 if "yes" in response.lower() else 0.0
    
    rubric.add_reward_func(judge_reward)

    # Create environment
    return MCPEnv(
        mcp_servers=mcp_servers,
        dataset=dataset,
        rubric=rubric,
        max_turns=10,
    )

Evaluate

Run an evaluation:

prime eval run my-mcp-env -m openai/gpt-4.1-mini -n 5

MCP Server Configuration

Each MCP server is configured as a dictionary in the MCPServerConfig format:

mcp_servers = [
    {
        "name": "fetch",
        "command": "uvx",
        "args": ["mcp-server-fetch"],
        "description": "Fetch web content",
    },
    {
        "name": "filesystem",
        "command": "npx",
        "args": ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"],
        "description": "File system access",
    },
]

Configuration fields:

name - Identifier for the server
command - Command to launch the server
args - List of command arguments
env - Environment variables (optional)
description - Human-readable description (optional)
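The following minimal sketch checks a server entry against the fields listed above before you pass it to MCPEnv. The helper validate_server_config is hypothetical. Treating name, command, and args as required follows this list, and MCPEnv's own validation may differ.

```python
from typing import Any

# Field names and expected types, taken from the list above
REQUIRED_FIELDS = {"name": str, "command": str, "args": list}
OPTIONAL_FIELDS = {"env": dict, "description": str}

def validate_server_config(config: dict[str, Any]) -> list[str]:
    """Return a list of problems with one server config (empty list means valid)."""
    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in config:
            problems.append(f"missing required field '{field}'")
        elif not isinstance(config[field], expected):
            problems.append(f"'{field}' should be {expected.__name__}")
    for field, expected in OPTIONAL_FIELDS.items():
        # Optional fields are only type-checked when present
        if field in config and not isinstance(config[field], expected):
            problems.append(f"'{field}' should be {expected.__name__}")
    return problems
```

A common mistake is passing args as a single string ("-y exa-mcp-server") instead of a list. A check like this catches it before a server fails to launch.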

With Environment Variables

For servers that require API keys, pass the keys to the server process through the env field:

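import os
from datasets import Dataset
import verifiers as vf
from verifiers.envs.experimental.mcp_env import MCPEnv

def load_environment():
    vf.ensure_keys(["EXA_API_KEY"])  # Fail fast if the key is missing
    
    mcp_servers = [
        {
            "name": "exa",
            "command": "npx",
            "args": ["-y", "exa-mcp-server"],
            # Forward the key to the server process
            "env": {"EXA_API_KEY": os.environ["EXA_API_KEY"]},
            "description": "Exa search",
        },
    ]
    
    # Minimal dataset and judge rubric, built as in the Quick Start
    dataset = Dataset.from_dict({
        "question": ["Find the latest Prime Intellect announcement"],
        "answer": ["Information about recent announcements"],
    })
    rubric = vf.JudgeRubric(judge_model="gpt-4.1-mini")
    
    async def judge_reward(judge, prompt, completion, answer, state):
        verdict = await judge(prompt, completion, answer, state)
        return 1.0 if "yes" in verdict.lower() else 0.0
    
    rubric.add_reward_func(judge_reward)
    
    return MCPEnv(
        mcp_servers=mcp_servers,
        dataset=dataset,
        rubric=rubric,
    )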
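You can reproduce the fail-fast idea behind vf.ensure_keys with the standard library alone. The functions require_env and exa_server_config below are hypothetical helpers, not Verifiers APIs. A chain of bare os.environ[...] lookups stops at the first missing key. This version instead reports every missing variable at once.

```python
import os

def require_env(keys: list[str]) -> dict[str, str]:
    """Return {key: value} for every key, or raise listing all missing keys at once."""
    # Treat empty strings as missing, since an empty API key is never usable
    missing = [k for k in keys if not os.environ.get(k)]
    if missing:
        raise RuntimeError(f"Missing required environment variables: {', '.join(missing)}")
    return {k: os.environ[k] for k in keys}

def exa_server_config() -> dict:
    """Build the Exa server entry, forwarding only the variables it needs."""
    return {
        "name": "exa",
        "command": "npx",
        "args": ["-y", "exa-mcp-server"],
        "env": require_env(["EXA_API_KEY"]),
        "description": "Exa search",
    }
```

Passing only the needed variables in env keeps unrelated secrets out of the server process.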

Available MCP Servers

Common MCP servers you can use (entries that read os.environ assume os is imported and the variable is set):

Web & Search

Fetch - Retrieve web content

{
    "name": "fetch",
    "command": "uvx",
    "args": ["mcp-server-fetch"],
}

Exa - AI-powered search

{
    "name": "exa",
    "command": "npx",
    "args": ["-y", "exa-mcp-server"],
    "env": {"EXA_API_KEY": os.environ["EXA_API_KEY"]},
}

Brave Search - Web search

{
    "name": "brave",
    "command": "npx",
    "args": ["-y", "@modelcontextprotocol/server-brave-search"],
    "env": {"BRAVE_API_KEY": os.environ["BRAVE_API_KEY"]},
}

File System

Filesystem - Read/write files

{
    "name": "filesystem",
    "command": "npx",
    "args": ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/dir"],
}

Databases

PostgreSQL - Read-only queries against a PostgreSQL database (the connection URL is passed as an argument)

{
    "name": "postgres",
    "command": "npx",
    "args": ["-y", "@modelcontextprotocol/server-postgres", os.environ["POSTGRES_URL"]],
}

SQLite - Local database access (Python server, launched with uvx)

{
    "name": "sqlite",
    "command": "uvx",
    "args": ["mcp-server-sqlite", "--db-path", "database.db"],
}

Development Tools

Git - Repository operations (Python server, launched with uvx)

{
    "name": "git",
    "command": "uvx",
    "args": ["mcp-server-git", "--repository", "/path/to/repo"],
}

GitHub - GitHub API access

{
    "name": "github",
    "command": "npx",
    "args": ["-y", "@modelcontextprotocol/server-github"],
    "env": {"GITHUB_PERSONAL_ACCESS_TOKEN": os.environ["GITHUB_PERSONAL_ACCESS_TOKEN"]},
}

See the MCP servers directory (the modelcontextprotocol/servers repository on GitHub) for more servers.
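The entries above read os.environ[...] directly, so building them raises KeyError whenever a variable is unset. The sketch below keeps a small catalog and builds configs only for servers whose keys are present. CATALOG and available_servers are hypothetical names, and the catalog contents repeat a few of the entries above.

```python
import os

# Hypothetical catalog: server name -> (config without secrets, env vars it needs).
# Secrets are read only when a config is built, so a missing key never raises here.
CATALOG = {
    "fetch": ({"command": "uvx", "args": ["mcp-server-fetch"]}, []),
    "exa": ({"command": "npx", "args": ["-y", "exa-mcp-server"]}, ["EXA_API_KEY"]),
    "brave": (
        {"command": "npx", "args": ["-y", "@modelcontextprotocol/server-brave-search"]},
        ["BRAVE_API_KEY"],
    ),
}

def available_servers(names: list[str]) -> list[dict]:
    """Build configs for the requested servers whose required env vars are all set."""
    configs = []
    for name in names:
        base, keys = CATALOG[name]
        if not all(os.environ.get(k) for k in keys):
            continue  # skip servers we cannot authenticate
        config = {"name": name, **base}
        if keys:
            config["env"] = {k: os.environ[k] for k in keys}
        configs.append(config)
    return configs
```

Skipping servers silently suits local experiments. For evaluations, failing fast with vf.ensure_keys (see Best Practices) keeps the toolset identical across runs.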

Full Example

Here’s a complete example using multiple MCP servers:

import os
from datasets import Dataset
import verifiers as vf
from verifiers.envs.experimental.mcp_env import MCPEnv

def load_environment(
    mcp_servers: list | None = None,
    dataset=None,
    **kwargs
) -> vf.Environment:
    # Validate API keys
    vf.ensure_keys(["EXA_API_KEY"])
    
    # Configure MCP servers
    if mcp_servers is None:
        mcp_servers = [
            {
                "name": "exa",
                "command": "npx",
                "args": ["-y", "exa-mcp-server"],
                "env": {"EXA_API_KEY": os.environ["EXA_API_KEY"]},
                "description": "Exa AI search",
            },
            {
                "name": "fetch",
                "command": "uvx",
                "args": ["mcp-server-fetch"],
                "description": "Fetch web content",
            },
        ]
    
    # Create dataset
    if dataset is None:
        dataset = Dataset.from_dict({
            "question": [
                "Find the latest Prime Intellect announcement",
                "What is the current weather in San Francisco?",
            ],
            "answer": [
                "Information about recent announcements",
                "Current weather conditions",
            ]
        })
    
    # Create rubric with judge
    rubric = vf.JudgeRubric(judge_model="gpt-4.1-mini")
    
    async def judge_reward(judge, prompt, completion, answer, state):
        verdict = await judge(prompt, completion, answer, state)
        return 1.0 if "yes" in verdict.lower() else 0.0
    
    rubric.add_reward_func(judge_reward, weight=1.0)
    
    # Create MCP environment
    return MCPEnv(
        mcp_servers=mcp_servers,
        dataset=dataset,
        rubric=rubric,
        max_turns=10,
        **kwargs,
    )
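The reward functions above turn the judge's free-text verdict into a score with "yes" in verdict.lower(). That check also fires on a verdict such as "No, the answer does not say yes." One stricter alternative scores only a leading "yes", as sketched below. The sketch assumes the judge is prompted to begin its reply with yes or no. fake_judge is a hypothetical stand-in for the JudgeRubric judge, so the example runs without an API call.

```python
import asyncio
import re

def verdict_to_reward(verdict: str) -> float:
    """Score 1.0 only when the judge's first word is 'yes' (case-insensitive)."""
    match = re.match(r"\s*([A-Za-z]+)", verdict)
    return 1.0 if match and match.group(1).lower() == "yes" else 0.0

async def judge_reward(judge, prompt, completion, answer, state):
    verdict = await judge(prompt, completion, answer, state)
    return verdict_to_reward(verdict)

# Hypothetical stand-in for the JudgeRubric judge that returns a canned verdict
async def fake_judge(prompt, completion, answer, state):
    return state["canned_verdict"]

def score(verdict: str) -> float:
    """Run the reward once against the fake judge."""
    state = {"canned_verdict": verdict}
    return asyncio.run(judge_reward(fake_judge, "q", "c", "a", state))
```

Check how your judge prompt phrases its verdict before you tighten parsing. A rule that is too strict silently scores correct answers as 0.0.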

Error Handling

Pass an error_formatter to control how tool errors are reported back to the model:

def custom_error_formatter(error: Exception) -> str:
    """Format errors for the model."""
    return f"Tool error: {str(error)[:100]}"

env = MCPEnv(
    mcp_servers=mcp_servers,
    dataset=dataset,
    rubric=rubric,
    error_formatter=custom_error_formatter,
)
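The intent is that a failing tool produces a message the model can read instead of crashing the rollout. The sketch below shows that idea with plain functions. run_tool and the toy lookup tool are hypothetical, and the assumption that MCPEnv catches every exception this way is not confirmed here.

```python
from typing import Any, Callable

def custom_error_formatter(error: Exception) -> str:
    """Format errors for the model, truncated to keep tool messages short."""
    return f"Tool error: {str(error)[:100]}"

def run_tool(tool: Callable[..., Any], arguments: dict, error_formatter: Callable[[Exception], str]) -> str:
    """Call a tool and return its result, or the formatted error, as message text."""
    try:
        return str(tool(**arguments))
    except Exception as e:  # bad arguments and tool failures both become text for the model
        return error_formatter(e)

def lookup(city: str) -> str:
    """Toy tool: return a canned forecast, failing for unknown cities."""
    forecasts = {"San Francisco": "fog, 15 C"}
    if city not in forecasts:
        raise ValueError(f"unknown city: {city}")
    return forecasts[city]
```

Truncating to 100 characters keeps long stack-trace-like messages out of the context. The trade-off is that it may cut off the detail a model needs to correct its arguments.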

Architecture Notes

MCPEnv is designed for globally available, read-only MCP servers where the same toolset can be shared across all rollouts. For servers requiring per-rollout state or mutable task-specific data, consider implementing a custom StatefulToolEnv subclass.

Connection Management

MCP servers are connected once during environment initialization and shared across all rollouts:

The environment starts a background event loop
It connects to all configured MCP servers
It discovers each server's tools with a tools/list request
It exposes those tools to every rollout
It closes the connections when the environment shuts down
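A background event loop lets synchronous code drive long-lived async client sessions. The sketch below shows the general pattern with asyncio and threading. It is not MCPEnv's actual implementation. fake_connect is a hypothetical stand-in for opening an MCP session and calling tools/list.

```python
import asyncio
import threading
from typing import Optional

class BackgroundLoop:
    """Run an asyncio event loop in a daemon thread and submit coroutines to it."""

    def __init__(self) -> None:
        self.loop = asyncio.new_event_loop()
        self.thread = threading.Thread(target=self.loop.run_forever, daemon=True)
        self.thread.start()

    def run(self, coro, timeout: Optional[float] = None):
        # Schedule the coroutine on the background loop and block for its result
        return asyncio.run_coroutine_threadsafe(coro, self.loop).result(timeout)

    def close(self) -> None:
        # Stop the loop from its own thread, wait for the thread, then release resources
        self.loop.call_soon_threadsafe(self.loop.stop)
        self.thread.join()
        self.loop.close()

async def fake_connect(name: str) -> list[str]:
    """Hypothetical stand-in for connecting to a server and calling tools/list."""
    await asyncio.sleep(0.01)
    return [f"{name}.tool"]

async def connect_all(names: list[str]) -> dict[str, list[str]]:
    # Connect to all servers concurrently on the background loop
    results = await asyncio.gather(*(fake_connect(n) for n in names))
    return dict(zip(names, results))
```

Keeping one loop alive for the environment's lifetime lets sessions opened at startup stay usable for every later tool call. A fresh asyncio.run per call would tear them down.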

Tool Execution

When a model calls an MCP tool:

MCPEnv intercepts the tool call
It sends the request to the MCP server that provides the tool
It returns the server's response as a tool message
If the call fails, it formats the error with error_formatter
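Routing a call to the right server requires a map from tool name to server, built from each server's tools/list result. The sketch below shows one way to do this. build_tool_index, dispatch, and fake_fetch_server are hypothetical names. Rejecting duplicate tool names is an assumption, because how MCPEnv resolves name collisions is not described here.

```python
import asyncio
from typing import Any, Awaitable, Callable

# Hypothetical: one async callable per server that performs tools/call for a tool name
ServerCall = Callable[[str, dict], Awaitable[Any]]

def build_tool_index(tools_by_server: dict[str, list[str]]) -> dict[str, str]:
    """Map each tool name (as returned by tools/list) to the server that offers it."""
    index: dict[str, str] = {}
    for server, tools in tools_by_server.items():
        for tool in tools:
            if tool in index:
                # Assumption: refuse ambiguous names rather than silently picking one
                raise ValueError(f"tool '{tool}' offered by both '{index[tool]}' and '{server}'")
            index[tool] = server
    return index

async def dispatch(
    tool_name: str,
    arguments: dict,
    index: dict[str, str],
    servers: dict[str, ServerCall],
    error_formatter: Callable[[Exception], str],
) -> str:
    """Route one tool call to its server and return the text for the tool message."""
    try:
        if tool_name not in index:
            raise ValueError(f"unknown tool: {tool_name}")
        result = await servers[index[tool_name]](tool_name, arguments)
        return str(result)
    except Exception as e:
        return error_formatter(e)

async def fake_fetch_server(tool_name: str, arguments: dict) -> str:
    """Stand-in for a real MCP session; echoes the requested URL."""
    return f"fetched {arguments['url']}"
```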

Best Practices

Validate API keys - Use vf.ensure_keys() to fail fast if keys are missing
Document requirements - List required environment variables in the environment's README
Test servers locally - Verify that each MCP server starts and responds before using it in an environment
Handle errors gracefully - Provide clear error messages via error_formatter
Limit tool calls - Set a reasonable max_turns so a model cannot call tools indefinitely
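The max_turns limit bounds how long a model can keep calling tools. The sketch below is a simplified multi-turn loop, not Verifiers' actual rollout code. It assumes a turn means one model response, and rollout, model_step, and run_tool are hypothetical names.

```python
from typing import Callable

def rollout(
    model_step: Callable[[list[dict]], dict],
    run_tool: Callable[[dict], str],
    max_turns: int,
) -> list[dict]:
    """Alternate model and tool messages until the model stops calling tools or max_turns is hit."""
    messages: list[dict] = []
    for _ in range(max_turns):
        reply = model_step(messages)
        messages.append(reply)
        if not reply.get("tool_calls"):
            break  # the model gave a final answer without requesting tools
        for call in reply["tool_calls"]:
            messages.append({"role": "tool", "content": run_tool(call)})
    return messages
```

Even a model that never stops requesting tools produces at most max_turns model responses here, which keeps rollout length and cost bounded.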

Limitations

MCP servers must support stdio transport
Servers are started once per environment, not per rollout
No support for resources or prompts (tools only)
Best suited to read-only tools, since all rollouts share the same server state (no per-rollout isolation)

Examples

See the mcp-search-env example in the Verifiers repository for a complete implementation.
