Overview

Routing in MCP directs each request to the most suitable model or service based on content type, user context, and system load, so that requests are processed efficiently and capacity is not wasted.

Content-based routing: route to specialized models based on request type (code, creative, scientific).

Load balancing: distribute requests across nodes using round-robin, response time, or content-aware strategies.

Dynamic tool routing: route tool calls to regional endpoints, specific API versions, or latency-optimized backends.

Content-based routing

Content-based routing directs requests to specialized services based on what the request contains. Code generation goes to a code model; creative writing goes to a creative model.
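// .NET Example: Content-based routing
// McpClient and McpResponse are assumed to come from your MCP client library.
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
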
public class ContentBasedRouter
{
    private readonly Dictionary<string, McpClient> _specializedClients;
    private readonly RoutingClassifier _classifier;

    public ContentBasedRouter()
    {
        _specializedClients = new Dictionary<string, McpClient>
        {
            ["code"]       = new McpClient("https://code-specialized-mcp.com"),
            ["creative"]   = new McpClient("https://creative-specialized-mcp.com"),
            ["scientific"] = new McpClient("https://scientific-specialized-mcp.com"),
            ["general"]    = new McpClient("https://general-mcp.com")
        };
        _classifier = new RoutingClassifier();
    }

    public async Task<McpResponse> RouteAndProcessAsync(
        string prompt,
        IDictionary<string, object> parameters = null)
    {
        string category = await _classifier.ClassifyPromptAsync(prompt);

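        // Fall back to the general-purpose client when no specialized client is registered,
        // and update the category so the log line reports where the request actually goes.
        if (!_specializedClients.TryGetValue(category, out var client))
        {
            category = "general";
            client = _specializedClients["general"];
        }

        Console.WriteLine($"Routing to {category} service");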
        return await client.SendPromptAsync(prompt, parameters);
    }

    private class RoutingClassifier
    {
        public Task<string> ClassifyPromptAsync(string prompt)
        {
            prompt = prompt.ToLowerInvariant();

            if (prompt.Contains("code") || prompt.Contains("function") ||
                prompt.Contains("program") || prompt.Contains("algorithm"))
                return Task.FromResult("code");

            if (prompt.Contains("story") || prompt.Contains("creative") ||
                prompt.Contains("imagine") || prompt.Contains("design"))
                return Task.FromResult("creative");

            if (prompt.Contains("science") || prompt.Contains("research") ||
                prompt.Contains("analyze") || prompt.Contains("study"))
                return Task.FromResult("scientific");

            return Task.FromResult("general");
        }
    }
}
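
The classifier above uses substring matching, so a prompt containing "decode" would match the "code" keyword. As a minimal sketch of one alternative, the Python snippet below matches whole words instead. The `CATEGORY_KEYWORDS` table and `classify_prompt` function are hypothetical names that mirror the categories in the .NET example; they are not part of any MCP SDK.

```python
import re

# Hypothetical keyword table mirroring the categories in the .NET example.
CATEGORY_KEYWORDS = {
    "code": ("code", "function", "program", "algorithm"),
    "creative": ("story", "creative", "imagine", "design"),
    "scientific": ("science", "research", "analyze", "study"),
}


def classify_prompt(prompt: str) -> str:
    """Return the first category with a keyword that appears as a whole word."""
    words = set(re.findall(r"[a-z]+", prompt.lower()))
    for category, keywords in CATEGORY_KEYWORDS.items():
        if words.intersection(keywords):
            return category
    return "general"


print(classify_prompt("Write a function that reverses a list"))  # code
print(classify_prompt("Tell me a story about a lighthouse"))     # creative
print(classify_prompt("How do I decode this message?"))          # general
```

Whole-word matching trades one kind of error for another: "decode" no longer routes to the code model, but "programs" no longer matches "program" either. A production router would more likely use a trained classifier or embeddings than either keyword approach.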

Intelligent load balancing

Load balancing optimizes resource utilization and ensures high availability. Three strategies are shown below:

| Strategy | Best for |
| --- | --- |
| Round-robin | Even distribution, equal-capacity nodes |
| Response time | Heterogeneous nodes, latency-sensitive workloads |
| Content-aware | Specialized nodes optimized for specific request types |

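// Java: Intelligent load balancing with multiple strategies
// McpServerNode, McpRequest, McpResponse, and LoadBalancingStrategy are assumed to be defined elsewhere.
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
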
public class McpLoadBalancer {
    private final List<McpServerNode> serverNodes;
    private final LoadBalancingStrategy strategy;

    public McpLoadBalancer(List<McpServerNode> nodes, LoadBalancingStrategy strategy) {
        this.serverNodes = new ArrayList<>(nodes);
        this.strategy = strategy;
    }

    public McpResponse processRequest(McpRequest request) {
        McpServerNode selectedNode = strategy.selectNode(serverNodes, request);
        try {
            return selectedNode.processRequest(request);
        } catch (Exception e) {
            selectedNode.recordFailure();

            // Retry once on a different node; the failed node is excluded from selection.
            List<McpServerNode> remainingNodes = new ArrayList<>(serverNodes);
            remainingNodes.remove(selectedNode);

            if (!remainingNodes.isEmpty()) {
                McpServerNode fallbackNode = strategy.selectNode(remainingNodes, request);
                return fallbackNode.processRequest(request);
            }
            // Keep the original exception as the cause so the failure stays diagnosable.
            throw new RuntimeException("All MCP server nodes failed", e);
        }
    }

    // Round-robin strategy
    public static class RoundRobinStrategy implements LoadBalancingStrategy {
        private final AtomicInteger counter = new AtomicInteger(0);

        @Override
        public McpServerNode selectNode(List<McpServerNode> nodes, McpRequest request) {
            List<McpServerNode> healthyNodes = nodes.stream()
                .filter(McpServerNode::isHealthy)
                .collect(Collectors.toList());

            if (healthyNodes.isEmpty())
                throw new RuntimeException("No healthy nodes available");

            // floorMod keeps the index non-negative after the counter overflows.
            int index = Math.floorMod(counter.getAndIncrement(), healthyNodes.size());
            return healthyNodes.get(index);
        }
    }

    // Response-time strategy: pick the healthy node with the lowest average response time
    public static class ResponseTimeStrategy implements LoadBalancingStrategy {
        @Override
        public McpServerNode selectNode(List<McpServerNode> nodes, McpRequest request) {
            return nodes.stream()
                .filter(McpServerNode::isHealthy)
                .min(Comparator.comparing(McpServerNode::getAverageResponseTime))
                .orElseThrow(() -> new RuntimeException("No healthy nodes"));
        }
    }

    // Content-aware strategy
    public static class ContentAwareStrategy implements LoadBalancingStrategy {
        @Override
        public McpServerNode selectNode(List<McpServerNode> nodes, McpRequest request) {
            boolean isCodeRequest = request.getPrompt().contains("code") ||
                                    request.getAllowedTools().contains("codeInterpreter");

            Optional<McpServerNode> specializedNode = nodes.stream()
                .filter(McpServerNode::isHealthy)
                .filter(node -> isCodeRequest &&
                                "code".equals(node.getSpecialization()))
                .findFirst();

            // orElseGet defers the least-loaded lookup until no specialized node matched.
            return specializedNode.orElseGet(() ->
                nodes.stream()
                    .filter(McpServerNode::isHealthy)
                    .min(Comparator.comparing(McpServerNode::getCurrentLoad))
                    .orElseThrow(() -> new RuntimeException("No healthy nodes"))
            );
        }
    }
}
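
To make the first two strategies easier to compare, here is a compact Python sketch of the same selection logic. The `Node` dataclass, `RoundRobin` class, and `fastest` function are illustrative names chosen for this example, and the response times are made-up sample values rather than measurements.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Node:
    """Illustrative stand-in for a server node."""
    name: str
    healthy: bool = True
    avg_response_ms: float = 0.0


def healthy_nodes(nodes):
    """Return only healthy nodes, failing loudly when none remain."""
    healthy = [n for n in nodes if n.healthy]
    if not healthy:
        raise RuntimeError("No healthy nodes available")
    return healthy


class RoundRobin:
    """Cycle through the healthy nodes in order."""

    def __init__(self):
        self._counter = count()

    def select(self, nodes):
        healthy = healthy_nodes(nodes)
        return healthy[next(self._counter) % len(healthy)]


def fastest(nodes):
    """Pick the healthy node with the lowest average response time."""
    return min(healthy_nodes(nodes), key=lambda n: n.avg_response_ms)


nodes = [
    Node("a", avg_response_ms=120),
    Node("b", healthy=False, avg_response_ms=40),
    Node("c", avg_response_ms=80),
]
rr = RoundRobin()
print([rr.select(nodes).name for _ in range(4)])  # ['a', 'c', 'a', 'c']
print(fastest(nodes).name)                        # c
```

Node "b" has the best response time but is skipped by both strategies because it is unhealthy, which is the same health filter the Java strategies apply before selecting.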

Dynamic tool routing

Tool routing directs tool calls to the most appropriate endpoint based on user context — such as regional endpoints for data residency or versioned endpoints for API compatibility.
# Python: Dynamic tool routing
import aiohttp


class McpToolRouter:
    def __init__(self):
        self.tool_endpoints = {
            "weatherTool":    "https://weather-service.example.com/api",
            "calculatorTool": "https://calculator-service.example.com/compute",
            "databaseTool":   "https://database-service.example.com/query",
            "searchTool":     "https://search-service.example.com/search"
        }

        self.regional_endpoints = {
            "us": {
                "weatherTool": "https://us-west.weather-service.example.com/api",
                "searchTool":  "https://us.search-service.example.com/search"
            },
            "europe": {
                "weatherTool": "https://eu.weather-service.example.com/api",
                "searchTool":  "https://eu.search-service.example.com/search"
            }
        }

        self.tool_versions = {
            "weatherTool": {
                "default": "v2",
                "v1":      "https://weather-service.example.com/api/v1",
                "v2":      "https://weather-service.example.com/api/v2",
                "beta":    "https://weather-service.example.com/api/beta"
            }
        }

    async def route_tool_request(self, tool_name, parameters, user_context=None):
        endpoint = self._select_endpoint(tool_name, parameters, user_context)
        if not endpoint:
            raise ValueError(f"No endpoint available for tool: {tool_name}")
        return await self._execute_tool_request(endpoint, tool_name, parameters)

    def _select_endpoint(self, tool_name, parameters, user_context=None):
        if tool_name not in self.tool_endpoints:
            return None

        base_endpoint = self.tool_endpoints[tool_name]

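        # Version routing. The "default" key names the fallback version;
        # it is not itself an endpoint, so it is never returned directly.
        if tool_name in self.tool_versions:
            version_info = self.tool_versions[tool_name]
            requested_version = parameters.get("_version", version_info["default"])
            if requested_version != "default" and requested_version in version_info:
                base_endpoint = version_info[requested_version]

        # Regional routing (takes precedence over the versioned endpoint)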
        if user_context and "region" in user_context:
            user_region = user_context["region"]
            if user_region in self.regional_endpoints:
                regional_tools = self.regional_endpoints[user_region]
                if tool_name in regional_tools:
                    return regional_tools[tool_name]

        return base_endpoint

    async def _execute_tool_request(self, endpoint, tool_name, parameters):
        async with aiohttp.ClientSession() as session:
            async with session.post(
                endpoint,
                json={"toolName": tool_name, "parameters": parameters},
                headers={"Content-Type": "application/json"}
            ) as response:
                if response.status == 200:
                    return await response.json()
                error_text = await response.text()
                raise RuntimeError(
                    f"Tool execution failed ({response.status}): {error_text}"
                )
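
The order in which the router resolves an endpoint matters: a regional match wins over a requested version, and the version wins over the base endpoint. The standalone sketch below reproduces that precedence with trimmed-down tables so it can be run without a network. `BASE`, `VERSIONS`, `REGIONAL`, and `select_endpoint` are hypothetical names for this illustration, not part of the class above.

```python
# Hypothetical, trimmed-down routing tables in the same shape as McpToolRouter.
BASE = {"weatherTool": "https://weather-service.example.com/api"}
VERSIONS = {
    "weatherTool": {
        "default": "v2",
        "v1": "https://weather-service.example.com/api/v1",
        "v2": "https://weather-service.example.com/api/v2",
    }
}
REGIONAL = {
    "europe": {"weatherTool": "https://eu.weather-service.example.com/api"}
}


def select_endpoint(tool_name, parameters, user_context=None):
    """Resolve an endpoint: regional match first, then version, then base."""
    if tool_name not in BASE:
        return None

    # 1. A regional endpoint, when one exists for this user and tool.
    region = (user_context or {}).get("region")
    regional = REGIONAL.get(region, {})
    if tool_name in regional:
        return regional[tool_name]

    # 2. The requested (or default) version, when the tool is versioned.
    versions = VERSIONS.get(tool_name, {})
    requested = parameters.get("_version", versions.get("default"))
    if requested != "default" and requested in versions:
        return versions[requested]

    # 3. The unversioned base endpoint.
    return BASE[tool_name]


print(select_endpoint("weatherTool", {}))
print(select_endpoint("weatherTool", {"_version": "v1"}))
print(select_endpoint("weatherTool", {"_version": "v1"}, {"region": "europe"}))
print(select_endpoint("unknownTool", {}))
```

The third call returns the EU endpoint even though `v1` was requested, because the regional table holds a single URL per tool. If both constraints need to hold at once, the regional table would have to store per-version URLs.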

Routing and sampling architecture

The diagram below shows how routing and sampling work together in a comprehensive MCP architecture:

MCP Client
     │
     ▼
Request Router ──► Content Analyzer ──► Sampling Configurator
     │
     ▼
Load Balancer ──► Server Pool ──► Model Selector
                                        │
                            ┌───────────┼───────────┐
                            ▼           ▼           ▼
                        Model A      Model B      Model C
                            │           │           │
                            └───────────┼───────────┘
                                        ▼
                                   Tool Router
                                   │         │
                                   ▼         ▼
                          Primary Tools   Regional Tools

Combine content-aware routing with regional tool routing. For example, a French user's weather query can be routed to the creative-writing model (if the user asks in a conversational tone), while the weather tool call goes to the EU endpoint for data residency compliance.
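
As a rough sketch of how the two decisions might be combined, the snippet below produces a single routing plan from a user's country and a tone flag. `RoutingPlan`, `plan_weather_route`, and the `COUNTRY_TO_REGION` mapping are assumptions made for this example, and the `conversational` flag stands in for whatever the content analyzer would report.

```python
from dataclasses import dataclass

# Hypothetical mapping from country code to routing region.
COUNTRY_TO_REGION = {"FR": "europe", "DE": "europe", "US": "us"}

# Weather endpoints taken from the tool router example.
REGIONAL_WEATHER = {
    "europe": "https://eu.weather-service.example.com/api",
    "us": "https://us-west.weather-service.example.com/api",
}
DEFAULT_WEATHER = "https://weather-service.example.com/api"


@dataclass(frozen=True)
class RoutingPlan:
    """Where the prompt and the tool call should each be sent."""
    model_category: str
    tool_endpoint: str


def plan_weather_route(country_code: str, conversational: bool) -> RoutingPlan:
    """Choose the model by tone and the weather endpoint by region."""
    model = "creative" if conversational else "general"
    region = COUNTRY_TO_REGION.get(country_code)
    endpoint = REGIONAL_WEATHER.get(region, DEFAULT_WEATHER)
    return RoutingPlan(model, endpoint)


print(plan_weather_route("FR", conversational=True))
print(plan_weather_route("JP", conversational=False))
```

The two choices are independent here: the model depends only on the request content, and the endpoint depends only on the user's region, so either rule can change without touching the other.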
