Sampling lets an MCP Server ask the MCP Client to call an LLM on its behalf. This is useful when your server needs AI-generated content (like a summary or analysis) but shouldn’t, or can’t, call an LLM directly. The client, which already has access to an LLM, handles the request and returns the result.
When to use sampling
A concrete example: a blog post creation tool that also needs a generated abstract. The server has all the content, but the LLM lives on the client side.
The sampling request
The server sends a `sampling/createMessage` JSON-RPC request to the client:
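A sketch of what such a request might look like; the prompt text, model hint, and priority values here are illustrative:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "sampling/createMessage",
  "params": {
    "messages": [
      {
        "role": "user",
        "content": {
          "type": "text",
          "text": "Write a 2-3 sentence abstract for this blog post:\n\n<post content>"
        }
      }
    ],
    "modelPreferences": {
      "hints": [{ "name": "claude-3-sonnet" }],
      "intelligencePriority": 0.8,
      "speedPriority": 0.5
    },
    "systemPrompt": "You are a concise technical copywriter.",
    "maxTokens": 200
  }
}
```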
Key fields
| Field | Description |
|---|---|
| `messages` | The conversation messages to send to the LLM |
| `modelPreferences.hints` | Preferred models (the client may use a different one) |
| `modelPreferences.intelligencePriority` | 0–1 scale; higher = prefer a smarter model |
| `modelPreferences.speedPriority` | 0–1 scale; higher = prefer a faster model |
| `systemPrompt` | System instruction for the LLM |
| `maxTokens` | Recommended token limit for the response |
Model preferences are recommendations only. The user (via the client) can choose a different model. Your server code must handle responses from any model.
The sampling response
After the client calls the LLM, it sends the result back to the server. Note that the `model` field in the response may differ from what you requested; here the user chose gpt-5 instead of claude-3-sonnet.
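A sketch of what that response might look like (the abstract text is illustrative):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "role": "assistant",
    "content": {
      "type": "text",
      "text": "This post walks through building a blog tool with MCP sampling, showing how the server delegates abstract generation to the client's LLM."
    },
    "model": "gpt-5",
    "stopReason": "endTurn"
  }
}
```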
Message content types
Sampling messages support text, images, and audio:
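A sketch of the three content shapes inside a sampling message (base64 payloads abbreviated):

```json
{
  "messages": [
    { "role": "user", "content": { "type": "text", "text": "Summarize this post." } },
    { "role": "user", "content": { "type": "image", "data": "<base64-encoded image>", "mimeType": "image/png" } },
    { "role": "user", "content": { "type": "audio", "data": "<base64-encoded audio>", "mimeType": "audio/wav" } }
  ]
}
```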
Implementing a sampling server (Python)
Here’s a blog post tool that uses sampling to generate an abstract.
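A sketch of how this might look with the MCP Python SDK’s FastMCP `Context` API; the tool name, prompt text, and model hint are assumptions:

```python
from mcp.server.fastmcp import Context, FastMCP
from mcp.types import ModelHint, ModelPreferences, SamplingMessage, TextContent

mcp = FastMCP("blog-server")


@mcp.tool()
async def create_blog_post(title: str, content: str, ctx: Context) -> str:
    """Assemble a blog post and ask the client's LLM for an abstract."""
    # Delegate the LLM call to the client via sampling.
    result = await ctx.session.create_message(
        messages=[
            SamplingMessage(
                role="user",
                content=TextContent(
                    type="text",
                    text=f"Write a 2-3 sentence abstract for this blog post:\n\n{content}",
                ),
            )
        ],
        system_prompt="You are a concise technical copywriter.",
        model_preferences=ModelPreferences(
            hints=[ModelHint(name="claude-3-sonnet")],
            intelligencePriority=0.8,
            speedPriority=0.5,
        ),
        max_tokens=200,
    )

    # The client may have used any model, so only rely on the content shape.
    abstract = result.content.text if result.content.type == "text" else ""
    return f"# {title}\n\n**Abstract:** {abstract}\n\n{content}"


if __name__ == "__main__":
    mcp.run()
```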
Enabling sampling in the client
If you are also building the client (not just the server), declare sampling support in the client’s capabilities. If you are only building the MCP Server, you don’t need to configure anything on the client side; the host application (Claude Desktop, VS Code, etc.) handles sampling responses automatically.
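With the MCP Python SDK, a client advertises the sampling capability by passing a `sampling_callback` to `ClientSession`. A minimal sketch, where the server command and the canned response are placeholders:

```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client
from mcp.types import CreateMessageRequestParams, CreateMessageResult, TextContent


async def handle_sampling(context, params: CreateMessageRequestParams) -> CreateMessageResult:
    # Forward params.messages to whichever LLM this client has access to;
    # a canned reply keeps the sketch short.
    return CreateMessageResult(
        role="assistant",
        content=TextContent(type="text", text="Generated abstract goes here."),
        model="my-local-model",
        stopReason="endTurn",
    )


async def main() -> None:
    server = StdioServerParameters(command="python", args=["blog_server.py"])
    async with stdio_client(server) as (read_stream, write_stream):
        # Supplying sampling_callback makes the session declare the
        # sampling capability during initialization.
        async with ClientSession(
            read_stream, write_stream, sampling_callback=handle_sampling
        ) as session:
            await session.initialize()


if __name__ == "__main__":
    asyncio.run(main())
```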
Key takeaways
- Sampling lets a server delegate LLM calls to the client: the server sends a `sampling/createMessage` request, and the client calls the LLM and returns the result.
- Model preferences are recommendations; the client and user choose the actual model used.
- Sampling messages support text, image, and audio content types.
- The server uses `ctx.session.create_message()` (Python) to issue sampling requests from within a tool.
- This pattern is only available with the low-level server API or via the `Context` object in FastMCP.