Screen and Desktop Tools: Screenshots, OCR, Mouse and Keyboard

Screen and desktop tools allow ChatGPT to observe and interact with the Windows desktop — useful for UI automation scenarios and visual debugging when browser or structured tools cannot reach the target application. Screen tools (mcp:screen scope) are read-only observers: they capture what is visible on screen. Desktop tools (mcp:desktop scope) are actuators: they move the mouse, click, and type into whatever the foreground window happens to be. All screen and desktop tools require Windows. They use PowerShell with System.Windows.Forms, System.Drawing, Win32 P/Invoke, and SendKeys internally — no third-party drivers are required.

Desktop tools move the mouse cursor and inject keystrokes into the active application. Do not run the MCP server under an account you are not willing to expose to this level of automation. Desktop actions are journaled but not reversible.

Screen tools (`mcp:screen` scope)

1. `window_list`

Attribute	Value
Required scope	`mcp:screen`
Policy mode	`observe`
Risk tags	`screen-disclosure`, `window-title-disclosure`

Preferred desktop-observation tool before falling back to coordinate-level actions. Lists all visible top-level desktop windows with their process name, PID, window handle, and optionally their bounding rectangle. Window titles are redacted by default because they may contain file paths, URLs, or other sensitive information.

Parameters

Parameter	Type	Default	Description
`includeBounds`	`boolean`	`true`	Include the bounding rectangle (`x`, `y`, `width`, `height`) for each window.
`maxWindows`	`number`	`100`	Maximum number of windows to return (max 500). Truncation indicated by `truncated`.
`raw`	`boolean`	`false`	Return unredacted window titles (requires `confirm: true`).
`confirm`	`boolean`	`false`	Required when `raw: true`.

Response fields: windows (array of { pid, processName, title, handle, bounds? }), platform, truncated, raw, redacted.

Use window_list to identify the processName and bounds of the target window before using desktop mouse/keyboard tools. This lets you pass expectedProcessName as a guard to prevent accidental actions on the wrong application.
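As an illustration of that lookup, the following Python sketch picks a target window out of a `window_list` response and derives both a guard value and a click coordinate from it. The sample response is invented for the example (the `pid`, `handle`, `platform`, and redacted `title` values are assumptions, not real server output), and `find_window` and `center_of` are hypothetical helper names.

```python
from typing import Optional


def find_window(windows: list, process_name: str) -> Optional[dict]:
    """Return the first window whose processName matches, ignoring case."""
    wanted = process_name.lower()
    for window in windows:
        if window.get("processName", "").lower() == wanted:
            return window
    return None


def center_of(bounds: dict) -> tuple:
    """Return the centre point of a bounds rectangle in screen coordinates."""
    return (bounds["x"] + bounds["width"] // 2,
            bounds["y"] + bounds["height"] // 2)


# Hypothetical window_list response; all values are made up for illustration.
sample_response = {
    "windows": [
        {"pid": 4120, "processName": "explorer", "title": "[REDACTED]",
         "handle": 65800,
         "bounds": {"x": 0, "y": 0, "width": 800, "height": 600}},
        {"pid": 9876, "processName": "Code", "title": "[REDACTED]",
         "handle": 131432,
         "bounds": {"x": 100, "y": 50, "width": 1200, "height": 700}},
    ],
    "platform": "win32",
    "truncated": False,
    "raw": False,
    "redacted": True,
}

target = find_window(sample_response["windows"], "code")
guard = {"expectedProcessName": target["processName"]}
click_x, click_y = center_of(target["bounds"])
```

The `guard` dictionary can then be merged into the arguments of any desktop operate tool, so the action is rejected if a different application has taken the foreground.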

2. `screen_screenshot`

Attribute	Value
Required scope	`mcp:screen`
Policy mode	`diagnose`
Risk tags	`screen-disclosure`, `secret-read`

Visual fallback for desktop inspection. Captures a PNG screenshot of the primary screen, all screens, or an explicit pixel region. Prefer window_list, browser_snapshot, or structured tools when the information you need is available in a structured form; use this tool when you need to see what is actually rendered on screen. Screenshots are saved to the server’s data directory and automatically pruned when the file count or byte budget is exceeded.

Parameters

Parameter	Type	Default	Description
`mode`	`"primary" \| "all_screens" \| "region"`	`"primary"`	Which area to capture.
`x`	`number`	—	Left edge of region in screen coordinates. Required when `mode` is `"region"`.
`y`	`number`	—	Top edge of region. Required when `mode` is `"region"`.
`width`	`number`	—	Width of region in pixels. Required when `mode` is `"region"`.
`height`	`number`	—	Height of region in pixels. Required when `mode` is `"region"`.
`allowFullDesktop`	`boolean`	`false`	Required when `mode` is `"all_screens"`.
`confirm`	`boolean`	`false`	Required when `mode` is `"all_screens"`.
`includeImageBase64`	`boolean`	`false`	Include the PNG as a base64 string in the response (subject to output size limits).

Response fields: screenshotId, path, hash, size, bounds ({ x, y, width, height }), cleanup ({ deleted, deletedBytes }), sourceTrust ("screen_observed_content"), imageBase64?.

Screenshot size limits

Limit variable	Default	Description
`GPT_FS_MCP_MAX_SCREENSHOT_AREA_PIXELS`	`33,000,000`	Maximum pixel area of a single screenshot.
`GPT_FS_MCP_MAX_SCREENSHOT_BYTES`	`100,000,000`	Maximum file size in bytes per screenshot.
`GPT_FS_MCP_MAX_SCREENSHOT_DIMENSION`	`8192`	Maximum width or height dimension in pixels.
`GPT_FS_MCP_MAX_SCREENSHOT_FILES`	`100`	Maximum number of screenshot files retained.
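The parameter rules and size limits above can be checked on the client before a request is sent. The sketch below is a hypothetical helper (`build_screenshot_args` is not part of the server); it assumes the default limit values from the table and does not claim to mirror how the server itself validates requests.

```python
MAX_AREA_PIXELS = 33_000_000  # default of GPT_FS_MCP_MAX_SCREENSHOT_AREA_PIXELS
MAX_DIMENSION = 8192          # default of GPT_FS_MCP_MAX_SCREENSHOT_DIMENSION


def build_screenshot_args(mode="primary", region=None,
                          allow_full_desktop=False, confirm=False,
                          include_image_base64=False) -> dict:
    """Build screen_screenshot arguments, rejecting combinations the
    parameter table marks as invalid."""
    if mode not in ("primary", "all_screens", "region"):
        raise ValueError(f"unknown mode: {mode!r}")
    args = {"mode": mode, "includeImageBase64": include_image_base64}

    if mode == "region":
        keys = ("x", "y", "width", "height")
        if region is None or any(k not in region for k in keys):
            raise ValueError("region mode requires x, y, width and height")
        width, height = region["width"], region["height"]
        if width <= 0 or height <= 0:
            raise ValueError("width and height must be positive")
        if max(width, height) > MAX_DIMENSION:
            raise ValueError("region exceeds the maximum dimension")
        if width * height > MAX_AREA_PIXELS:
            raise ValueError("region exceeds the maximum pixel area")
        args.update({k: region[k] for k in keys})
    elif mode == "all_screens":
        # The table requires both flags for a full-desktop capture.
        if not (allow_full_desktop and confirm):
            raise ValueError("all_screens requires allowFullDesktop and confirm")
        args.update({"allowFullDesktop": True, "confirm": True})
    return args
```

Note that the area limit can bind before the dimension limit: a region of 8000 x 5000 pixels is within 8192 on both axes but covers 40,000,000 pixels, which is over the default area budget.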

3. `screen_ocr`

Attribute	Value
Required scope	`mcp:screen`
Policy mode	`diagnose`
Risk tags	`screen-disclosure`, `secret-read`, `ocr`

Runs Tesseract OCR on a screenshot captured by screen_screenshot. Returns the extracted text with sensitive patterns (email addresses, Bearer tokens, API_KEY=, SECRET=, etc.) redacted by default. OCR is best-effort and requires the tesseract executable to be installed and available on PATH.

screen_ocr and screen_screenshot can capture any information visible on screen, including passwords typed in other applications, authentication tokens displayed in terminals, and confidential documents. Both tools tag their output with sourceTrust: "screen_observed_content" to signal that this content originates from an untrusted visual source.

Parameters

Parameter	Type	Default	Description
`screenshotId`	`string` (UUID)	(required)	The `screenshotId` from a prior `screen_screenshot` call.
`language`	`string`	`"eng"`	Tesseract language code (e.g. `"eng"`, `"deu"`, `"fra"`).
`psm`	`number`	`6`	Tesseract page segmentation mode (0–13). Mode 6 assumes a uniform block of text.
`redact`	`boolean`	`true`	Apply pattern-based redaction to the OCR output.
`raw`	`boolean`	`false`	Return unredacted output. Requires `confirm: true`.
`confirm`	`boolean`	`false`	Required when `raw: true` or `redact: false`.

Response fields: text, available (whether Tesseract was found), language, psm, path, redacted, error?, stderr?, sourceTrust.
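A caller typically needs to handle two things here: the confirmation rule for unredacted output, and the case where Tesseract is not installed. The following sketch shows one way to do both. `build_ocr_args` and `ocr_text_or_none` are hypothetical helper names, and the sample responses in the usage lines are invented for illustration.

```python
import uuid
from typing import Optional


def build_ocr_args(screenshot_id: str, language: str = "eng", psm: int = 6,
                   redact: bool = True, raw: bool = False,
                   confirm: bool = False) -> dict:
    """Build screen_ocr arguments and enforce the documented confirm rule."""
    uuid.UUID(screenshot_id)  # raises ValueError if this is not a UUID
    if not 0 <= psm <= 13:
        raise ValueError("psm must be between 0 and 13")
    if (raw or not redact) and not confirm:
        raise ValueError("raw: true or redact: false requires confirm: true")
    return {"screenshotId": screenshot_id, "language": language, "psm": psm,
            "redact": redact, "raw": raw, "confirm": confirm}


def ocr_text_or_none(response: dict) -> Optional[str]:
    """Return the OCR text, or None when Tesseract was unavailable or failed."""
    if not response.get("available") or response.get("error"):
        return None
    return response.get("text")


# Usage with made-up values.
args = build_ocr_args("123e4567-e89b-12d3-a456-426614174000")
missing = ocr_text_or_none({"available": False, "error": "tesseract not found"})
found = ocr_text_or_none({"available": True, "text": "Build succeeded"})
```

Checking `available` before reading `text` keeps a missing Tesseract install from being mistaken for an empty screen.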

Desktop tools (`mcp:desktop` scope)

All operate-mode desktop tools default to dryRun: true. A dry run returns the current desktop state (mouse position, active window, screen bounds) and what would have been executed, without performing any action. Set dryRun: false and confirm: true to execute.

These tools also accept guard parameters that let you assert the expected screen dimensions and active window before the action fires. The guards help prevent misfires when the screen layout or foreground application has changed between planning and execution.

Common guard parameters (all desktop operate tools)

Parameter	Type	Description
`expectedProcessName`	`string`	Reject if the active window’s process name does not match (case-insensitive).
`expectedWindowTitle`	`string`	Reject if the active window’s title does not contain this substring (case-insensitive).
`expectedScreenWidth`	`number`	Reject if the virtual screen width does not match exactly.
`expectedScreenHeight`	`number`	Reject if the virtual screen height does not match exactly.
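To make the matching rules in this table concrete, here is a minimal sketch of how such guards could be evaluated against the active window and virtual screen bounds. It is an illustration of the documented semantics only, not the server's implementation; `check_guards` is a hypothetical name.

```python
def check_guards(guards: dict, active_window: dict, virtual_bounds: dict) -> list:
    """Return a list of human-readable guard failures (empty means all pass)."""
    failures = []

    expected_process = guards.get("expectedProcessName")
    if expected_process is not None:
        actual = active_window.get("processName", "")
        if actual.lower() != expected_process.lower():  # case-insensitive match
            failures.append(f"process is {actual!r}, expected {expected_process!r}")

    expected_title = guards.get("expectedWindowTitle")
    if expected_title is not None:
        title = active_window.get("title", "")
        if expected_title.lower() not in title.lower():  # substring match
            failures.append(f"title does not contain {expected_title!r}")

    # Screen dimensions must match exactly.
    for guard_key, bounds_key in (("expectedScreenWidth", "width"),
                                  ("expectedScreenHeight", "height")):
        expected = guards.get(guard_key)
        if expected is not None and virtual_bounds.get(bounds_key) != expected:
            failures.append(f"{bounds_key} is {virtual_bounds.get(bounds_key)}, "
                            f"expected {expected}")
    return failures
```

Running the same check on the client against a `desktop_mouse_position` response gives an early warning before an operate call is attempted.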

4. `desktop_mouse_position`

Attribute	Value
Required scope	`mcp:desktop`
Policy mode	`observe`

Returns the current mouse cursor position, the primary and virtual screen bounds, and the active foreground window. Use this as the starting point before planning coordinate-based actions.

Parameters: none.

Response fields: mousePosition ({ x, y }), primaryBounds ({ x, y, width, height }), virtualBounds, activeWindow? ({ handle, pid, processName, title, bounds? }).

5. `desktop_mouse_move`

Attribute	Value
Required scope	`mcp:desktop`
Policy mode	`operate`

Moves the Windows mouse cursor to absolute screen coordinates. The target coordinate is validated against the virtual screen bounds before execution. Defaults to dryRun: true.

Parameters

Parameter	Type	Description
`x`	`number`	Target X coordinate in screen pixels (absolute).
`y`	`number`	Target Y coordinate in screen pixels (absolute).
`purpose`	`string`	Short operational purpose for using desktop UI fallback.
`expectedAction`	`string`	Expected UI effect of this mouse move.
`dryRun`	`boolean`	Default `true`. Set `false` to execute.
`confirm`	`boolean`	Required when `dryRun: false`.
(guard params)	—	`expectedProcessName`, `expectedWindowTitle`, etc.
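The bounds validation can be sketched as a simple rectangle test. On multi-monitor Windows setups the virtual screen origin can be negative (for example, when a secondary monitor sits to the left of the primary), so the check must use the origin from `virtualBounds` rather than assume `(0, 0)`. The half-open interval below is an assumption; the server's exact edge handling is not documented here, and `is_within_virtual_bounds` is a hypothetical name.

```python
def is_within_virtual_bounds(x: int, y: int, virtual_bounds: dict) -> bool:
    """Return True if (x, y) lies inside the virtual screen rectangle.

    Treats the rectangle as half-open: the left/top edges are inside,
    the right/bottom edges (origin + size) are outside.
    """
    left = virtual_bounds["x"]
    top = virtual_bounds["y"]
    right = left + virtual_bounds["width"]
    bottom = top + virtual_bounds["height"]
    return left <= x < right and top <= y < bottom


# Two 1920x1080 monitors, with the secondary placed left of the primary.
dual_monitor = {"x": -1920, "y": 0, "width": 3840, "height": 1080}
on_left_monitor = is_within_virtual_bounds(-100, 500, dual_monitor)
past_right_edge = is_within_virtual_bounds(1920, 500, dual_monitor)
```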

6. `desktop_mouse_click`

Attribute	Value
Required scope	`mcp:desktop`
Policy mode	`operate`

Moves the mouse to absolute screen coordinates and fires one or more mouse button events. Uses Win32 mouse_event P/Invoke. Defaults to dryRun: true.

Parameters

Parameter	Type	Default	Description
`x`	`number`	(required)	Target X coordinate.
`y`	`number`	(required)	Target Y coordinate.
`button`	`"left" \| "right" \| "middle"`	`"left"`	Mouse button to click.
`clickCount`	`number`	`1`	Number of clicks (1–3). Use 2 for double-click.
`purpose`	`string`	(required)	Short operational purpose.
`expectedAction`	`string`	(required)	Expected UI effect.
`dryRun`	`boolean`	`true`	Set `false` to execute.
`confirm`	`boolean`	`false`	Required when `dryRun: false`.
(guard params)	—	—	Optional window and screen guards.
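A small builder can keep click requests consistent with this table. The sketch below is a hypothetical client-side helper (`build_click_args` is not a server API); it only encodes the constraints listed above and pairs `dryRun: false` with `confirm: true` so the two are never set independently.

```python
VALID_BUTTONS = ("left", "right", "middle")
GUARD_KEYS = ("expectedProcessName", "expectedWindowTitle",
              "expectedScreenWidth", "expectedScreenHeight")


def build_click_args(x: int, y: int, purpose: str, expected_action: str,
                     button: str = "left", click_count: int = 1,
                     execute: bool = False, **guards) -> dict:
    """Build desktop_mouse_click arguments; a dry run unless execute=True."""
    if button not in VALID_BUTTONS:
        raise ValueError(f"button must be one of {VALID_BUTTONS}")
    if not 1 <= click_count <= 3:
        raise ValueError("clickCount must be between 1 and 3")
    unknown = set(guards) - set(GUARD_KEYS)
    if unknown:
        raise ValueError(f"unknown guard parameters: {sorted(unknown)}")
    args = {
        "x": x, "y": y, "button": button, "clickCount": click_count,
        "purpose": purpose, "expectedAction": expected_action,
        # confirm is only meaningful together with dryRun: false.
        "dryRun": not execute, "confirm": execute,
    }
    args.update(guards)
    return args


preview = build_click_args(640, 400, "Dismiss startup dialog", "Dialog closes",
                           expectedProcessName="MyApp")
```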

7. `desktop_key_press`

Attribute	Value
Required scope	`mcp:desktop`
Policy mode	`operate`

Sends a SendKeys key string to the active Windows application. Use this for special keys and escape sequences (e.g. "{ENTER}", "{TAB}", "{F5}"). For simple printable text, prefer desktop_text_type. Defaults to dryRun: true.

Parameters

Parameter	Type	Description
`key`	`string`	A `SendKeys`-compatible key string (max 80 characters).
`purpose`	`string`	Short operational purpose.
`expectedAction`	`string`	Expected UI effect.
`dryRun`	`boolean`	Default `true`. Set `false` to execute.
`confirm`	`boolean`	Required when `dryRun: false`.
(guard params)	—	Optional window and screen guards.

8. `desktop_hotkey`

Attribute	Value
Required scope	`mcp:desktop`
Policy mode	`operate`

Sends a structured modifier+key hotkey combination to the active application (e.g. Ctrl+L, Ctrl+Shift+P, Alt+F4). Keys are specified as an array: the last element is the key, and all preceding elements are modifiers (CTRL, ALT, SHIFT). Defaults to dryRun: true.

Parameters

Parameter	Type	Description
`keys`	`string[]`	Array of 2–4 key names. Modifiers: `CTRL`/`CONTROL`, `ALT`, `SHIFT`. Final element: the key name.
`purpose`	`string`	Short operational purpose.
`expectedAction`	`string`	Expected UI effect.
`dryRun`	`boolean`	Default `true`. Set `false` to execute.
`confirm`	`boolean`	Required when `dryRun: false`.
(guard params)	—	Optional window and screen guards.

Example: to press Ctrl+Shift+P in VS Code:

{
  "tool": "desktop_hotkey",
  "arguments": {
    "keys": ["CTRL", "SHIFT", "P"],
    "purpose": "Open VS Code command palette",
    "expectedAction": "Command palette opens",
    "dryRun": false,
    "confirm": true,
    "expectedProcessName": "Code"
  }
}
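Since the desktop tools use SendKeys internally, a `keys` array like the one above presumably ends up as a SendKeys string. The sketch below shows one plausible translation using the standard SendKeys modifier prefixes (`^` for Ctrl, `%` for Alt, `+` for Shift). How the server actually performs this mapping is an assumption, and `hotkey_to_sendkeys` is a hypothetical name.

```python
MODIFIER_SYMBOLS = {"CTRL": "^", "CONTROL": "^", "ALT": "%", "SHIFT": "+"}


def hotkey_to_sendkeys(keys: list) -> str:
    """Translate ["CTRL", "SHIFT", "P"] into a SendKeys string such as "^+p"."""
    if not 2 <= len(keys) <= 4:
        raise ValueError("keys must contain 2 to 4 elements")
    *modifiers, key = keys  # the last element is the key, the rest are modifiers

    prefix = ""
    for modifier in modifiers:
        symbol = MODIFIER_SYMBOLS.get(modifier.upper())
        if symbol is None:
            raise ValueError(f"unsupported modifier: {modifier!r}")
        prefix += symbol

    if len(key) == 1:
        # Letters and digits are sent as-is (lowercased so that Shift is only
        # applied via the explicit modifier); other characters are braced.
        body = key.lower() if key.isalnum() else "{" + key + "}"
    else:
        body = "{" + key.upper() + "}"  # named keys such as F4, ENTER, TAB
    return prefix + body
```

For the example above this yields `^+p`, and `["ALT", "F4"]` yields `%{F4}`.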

9. `desktop_text_type`

Attribute	Value
Required scope	`mcp:desktop`
Policy mode	`operate`

Types a string of text into the active Windows application using SendKeys. Special SendKeys characters (+, ^, %, ~, (, ), [, ], {, }) are automatically escaped. Text content is logged as [REDACTED] with length in the audit journal. Defaults to dryRun: true.

Parameters

Parameter	Type	Description
`text`	`string`	Text to type (1–10,000 characters). Special SendKeys characters are escaped automatically.
`purpose`	`string`	Short operational purpose.
`expectedAction`	`string`	Expected UI effect.
`dryRun`	`boolean`	Default `true`. Set `false` to execute.
`confirm`	`boolean`	Required when `dryRun: false`.
(guard params)	—	Optional window and screen guards.
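For reference, SendKeys treats each of the special characters listed above literally when it is wrapped in braces. The following sketch shows that escaping in Python. It illustrates the SendKeys convention and the documented length limit, not the server's own code, and `escape_sendkeys` is a hypothetical name.

```python
import re

# The characters SendKeys interprets specially: + ^ % ~ ( ) [ ] { }
_SENDKEYS_SPECIAL = re.compile(r"[+^%~()\[\]{}]")


def escape_sendkeys(text: str) -> str:
    """Wrap each special SendKeys character in braces so it is typed literally."""
    if not 1 <= len(text) <= 10_000:
        raise ValueError("text must be 1 to 10,000 characters long")
    return _SENDKEYS_SPECIAL.sub(lambda match: "{" + match.group(0) + "}", text)


escaped_percent = escape_sendkeys("100%")
escaped_call = escape_sendkeys("f(x)")
escaped_braces = escape_sendkeys("{x}")
```

Because the substitution runs in a single pass over the original text, the braces it inserts are never themselves re-escaped.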

Practical use case: verify a UI state after a build

After starting a desktop application via start_process, you can use screen and desktop tools to confirm it launched correctly and interact with it:

// 1. List windows to find the app
{ "tool": "window_list", "arguments": { "includeBounds": true } }

// 2. Take a screenshot to see the current state
{ "tool": "screen_screenshot", "arguments": { "mode": "primary" } }

// 3. Run OCR to extract text from the screenshot
{
  "tool": "screen_ocr",
  "arguments": { "screenshotId": "<id from step 2>", "language": "eng" }
}

// 4. If a dialog needs dismissal, click it
{
  "tool": "desktop_mouse_click",
  "arguments": {
    "x": 640, "y": 400,
    "purpose": "Dismiss startup dialog",
    "expectedAction": "Dialog closes",
    "dryRun": false,
    "confirm": true,
    "expectedProcessName": "MyApp"
  }
}
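Step 4 can be made safer by issuing the click as a dry run first and only executing once the foreground application is the expected one. The sketch below assumes a generic `call_tool(name, arguments)` function supplied by your MCP client, and assumes the dry-run response exposes the active window under an `activeWindow` key (the text above says a dry run returns the active window, but not the exact field name). `click_with_dry_run` and `make_fake_caller` are hypothetical names; the fake caller only exists so the example runs without a server.

```python
from typing import Callable

ToolCaller = Callable[[str, dict], dict]


def click_with_dry_run(call_tool: ToolCaller, click_args: dict) -> dict:
    """Preview a click as a dry run, then execute it if the foreground
    process matches expectedProcessName (when one is given)."""
    preview = call_tool("desktop_mouse_click", {**click_args, "dryRun": True})

    expected = click_args.get("expectedProcessName")
    active = (preview.get("activeWindow") or {}).get("processName", "")
    if expected and active.lower() != expected.lower():
        raise RuntimeError(f"foreground process is {active!r}, not {expected!r}")

    return call_tool("desktop_mouse_click",
                     {**click_args, "dryRun": False, "confirm": True})


def make_fake_caller(active_process: str):
    """Stand-in for a real MCP client; records calls and echoes a fake state."""
    calls = []

    def call_tool(name: str, arguments: dict) -> dict:
        calls.append((name, dict(arguments)))
        return {"dryRun": arguments.get("dryRun", True),
                "activeWindow": {"processName": active_process}}

    return call_tool, calls


fake_call_tool, recorded_calls = make_fake_caller("MyApp")
result = click_with_dry_run(fake_call_tool, {
    "x": 640, "y": 400,
    "purpose": "Dismiss startup dialog",
    "expectedAction": "Dialog closes",
    "expectedProcessName": "MyApp",
})
```

The server-side guard still applies on the second call; the client-side check simply avoids sending an execute request that is already known to be stale.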
