Screen and desktop tools allow ChatGPT to observe and interact with the Windows desktop — useful for UI automation scenarios and visual debugging when browser or structured tools cannot reach the target application. Screen tools (mcp:screen scope) are read-only observers: they capture what is visible on screen. Desktop tools (mcp:desktop scope) are actuators: they move the mouse, click, and type into whatever the foreground window happens to be.
All screen and desktop tools require Windows. They use PowerShell with System.Windows.Forms, System.Drawing, Win32 P/Invoke, and SendKeys internally — no third-party drivers are required.
Desktop tools move the mouse cursor and inject keystrokes into the active application. Do not run the MCP server under an account you are not willing to expose to this level of automation. Desktop actions are journaled but not reversible.
1. window_list
| Attribute | Value |
|---|---|
| Required scope | mcp:screen |
| Policy mode | observe |
| Risk tags | screen-disclosure, window-title-disclosure |
Preferred desktop-observation tool before falling back to coordinate-level actions. Lists all visible top-level desktop windows with their process name, PID, window handle, and optionally their bounding rectangle. Window titles are redacted by default — they may contain file paths, URLs, or other sensitive information.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| includeBounds | boolean | true | Include the bounding rectangle (x, y, width, height) for each window. |
| maxWindows | number | 100 | Maximum number of windows to return (max 500). Truncation is indicated by the truncated response field. |
| raw | boolean | false | Return unredacted window titles. Requires confirm: true. |
| confirm | boolean | false | Required when raw: true. |
Response fields: windows (array of { pid, processName, title, handle, bounds? }), platform, truncated, raw, redacted.
Use window_list to identify the processName and bounds of the target window before using desktop mouse/keyboard tools. This lets you pass expectedProcessName as a guard to prevent accidental actions on the wrong application.
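A minimal sketch of that lookup, assuming the response shape listed under "Response fields". The helper name `guard_for`, the sample data, and the `[REDACTED]` title placeholder are hypothetical, and Python is used only for illustration.

```python
from typing import Dict, List, Optional


def guard_for(windows: List[Dict], process_name: str) -> Optional[Dict]:
    """Return guard arguments for the first window owned by process_name.

    Matching is case-insensitive, mirroring how expectedProcessName is
    described for the desktop tools.
    """
    wanted = process_name.lower()
    for window in windows:
        if str(window.get("processName", "")).lower() == wanted:
            return {"expectedProcessName": window["processName"]}
    return None


# Hypothetical window_list response, trimmed to the documented fields.
# The title placeholder and handle values are illustrative only.
sample_response = {
    "windows": [
        {"pid": 4321, "processName": "explorer", "title": "[REDACTED]", "handle": 1001},
        {"pid": 8765, "processName": "Code", "title": "[REDACTED]", "handle": 1002,
         "bounds": {"x": 0, "y": 0, "width": 1280, "height": 720}},
    ],
    "truncated": False,
}

guard = guard_for(sample_response["windows"], "code")
```

The returned dictionary can be merged into the arguments of a later desktop tool call. A `None` result means the target window is not visible, so no desktop action should be attempted.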
2. screen_screenshot
| Attribute | Value |
|---|---|
| Required scope | mcp:screen |
| Policy mode | diagnose |
| Risk tags | screen-disclosure, secret-read |
Visual fallback for desktop inspection. Captures a PNG screenshot of the primary screen, all screens, or an explicit pixel region. Prefer window_list, browser_snapshot, or structured tools when the information you need is available in a structured form; use this tool when you need to see what is actually rendered on screen.
Screenshots are saved to the server’s data directory and automatically pruned when the file count or byte budget is exceeded.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| mode | "primary" \| "all_screens" \| "region" | "primary" | Which area to capture. |
| x | number | (none) | Left edge of region in screen coordinates. Required when mode is "region". |
| y | number | (none) | Top edge of region. Required when mode is "region". |
| width | number | (none) | Width of region in pixels. Required when mode is "region". |
| height | number | (none) | Height of region in pixels. Required when mode is "region". |
| allowFullDesktop | boolean | false | Required when mode is "all_screens". |
| confirm | boolean | false | Required when mode is "all_screens". |
| includeImageBase64 | boolean | false | Include the PNG as a base64 string in the response (subject to output size limits). |
Response fields: screenshotId, path, hash, size, bounds ({ x, y, width, height }), cleanup ({ deleted, deletedBytes }), sourceTrust ("screen_observed_content"), imageBase64?.
Screenshot size limits
| Limit variable | Default | Description |
|---|---|---|
| GPT_FS_MCP_MAX_SCREENSHOT_AREA_PIXELS | 33,000,000 | Maximum pixel area of a single screenshot. |
| GPT_FS_MCP_MAX_SCREENSHOT_BYTES | 100,000,000 | Maximum file size in bytes per screenshot. |
| GPT_FS_MCP_MAX_SCREENSHOT_DIMENSION | 8192 | Maximum width or height dimension in pixels. |
| GPT_FS_MCP_MAX_SCREENSHOT_FILES | 100 | Maximum number of screenshot files retained. |
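A caller can check a planned region against the default area and dimension limits before requesting it. The sketch below is a client-side pre-check only. It assumes both limits are inclusive, which this page does not state, and the function name `region_problems` is hypothetical.

```python
from typing import List

MAX_AREA_PIXELS = 33_000_000  # default of GPT_FS_MCP_MAX_SCREENSHOT_AREA_PIXELS
MAX_DIMENSION = 8192          # default of GPT_FS_MCP_MAX_SCREENSHOT_DIMENSION


def region_problems(width: int, height: int) -> List[str]:
    """Return human-readable reasons why a region would exceed the defaults."""
    problems = []
    if width <= 0 or height <= 0:
        problems.append("width and height must be positive")
    if max(width, height) > MAX_DIMENSION:
        problems.append(f"a side exceeds {MAX_DIMENSION} pixels")
    if width * height > MAX_AREA_PIXELS:
        problems.append(f"area exceeds {MAX_AREA_PIXELS} pixels")
    return problems


full_hd = region_problems(1920, 1080)      # within both limits
max_square = region_problems(8192, 8192)   # sides are allowed, area is not
```

An 8192 x 8192 region passes the per-side limit but covers 67,108,864 pixels, so the area limit is the one that rejects it.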
3. screen_ocr
| Attribute | Value |
|---|---|
| Required scope | mcp:screen |
| Policy mode | diagnose |
| Risk tags | screen-disclosure, secret-read, ocr |
Runs Tesseract OCR on a screenshot captured by screen_screenshot. Returns the extracted text with sensitive patterns (email addresses, Bearer tokens, API_KEY=, SECRET=, etc.) redacted by default. OCR is best-effort and requires the tesseract executable to be installed and available on PATH.
screen_ocr and screen_screenshot can capture any information visible on screen, including passwords typed in other applications, authentication tokens displayed in terminals, and confidential documents. Both tools tag their output with sourceTrust: "screen_observed_content" to signal that this content originates from an untrusted visual source.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| screenshotId | string (UUID) | (required) | The screenshotId from a prior screen_screenshot call. |
| language | string | "eng" | Tesseract language code (e.g. "eng", "deu", "fra"). |
| psm | number | 6 | Tesseract page segmentation mode (0–13). Mode 6 assumes a uniform block of text. |
| redact | boolean | true | Apply pattern-based redaction to the OCR output. |
| raw | boolean | false | Return unredacted output. Requires confirm: true. |
| confirm | boolean | false | Required when raw: true or redact: false. |
Response fields: text, available (whether Tesseract was found), language, psm, path, redacted, error?, stderr?, sourceTrust.
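To illustrate what pattern-based redaction of OCR text can look like, here is a rough sketch. The regular expressions and the `[REDACTED]` replacement string are assumptions chosen for the example. The server's actual pattern list is not documented on this page and is likely broader.

```python
import re

# Illustrative patterns only; not the server's real list.
PATTERNS = [
    re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),  # email addresses
    re.compile(r"Bearer\s+[A-Za-z0-9._~+/=-]+"),                    # Bearer tokens
    re.compile(r"\b(?:API_KEY|SECRET)=\S+"),                        # KEY=value pairs
]


def redact(text: str) -> str:
    """Replace every match of every pattern with a fixed placeholder."""
    for pattern in PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text


cleaned = redact("Authorization: Bearer abc.def-123")
```

Pattern matching of this kind only catches secrets that survive OCR in a recognizable shape, which is one reason the output is still tagged as untrusted screen content.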
All desktop tools default to dryRun: true. A dry run returns the current desktop state (mouse position, active window, screen bounds) and what would have been executed, without performing any action. Set dryRun: false and confirm: true to execute.
All operate-mode desktop tools include guard parameters that let you assert the expected screen dimensions and active window before the action fires. These guards help prevent misfire when the screen layout or foreground application has changed between planning and execution.
Common guard parameters (all desktop operate tools)
| Parameter | Type | Description |
|---|---|---|
| expectedProcessName | string | Reject if the active window’s process name does not match (case-insensitive). |
| expectedWindowTitle | string | Reject if the active window’s title does not contain this substring (case-insensitive). |
| expectedScreenWidth | number | Reject if the virtual screen width does not match exactly. |
| expectedScreenHeight | number | Reject if the virtual screen height does not match exactly. |
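The dry-run gate and the guard semantics above can be sketched as two small checks. This is an illustration of the documented rules, not the server's implementation. The function names and reason strings are hypothetical, and the server's response to `dryRun: false` without `confirm` is not stated here.

```python
from typing import Dict, Optional


def may_execute(dry_run: bool = True, confirm: bool = False) -> bool:
    """An action fires only when dryRun is false and confirm is true."""
    return dry_run is False and confirm is True


def guard_violation(active_window: Dict, virtual_bounds: Dict, guards: Dict) -> Optional[str]:
    """Return a reason for the first failed guard, or None if all pass."""
    process = guards.get("expectedProcessName")
    if process is not None and active_window.get("processName", "").lower() != process.lower():
        return "process name mismatch"
    title = guards.get("expectedWindowTitle")
    if title is not None and title.lower() not in active_window.get("title", "").lower():
        return "window title mismatch"
    width = guards.get("expectedScreenWidth")
    if width is not None and virtual_bounds["width"] != width:
        return "screen width mismatch"
    height = guards.get("expectedScreenHeight")
    if height is not None and virtual_bounds["height"] != height:
        return "screen height mismatch"
    return None


# Hypothetical desktop state, shaped like the desktop_mouse_position response.
active = {"processName": "Code", "title": "main.py - Visual Studio Code"}
bounds = {"x": 0, "y": 0, "width": 1920, "height": 1080}
```

Omitted guards are simply skipped, so a call that passes no guard parameters is never rejected by these checks.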
4. desktop_mouse_position
| Attribute | Value |
|---|---|
| Required scope | mcp:desktop |
| Policy mode | observe |
Returns the current mouse cursor position, the primary and virtual screen bounds, and the active foreground window. Use this as the starting point before planning coordinate-based actions.
Parameters: none.
Response fields: mousePosition ({ x, y }), primaryBounds ({ x, y, width, height }), virtualBounds, activeWindow? ({ handle, pid, processName, title, bounds? }).
5. desktop_mouse_move
| Attribute | Value |
|---|---|
| Required scope | mcp:desktop |
| Policy mode | operate |
Moves the Windows mouse cursor to absolute screen coordinates. The target coordinate is validated against the virtual screen bounds before execution. Defaults to dryRun: true.
Parameters
| Parameter | Type | Description |
|---|---|---|
| x | number | Target X coordinate in screen pixels (absolute). |
| y | number | Target Y coordinate in screen pixels (absolute). |
| purpose | string | Short operational purpose for using desktop UI fallback. |
| expectedAction | string | Expected UI effect of this mouse move. |
| dryRun | boolean | Default true. Set false to execute. |
| confirm | boolean | Required when dryRun: false. |
| (guard params) | — | expectedProcessName, expectedWindowTitle, etc. |
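The bounds validation mentioned for desktop_mouse_move can be pictured as a point-in-rectangle test against virtualBounds from desktop_mouse_position. The sketch below assumes a half-open rectangle (left and top edges inclusive, right and bottom exclusive); whether the server treats the edges the same way is not stated here. The function name is hypothetical.

```python
from typing import Dict


def within_virtual_bounds(x: int, y: int, bounds: Dict[str, int]) -> bool:
    """True if (x, y) lies inside the virtual screen rectangle."""
    return (
        bounds["x"] <= x < bounds["x"] + bounds["width"]
        and bounds["y"] <= y < bounds["y"] + bounds["height"]
    )


# Hypothetical two-monitor layout: a second screen to the left of the primary
# gives the virtual screen a negative origin.
virtual = {"x": -1920, "y": 0, "width": 3840, "height": 1080}

on_left_monitor = within_virtual_bounds(-100, 500, virtual)
past_right_edge = within_virtual_bounds(1920, 500, virtual)
```

Negative coordinates are valid on multi-monitor layouts, so a check against only the primary screen size would wrongly reject them.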
6. desktop_mouse_click
| Attribute | Value |
|---|---|
| Required scope | mcp:desktop |
| Policy mode | operate |
Moves the mouse to absolute screen coordinates and fires one or more mouse button events. Uses Win32 mouse_event P/Invoke. Defaults to dryRun: true.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| x | number | (required) | Target X coordinate. |
| y | number | (required) | Target Y coordinate. |
| button | "left" \| "right" \| "middle" | "left" | Mouse button to click. |
| clickCount | number | 1 | Number of clicks (1–3). Use 2 for double-click. |
| purpose | string | (required) | Short operational purpose. |
| expectedAction | string | (required) | Expected UI effect. |
| dryRun | boolean | true | Set false to execute. |
| confirm | boolean | false | Required when dryRun: false. |
| (guard params) | — | — | Optional window and screen guards. |
7. desktop_key_press
| Attribute | Value |
|---|---|
| Required scope | mcp:desktop |
| Policy mode | operate |
Sends a SendKeys key string to the active Windows application. Use this for special keys and escape sequences (e.g. "{ENTER}", "{TAB}", "{F5}"). For simple printable text, prefer desktop_text_type. Defaults to dryRun: true.
Parameters
| Parameter | Type | Description |
|---|---|---|
| key | string | A SendKeys-compatible key string (max 80 characters). |
| purpose | string | Short operational purpose. |
| expectedAction | string | Expected UI effect. |
| dryRun | boolean | Default true. Set false to execute. |
| confirm | boolean | Required when dryRun: false. |
| (guard params) | — | Optional window and screen guards. |
8. desktop_hotkey
| Attribute | Value |
|---|---|
| Required scope | mcp:desktop |
| Policy mode | operate |
Sends a structured modifier+key hotkey combination to the active application (e.g. Ctrl+L, Ctrl+Shift+P, Alt+F4). Keys are specified as an array — the last element is the key, all preceding elements are modifiers (CTRL, ALT, SHIFT). Defaults to dryRun: true.
Parameters
| Parameter | Type | Description |
|---|---|---|
| keys | string[] | Array of 2–4 key names. Modifiers: CTRL/CONTROL, ALT, SHIFT. Final element: the key name. |
| purpose | string | Short operational purpose. |
| expectedAction | string | Expected UI effect. |
| dryRun | boolean | Default true. Set false to execute. |
| confirm | boolean | Required when dryRun: false. |
| (guard params) | — | Optional window and screen guards. |
Example: to press Ctrl+Shift+P in VS Code:
```json
{
  "tool": "desktop_hotkey",
  "arguments": {
    "keys": ["CTRL", "SHIFT", "P"],
    "purpose": "Open VS Code command palette",
    "expectedAction": "Command palette opens",
    "dryRun": false,
    "confirm": true,
    "expectedProcessName": "Code"
  }
}
```
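Since the desktop tools use SendKeys internally, a keys array like the one in this example plausibly ends up as a SendKeys string. The sketch below uses standard SendKeys notation (`^` for Ctrl, `%` for Alt, `+` for Shift, named keys in braces). The server's real translation is not documented on this page, so treat the function and its behaviour as assumptions.

```python
from typing import List

# Standard SendKeys modifier prefixes.
MODIFIERS = {"CTRL": "^", "CONTROL": "^", "ALT": "%", "SHIFT": "+"}


def to_sendkeys(keys: List[str]) -> str:
    """Translate ["CTRL", "SHIFT", "P"] style arrays into a SendKeys string."""
    if not 2 <= len(keys) <= 4:
        raise ValueError("expected 2 to 4 key names")
    *modifiers, key = keys
    prefix = ""
    for name in modifiers:
        symbol = MODIFIERS.get(name.upper())
        if symbol is None:
            raise ValueError(f"unknown modifier: {name}")
        prefix += symbol
    # Single characters are sent lowercased; named keys such as F4 go in
    # braces. Escaping of special single characters is left out of this sketch.
    body = key.lower() if len(key) == 1 else "{" + key.upper() + "}"
    return prefix + body


palette = to_sendkeys(["CTRL", "SHIFT", "P"])
close_window = to_sendkeys(["ALT", "F4"])
```

Under these assumptions the command palette hotkey becomes `^+p` and Alt+F4 becomes `%{F4}`.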
9. desktop_text_type
| Attribute | Value |
|---|---|
| Required scope | mcp:desktop |
| Policy mode | operate |
Types a string of text into the active Windows application using SendKeys. Special SendKeys characters (+, ^, %, ~, (, ), [, ], {, }) are automatically escaped. Text content is logged as [REDACTED] with length in the audit journal. Defaults to dryRun: true.
Parameters
| Parameter | Type | Description |
|---|---|---|
| text | string | Text to type (1–10,000 characters). Special SendKeys characters are escaped automatically. |
| purpose | string | Short operational purpose. |
| expectedAction | string | Expected UI effect. |
| dryRun | boolean | Default true. Set false to execute. |
| confirm | boolean | Required when dryRun: false. |
| (guard params) | — | Optional window and screen guards. |
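The automatic escaping described for desktop_text_type follows the usual SendKeys rule that a special character is sent literally when wrapped in braces. A minimal sketch of that rule, with a hypothetical function name; the server's own escaping code may differ in detail.

```python
# Characters listed above as special to SendKeys.
SPECIAL = set("+^%~()[]{}")


def escape_sendkeys(text: str) -> str:
    """Wrap each special character in braces so SendKeys types it literally."""
    return "".join("{" + ch + "}" if ch in SPECIAL else ch for ch in text)


escaped = escape_sendkeys("a+b")
```

Without this step, typing `a+b` would be interpreted as "a, then Shift+b" rather than the three literal characters.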
Practical use case: verify a UI state after a build
After starting a desktop application via start_process, you can use screen and desktop tools to confirm it launched correctly and interact with it:
```jsonc
// 1. List windows to find the app
{ "tool": "window_list", "arguments": { "includeBounds": true } }

// 2. Take a screenshot to see the current state
{ "tool": "screen_screenshot", "arguments": { "mode": "primary" } }

// 3. Run OCR to extract text from the screenshot
{
  "tool": "screen_ocr",
  "arguments": { "screenshotId": "<id from step 2>", "language": "eng" }
}

// 4. If a dialog needs dismissal, click it
{
  "tool": "desktop_mouse_click",
  "arguments": {
    "x": 640, "y": 400,
    "purpose": "Dismiss startup dialog",
    "expectedAction": "Dialog closes",
    "dryRun": false,
    "confirm": true,
    "expectedProcessName": "MyApp"
  }
}
```
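Steps 2 and 3 depend on each other through screenshotId, which a client has to carry from one response into the next request. The sketch below shows that chaining with a hypothetical `call_tool` callable standing in for whatever MCP client transport is in use. The canned responses and the UUID are made up for the demonstration.

```python
from typing import Callable, Dict

ToolCaller = Callable[[str, Dict], Dict]


def read_screen_text(call_tool: ToolCaller) -> str:
    """Capture the primary screen, OCR it, and return the extracted text."""
    shot = call_tool("screen_screenshot", {"mode": "primary"})
    ocr = call_tool(
        "screen_ocr",
        {"screenshotId": shot["screenshotId"], "language": "eng"},
    )
    # The available field reports whether Tesseract was found.
    if not ocr.get("available", False):
        raise RuntimeError(f"OCR unavailable: {ocr.get('error')}")
    return ocr["text"]


def fake_call_tool(name: str, arguments: Dict) -> Dict:
    """Stand-in transport that returns canned responses for the demo."""
    if name == "screen_screenshot":
        return {"screenshotId": "00000000-0000-0000-0000-000000000000"}
    if name == "screen_ocr":
        return {"available": True, "text": "Build succeeded", "redacted": True}
    raise ValueError(f"unexpected tool: {name}")


screen_text = read_screen_text(fake_call_tool)
```

Checking `available` before reading `text` keeps a missing Tesseract install from being mistaken for an empty screen.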
See also
- Security Boundaries — scope enforcement, audit logging, and risk model for screen and desktop tool access