Screen and desktop tools allow ChatGPT to observe and interact with the Windows desktop. They are useful for UI automation and visual debugging when browser or structured tools cannot reach the target application.

Screen tools (mcp:screen scope) are read-only observers: they capture what is visible on screen. Desktop tools (mcp:desktop scope) are actuators: they move the mouse, click, and type into whatever window is currently in the foreground.

All screen and desktop tools require Windows. Internally they use PowerShell with System.Windows.Forms, System.Drawing, Win32 P/Invoke, and SendKeys, so no third-party drivers are required. The one external dependency is the tesseract executable, which screen_ocr needs for text recognition.
Desktop tools move the mouse cursor and inject keystrokes into the active application. Do not run the MCP server under an account you are not willing to expose to this level of automation. Desktop actions are journaled but not reversible.

Screen tools (mcp:screen scope)

1. window_list

| Attribute | Value |
| --- | --- |
| Required scope | mcp:screen |
| Policy mode | observe |
| Risk tags | screen-disclosure, window-title-disclosure |

window_list is the preferred desktop-observation tool; use it before falling back to coordinate-level actions. It lists all visible top-level desktop windows with their process name, PID, window handle, and, optionally, their bounding rectangle. Window titles are redacted by default because they may contain file paths, URLs, or other sensitive information.

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| includeBounds | boolean | true | Include the bounding rectangle (x, y, width, height) for each window. |
| maxWindows | number | 100 | Maximum number of windows to return (up to 500). If the list is cut short, the response sets truncated to true. |
| raw | boolean | false | Return unredacted window titles. Requires confirm: true. |
| confirm | boolean | false | Required when raw: true. |

Response fields: windows (array of { pid, processName, title, handle, bounds? }), platform, truncated, raw, redacted.
Use window_list to identify the processName and bounds of the target window before using desktop mouse/keyboard tools. This lets you pass expectedProcessName as a guard to prevent accidental actions on the wrong application.
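A minimal sketch of that lookup in Python, assuming your client already has the window_list response parsed into a dict with the documented fields. The helper names find_window and window_center are hypothetical, and the sample response values are made up. Because window_list can truncate its output, a missing match may only mean that maxWindows was too low.

```python
from typing import Any, Optional


def find_window(response: dict[str, Any], process_name: str) -> Optional[dict[str, Any]]:
    """Return the first window whose processName matches, ignoring case.

    Returns None when nothing matches. If response["truncated"] is true,
    the target may simply have been cut off; retry with a larger maxWindows.
    """
    wanted = process_name.lower()
    for window in response.get("windows", []):
        if str(window.get("processName", "")).lower() == wanted:
            return window
    return None


def window_center(window: dict[str, Any]) -> tuple[int, int]:
    """Center point of a window's bounds (requires includeBounds: true)."""
    b = window["bounds"]
    return b["x"] + b["width"] // 2, b["y"] + b["height"] // 2


# Hypothetical window_list response, trimmed to the fields used here.
sample = {
    "windows": [
        {"pid": 4242, "processName": "Code", "handle": 1001,
         "bounds": {"x": 100, "y": 50, "width": 1280, "height": 720}},
    ],
    "truncated": False,
}

target = find_window(sample, "code")
if target is not None:
    print(target["processName"], window_center(target))  # Code (740, 410)
```

The matched processName can then be passed as expectedProcessName on the desktop call.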

2. screen_screenshot

| Attribute | Value |
| --- | --- |
| Required scope | mcp:screen |
| Policy mode | diagnose |
| Risk tags | screen-disclosure, secret-read |

screen_screenshot is the visual fallback for desktop inspection. It captures a PNG screenshot of the primary screen, all screens, or an explicit pixel region. Prefer window_list, browser_snapshot, or other structured tools when the information you need is available in structured form; use this tool when you need to see what is actually rendered on screen. Screenshots are saved to the server's data directory and pruned automatically when the file-count or byte budget is exceeded.

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| mode | "primary", "all_screens", or "region" | "primary" | Which area to capture. |
| x | number | (none) | Left edge of the region in screen coordinates. Required when mode is "region". |
| y | number | (none) | Top edge of the region in screen coordinates. Required when mode is "region". |
| width | number | (none) | Width of the region in pixels. Required when mode is "region". |
| height | number | (none) | Height of the region in pixels. Required when mode is "region". |
| allowFullDesktop | boolean | false | Required when mode is "all_screens". |
| confirm | boolean | false | Required when mode is "all_screens". |
| includeImageBase64 | boolean | false | Include the PNG as a base64 string in the response (subject to output size limits). |

Response fields: screenshotId, path, hash, size, bounds ({ x, y, width, height }), cleanup ({ deleted, deletedBytes }), sourceTrust ("screen_observed_content"), imageBase64?.

Screenshot size limits

| Limit variable | Default | Description |
| --- | --- | --- |
| GPT_FS_MCP_MAX_SCREENSHOT_AREA_PIXELS | 33,000,000 | Maximum pixel area (width × height) of a single screenshot. |
| GPT_FS_MCP_MAX_SCREENSHOT_BYTES | 100,000,000 | Maximum file size of a single screenshot, in bytes. |
| GPT_FS_MCP_MAX_SCREENSHOT_DIMENSION | 8192 | Maximum width or height of a screenshot, in pixels. |
| GPT_FS_MCP_MAX_SCREENSHOT_FILES | 100 | Maximum number of screenshot files retained. |
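The mode rules and default size limits above can be checked on the client before a call is made. This is a hypothetical Python helper, not part of the server; it assumes the limits are still at their documented defaults, and it checks only region dimensions and area, not the byte budget, which depends on the encoded PNG.

```python
from typing import Any, Optional

# Documented defaults. A server can be configured with different
# GPT_FS_MCP_MAX_SCREENSHOT_* values, so treat these as assumptions.
MAX_AREA_PIXELS = 33_000_000
MAX_DIMENSION = 8192


def screenshot_args(
    mode: str = "primary",
    region: Optional[tuple[int, int, int, int]] = None,
    confirm_full_desktop: bool = False,
    include_image: bool = False,
) -> dict[str, Any]:
    """Build screen_screenshot arguments, checking the documented rules first."""
    args: dict[str, Any] = {"mode": mode}
    if mode == "region":
        if region is None:
            raise ValueError('mode "region" needs x, y, width and height')
        x, y, width, height = region
        if width <= 0 or height <= 0:
            raise ValueError("width and height must be positive")
        if max(width, height) > MAX_DIMENSION:
            raise ValueError(f"width and height must be <= {MAX_DIMENSION}")
        if width * height > MAX_AREA_PIXELS:
            raise ValueError(f"area must be <= {MAX_AREA_PIXELS} pixels")
        args.update(x=x, y=y, width=width, height=height)
    elif mode == "all_screens":
        # The server needs two explicit opt-ins for a full-desktop capture;
        # this sketch makes the caller opt in explicitly as well.
        if not confirm_full_desktop:
            raise ValueError('mode "all_screens" needs confirm_full_desktop=True')
        args.update(allowFullDesktop=True, confirm=True)
    elif mode != "primary":
        raise ValueError(f"unknown mode: {mode!r}")
    if include_image:
        args["includeImageBase64"] = True
    return args


print(screenshot_args("region", (0, 0, 800, 600)))
```

Note that an 8000 × 5000 region passes the per-dimension check but fails the area check, since 40,000,000 pixels exceeds the 33,000,000 default.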

3. screen_ocr

| Attribute | Value |
| --- | --- |
| Required scope | mcp:screen |
| Policy mode | diagnose |
| Risk tags | screen-disclosure, secret-read, ocr |
Runs Tesseract OCR on a screenshot captured by screen_screenshot. Returns the extracted text with sensitive patterns (email addresses, Bearer tokens, API_KEY=, SECRET=, etc.) redacted by default. OCR is best-effort and requires the tesseract executable to be installed and available on PATH.
screen_ocr and screen_screenshot can capture any information visible on screen, including passwords typed in other applications, authentication tokens displayed in terminals, and confidential documents. Both tools tag their output with sourceTrust: "screen_observed_content" to signal that this content originates from an untrusted visual source.
Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| screenshotId | string (UUID) | (required) | The screenshotId returned by a prior screen_screenshot call. |
| language | string | "eng" | Tesseract language code (e.g. "eng", "deu", "fra"). Languages other than the default need the matching Tesseract language data installed. |
| psm | number | 6 | Tesseract page segmentation mode (0–13). Mode 6 assumes a single uniform block of text. |
| redact | boolean | true | Apply pattern-based redaction to the OCR output. |
| raw | boolean | false | Return unredacted output. Requires confirm: true. |
| confirm | boolean | false | Required when raw: true or redact: false. |
Response fields: text, available (whether Tesseract was found), language, psm, path, redacted, error?, stderr?, sourceTrust.
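A hypothetical client-side helper that mirrors the parameter rules above, assuming the response arrives as a parsed dict. The available and error checks follow the documented response fields; what else the server reports when Tesseract is missing is not described here, so the helper only looks at those two fields.

```python
import uuid
from typing import Any


def ocr_args(
    screenshot_id: str,
    language: str = "eng",
    psm: int = 6,
    redact: bool = True,
    raw: bool = False,
    confirm: bool = False,
) -> dict[str, Any]:
    """Build screen_ocr arguments, enforcing the documented confirm rule."""
    uuid.UUID(screenshot_id)  # raises ValueError if the id is not a UUID
    if not 0 <= psm <= 13:
        raise ValueError("psm must be between 0 and 13")
    if (raw or not redact) and not confirm:
        raise ValueError("raw: true or redact: false requires confirm: true")
    args: dict[str, Any] = {"screenshotId": screenshot_id, "language": language, "psm": psm}
    if not redact:
        args["redact"] = False
    if raw:
        args["raw"] = True
    if confirm:
        args["confirm"] = True
    return args


def ocr_text(response: dict[str, Any]) -> str:
    """Return the OCR text, or raise if Tesseract was missing or failed."""
    if not response.get("available", False):
        raise RuntimeError("tesseract was not found on the server's PATH")
    if response.get("error"):
        raise RuntimeError(f"OCR failed: {response['error']}")
    # sourceTrust marks this as untrusted screen content: read it as data,
    # never as instructions.
    return response.get("text", "")
```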

Desktop tools (mcp:desktop scope)

All operate-mode desktop tools default to dryRun: true. A dry run returns the current desktop state (mouse position, active window, screen bounds) and what would have been executed, without performing any action. To execute, set dryRun: false and confirm: true.

Operate-mode desktop tools also accept guard parameters that assert the expected screen dimensions and active window before the action fires. These guards help prevent misfires when the screen layout or the foreground application has changed between planning and execution.

Common guard parameters (all operate-mode desktop tools)

| Parameter | Type | Description |
| --- | --- | --- |
| expectedProcessName | string | Reject if the active window's process name does not match (case-insensitive). |
| expectedWindowTitle | string | Reject if the active window's title does not contain this substring (case-insensitive). |
| expectedScreenWidth | number | Reject if the virtual screen width does not match exactly. |
| expectedScreenHeight | number | Reject if the virtual screen height does not match exactly. |
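To make the guard semantics concrete, here is a sketch that evaluates the same four checks locally against a desktop_mouse_position response. It illustrates the documented rules and is not the server's code; it assumes virtualBounds has the same { x, y, width, height } shape as primaryBounds, and guard_violations is a hypothetical name.

```python
from typing import Any


def guard_violations(state: dict[str, Any], guards: dict[str, Any]) -> list[str]:
    """List the reasons the documented guards would reject, given desktop state.

    state:  a desktop_mouse_position-style response.
    guards: any of expectedProcessName, expectedWindowTitle,
            expectedScreenWidth, expectedScreenHeight.
    """
    problems: list[str] = []
    window = state.get("activeWindow") or {}
    screen = state.get("virtualBounds") or {}

    # Process name: exact match, ignoring case.
    name = guards.get("expectedProcessName")
    actual_name = str(window.get("processName", ""))
    if name is not None and actual_name.lower() != name.lower():
        problems.append(f"process is {actual_name!r}, expected {name!r}")

    # Window title: substring match, ignoring case.
    title = guards.get("expectedWindowTitle")
    if title is not None and title.lower() not in str(window.get("title", "")).lower():
        problems.append(f"window title does not contain {title!r}")

    # Screen size: exact match on the virtual screen.
    for key, field in (("expectedScreenWidth", "width"), ("expectedScreenHeight", "height")):
        expected = guards.get(key)
        if expected is not None and screen.get(field) != expected:
            problems.append(f"virtual screen {field} is {screen.get(field)}, expected {expected}")
    return problems
```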

4. desktop_mouse_position

| Attribute | Value |
| --- | --- |
| Required scope | mcp:desktop |
| Policy mode | observe |

Returns the current mouse cursor position, the primary and virtual screen bounds, and the active foreground window. Use this as the starting point before planning coordinate-based actions.

Parameters: none.

Response fields: mousePosition ({ x, y }), primaryBounds ({ x, y, width, height }), virtualBounds, activeWindow? ({ handle, pid, processName, title, bounds? }).
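One way to use that starting point is to pin the screen size and foreground process observed at planning time as guards for the later action. The helper below is hypothetical; it assumes virtualBounds carries width and height, and it pins only the process name because titles tend to change more often than process names do.

```python
from typing import Any


def guards_from_snapshot(snapshot: dict[str, Any]) -> dict[str, Any]:
    """Turn a desktop_mouse_position response into guard parameters."""
    guards: dict[str, Any] = {}
    bounds = snapshot.get("virtualBounds") or {}
    if "width" in bounds and "height" in bounds:
        guards["expectedScreenWidth"] = bounds["width"]
        guards["expectedScreenHeight"] = bounds["height"]
    window = snapshot.get("activeWindow")  # optional in the response
    if window and window.get("processName"):
        guards["expectedProcessName"] = window["processName"]
    return guards


# Hypothetical response values, for illustration only.
snapshot = {
    "mousePosition": {"x": 500, "y": 300},
    "primaryBounds": {"x": 0, "y": 0, "width": 1920, "height": 1080},
    "virtualBounds": {"x": 0, "y": 0, "width": 3840, "height": 1080},
    "activeWindow": {"handle": 1001, "pid": 4242, "processName": "MyApp", "title": "MyApp"},
}
print(guards_from_snapshot(snapshot))
```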

5. desktop_mouse_move

| Attribute | Value |
| --- | --- |
| Required scope | mcp:desktop |
| Policy mode | operate |

Moves the Windows mouse cursor to absolute screen coordinates. The target coordinate is validated against the virtual screen bounds before execution. Defaults to dryRun: true.

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| x | number | Target X coordinate in screen pixels (absolute). |
| y | number | Target Y coordinate in screen pixels (absolute). |
| purpose | string | Short operational purpose for using the desktop UI fallback. |
| expectedAction | string | Expected UI effect of this mouse move. |
| dryRun | boolean | Defaults to true. Set to false to execute. |
| confirm | boolean | Required when dryRun: false. |
| (guard parameters) | string or number | Optional window and screen guards: expectedProcessName, expectedWindowTitle, expectedScreenWidth, expectedScreenHeight. |
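The server validates the target against the virtual screen bounds itself; repeating that check on the client gives an earlier, clearer error. This sketch assumes virtualBounds uses { x, y, width, height } with exclusive right and bottom edges (the usual Windows rectangle convention), and mouse_move_args is a hypothetical helper. The virtual screen is the bounding rectangle of all monitors, so its x or y can be negative, and it can include areas no monitor actually covers.

```python
from typing import Any


def in_virtual_bounds(x: int, y: int, bounds: dict[str, int]) -> bool:
    """True if (x, y) falls inside the virtual screen rectangle."""
    return (bounds["x"] <= x < bounds["x"] + bounds["width"]
            and bounds["y"] <= y < bounds["y"] + bounds["height"])


def mouse_move_args(x: int, y: int, purpose: str, expected_action: str,
                    bounds: dict[str, int]) -> dict[str, Any]:
    """Build a dry-run desktop_mouse_move request after a local bounds check."""
    if not in_virtual_bounds(x, y, bounds):
        raise ValueError(f"({x}, {y}) is outside the virtual screen {bounds}")
    return {"x": x, "y": y, "purpose": purpose,
            "expectedAction": expected_action, "dryRun": True}


# A monitor to the left of the primary one gives the virtual screen a negative x.
virtual = {"x": -1920, "y": 0, "width": 3840, "height": 1080}
print(in_virtual_bounds(-100, 500, virtual))  # True
print(in_virtual_bounds(1920, 500, virtual))  # False: the right edge is exclusive
```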

6. desktop_mouse_click

| Attribute | Value |
| --- | --- |
| Required scope | mcp:desktop |
| Policy mode | operate |

Moves the mouse to absolute screen coordinates and fires one or more mouse button events using the Win32 mouse_event function via P/Invoke. Defaults to dryRun: true.

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| x | number | (required) | Target X coordinate. |
| y | number | (required) | Target Y coordinate. |
| button | "left", "right", or "middle" | "left" | Mouse button to click. |
| clickCount | number | 1 | Number of clicks (1–3). Use 2 for a double-click. |
| purpose | string | (required) | Short operational purpose. |
| expectedAction | string | (required) | Expected UI effect. |
| dryRun | boolean | true | Set to false to execute. |
| confirm | boolean | false | Required when dryRun: false. |
| (guard parameters) | string or number | (none) | Optional window and screen guards. |
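Because every click defaults to a dry run, a cautious client can build the same request twice: once as a preview, then once for real after checking the preview. A hypothetical builder following the documented parameters; the execute flag and the allowed-guard check are conventions of this sketch, not of the server.

```python
from typing import Any

GUARD_KEYS = {"expectedProcessName", "expectedWindowTitle",
              "expectedScreenWidth", "expectedScreenHeight"}


def click_args(x: int, y: int, purpose: str, expected_action: str, *,
               button: str = "left", click_count: int = 1,
               execute: bool = False, **guards: Any) -> dict[str, Any]:
    """Build desktop_mouse_click arguments; a dry run unless execute=True."""
    if button not in ("left", "right", "middle"):
        raise ValueError(f"unsupported button: {button!r}")
    if not 1 <= click_count <= 3:
        raise ValueError("clickCount must be 1, 2 or 3")
    unknown = set(guards) - GUARD_KEYS
    if unknown:  # catches typos such as expectedProcess=...
        raise ValueError(f"unknown guard parameters: {sorted(unknown)}")
    args: dict[str, Any] = {
        "x": x, "y": y, "button": button, "clickCount": click_count,
        "purpose": purpose, "expectedAction": expected_action,
        "dryRun": not execute, **guards,
    }
    if execute:
        args["confirm"] = True  # documented requirement when dryRun is false
    return args


preview = click_args(640, 400, "Dismiss startup dialog", "Dialog closes",
                     expectedProcessName="MyApp")
live = click_args(640, 400, "Dismiss startup dialog", "Dialog closes",
                  execute=True, expectedProcessName="MyApp")
```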

7. desktop_key_press

| Attribute | Value |
| --- | --- |
| Required scope | mcp:desktop |
| Policy mode | operate |

Sends a SendKeys key string to the active Windows application. Use it for special keys and SendKeys key codes such as "{ENTER}", "{TAB}", or "{F5}". For plain printable text, prefer desktop_text_type. Defaults to dryRun: true.

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| key | string | A SendKeys-compatible key string (max 80 characters). |
| purpose | string | Short operational purpose. |
| expectedAction | string | Expected UI effect. |
| dryRun | boolean | Defaults to true. Set to false to execute. |
| confirm | boolean | Required when dryRun: false. |
| (guard parameters) | string or number | Optional window and screen guards. |
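SendKeys encodes named keys in braces and accepts a repeat count after a space, for example {TAB 3}. The hypothetical helper below builds such codes and enforces the documented 80-character limit; it does not validate key names against the full SendKeys list.

```python
from typing import Any

MAX_KEY_LENGTH = 80  # documented limit for desktop_key_press


def sendkeys_code(name: str, repeat: int = 1) -> str:
    """Build a SendKeys code such as {ENTER}, {F5} or {TAB 3}."""
    if repeat < 1:
        raise ValueError("repeat must be at least 1")
    # Upper-case named keys by convention; leave single characters alone
    # so that {h 10} stays a lowercase h.
    key = name.upper() if len(name) > 1 else name
    code = "{" + key + (f" {repeat}" if repeat > 1 else "") + "}"
    if len(code) > MAX_KEY_LENGTH:
        raise ValueError(f"key string longer than {MAX_KEY_LENGTH} characters")
    return code


def key_press_args(code: str, purpose: str, expected_action: str) -> dict[str, Any]:
    """Dry-run desktop_key_press request; set dryRun false and confirm true to execute."""
    return {"key": code, "purpose": purpose,
            "expectedAction": expected_action, "dryRun": True}


print(sendkeys_code("enter"), sendkeys_code("tab", 3))  # {ENTER} {TAB 3}
```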

8. desktop_hotkey

| Attribute | Value |
| --- | --- |
| Required scope | mcp:desktop |
| Policy mode | operate |

Sends a structured modifier+key hotkey combination to the active application (e.g. Ctrl+L, Ctrl+Shift+P, Alt+F4). Keys are specified as an array: the last element is the key, and all preceding elements are modifiers (CTRL, ALT, SHIFT). Defaults to dryRun: true.

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| keys | string[] | Array of 2–4 key names. Modifiers: CTRL (or CONTROL), ALT, SHIFT. The final element is the key name. |
| purpose | string | Short operational purpose. |
| expectedAction | string | Expected UI effect. |
| dryRun | boolean | Defaults to true. Set to false to execute. |
| confirm | boolean | Required when dryRun: false. |
| (guard parameters) | string or number | Optional window and screen guards. |
Example: to press Ctrl+Shift+P in VS Code:

```json
{
  "tool": "desktop_hotkey",
  "arguments": {
    "keys": ["CTRL", "SHIFT", "P"],
    "purpose": "Open VS Code command palette",
    "expectedAction": "Command palette opens",
    "dryRun": false,
    "confirm": true,
    "expectedProcessName": "Code"
  }
}
```
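To illustrate the array convention, this hypothetical Python helper validates a keys array and returns the equivalent .NET SendKeys string (^ for Ctrl, % for Alt, + for Shift). The documentation does not say how the server sends hotkeys internally, so treat this as a mental model rather than the implementation; rejecting repeated modifiers is this sketch's own choice.

```python
MODIFIERS = {"CTRL": "^", "CONTROL": "^", "ALT": "%", "SHIFT": "+"}


def hotkey_to_sendkeys(keys: list[str]) -> str:
    """Validate a desktop_hotkey keys array and return a SendKeys equivalent."""
    if not 2 <= len(keys) <= 4:
        raise ValueError("keys must contain 2 to 4 entries")
    *modifiers, key = keys  # last element is the key, the rest are modifiers
    prefixes: list[str] = []
    for m in modifiers:
        symbol = MODIFIERS.get(m.upper())
        if symbol is None:
            raise ValueError(f"{m!r} is not a modifier (CTRL, CONTROL, ALT, SHIFT)")
        if symbol in prefixes:
            raise ValueError(f"modifier {m!r} repeated")
        prefixes.append(symbol)
    # Single characters go lowercase so Shift comes only from the modifiers;
    # named keys such as F4 go in braces.
    body = key.lower() if len(key) == 1 else "{" + key.upper() + "}"
    return "".join(prefixes) + body


print(hotkey_to_sendkeys(["CTRL", "SHIFT", "P"]))  # ^+p
print(hotkey_to_sendkeys(["ALT", "F4"]))           # %{F4}
```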

9. desktop_text_type

| Attribute | Value |
| --- | --- |
| Required scope | mcp:desktop |
| Policy mode | operate |

Types a string of text into the active Windows application using SendKeys. Special SendKeys characters (+, ^, %, ~, (, ), [, ], {, }) are escaped automatically, so they are typed literally. The audit journal records the text as [REDACTED] together with its length, rather than the text itself. Defaults to dryRun: true.

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| text | string | Text to type (1–10,000 characters). Special SendKeys characters are escaped automatically. |
| purpose | string | Short operational purpose. |
| expectedAction | string | Expected UI effect. |
| dryRun | boolean | Defaults to true. Set to false to execute. |
| confirm | boolean | Required when dryRun: false. |
| (guard parameters) | string or number | Optional window and screen guards. |
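The automatic escaping follows the standard SendKeys rule: each special character is wrapped in braces, so + becomes {+} and { becomes {{}. A minimal sketch of that rule, assuming the server escapes exactly the characters listed above (escape_for_sendkeys is a hypothetical name). When calling desktop_text_type you pass the unescaped text; the escaping happens on the server.

```python
SPECIAL = set("+^%~()[]{}")  # characters listed as special for desktop_text_type
MAX_TEXT_LENGTH = 10_000


def escape_for_sendkeys(text: str) -> str:
    """Wrap each SendKeys special character in braces so it is typed literally."""
    if not 1 <= len(text) <= MAX_TEXT_LENGTH:
        raise ValueError(f"text must be 1 to {MAX_TEXT_LENGTH} characters")
    return "".join("{" + ch + "}" if ch in SPECIAL else ch for ch in text)


print(escape_for_sendkeys("f(x) = {y} + 1"))  # f{(}x{)} = {{}y{}} {+} 1
```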

Practical use case: verify a UI state after a build

After starting a desktop application via start_process, you can use screen and desktop tools to confirm it launched correctly and interact with it:
```jsonc
// 1. List windows to find the app
{ "tool": "window_list", "arguments": { "includeBounds": true } }

// 2. Take a screenshot to see the current state
{ "tool": "screen_screenshot", "arguments": { "mode": "primary" } }

// 3. Run OCR to extract text from the screenshot
{
  "tool": "screen_ocr",
  "arguments": { "screenshotId": "<id from step 2>", "language": "eng" }
}

// 4. If a dialog needs dismissal, click it
{
  "tool": "desktop_mouse_click",
  "arguments": {
    "x": 640, "y": 400,
    "purpose": "Dismiss startup dialog",
    "expectedAction": "Dialog closes",
    "dryRun": false,
    "confirm": true,
    "expectedProcessName": "MyApp"
  }
}
```
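The same workflow can be scripted. This sketch assumes a hypothetical call_tool(name, arguments) function in your MCP client that returns each parsed response as a dict. It clicks only when OCR actually finds the expected dialog text, and it previews the click with a dry run before executing. The coordinates are still fixed, as in the example above; in practice, derive them from the window bounds returned by window_list.

```python
from typing import Any, Callable

# Hypothetical client hook: (tool name, arguments) -> parsed response.
CallTool = Callable[[str, dict[str, Any]], dict[str, Any]]


def dismiss_dialog_if_present(call_tool: CallTool, process_name: str,
                              dialog_text: str, point: tuple[int, int]) -> bool:
    """Click `point` only if OCR sees `dialog_text`; return True if clicked."""
    # 1. Make sure the application has a visible window.
    windows = call_tool("window_list", {"includeBounds": True}).get("windows", [])
    if not any(str(w.get("processName", "")).lower() == process_name.lower()
               for w in windows):
        raise RuntimeError(f"no visible window for {process_name}")

    # 2-3. Capture the primary screen and OCR it.
    shot = call_tool("screen_screenshot", {"mode": "primary"})
    ocr = call_tool("screen_ocr", {"screenshotId": shot["screenshotId"], "language": "eng"})
    if not ocr.get("available"):
        raise RuntimeError("tesseract is not available on the server")
    if dialog_text.lower() not in ocr.get("text", "").lower():
        return False  # nothing to dismiss

    # 4. Preview with a dry run (a real client would inspect the preview),
    #    then execute behind a process-name guard.
    click = {"x": point[0], "y": point[1], "purpose": "Dismiss startup dialog",
             "expectedAction": "Dialog closes", "expectedProcessName": process_name}
    call_tool("desktop_mouse_click", {**click, "dryRun": True})
    call_tool("desktop_mouse_click", {**click, "dryRun": False, "confirm": True})
    return True
```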

See also

  • Security Boundaries — scope enforcement, audit logging, and risk model for screen and desktop tool access
