## Accessibility-first
The core idea: read the accessibility tree before you tap anything. In web automation, Playwright’s `browser_snapshot` gives you a structured view of the DOM — roles, labels, positions — without requiring a screenshot. Preflight’s `simulator_snapshot` does the same for iOS.
When you use `simulator_snapshot`, you get:

- Every visible UI element with its role (`UIButton`, `UILabel`, `UITextField`, etc.)
- Accessibility labels and values
- Screen coordinates for each element
- The full hierarchy — no vision model required
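To make this concrete, here is a sketch of how an agent can use such a tree to locate a tap target. The element shape (the `role`, `label`, and `frame` fields) is a hypothetical illustration, not Preflight's actual snapshot schema:

```python
# Hypothetical shape of a simulator_snapshot result; the field names are
# illustrative assumptions, not Preflight's documented schema.
snapshot = [
    {"role": "UIButton", "label": "Sign In",
     "frame": {"x": 120, "y": 640, "w": 150, "h": 44}},
    {"role": "UITextField", "label": "Email",
     "frame": {"x": 40, "y": 320, "w": 310, "h": 36}},
]

def find_by_label(elements, label):
    """Return the first element whose accessibility label matches."""
    return next(e for e in elements if e["label"] == label)

def center(element):
    """Tap targets are typically addressed by the center of their frame."""
    f = element["frame"]
    return (f["x"] + f["w"] / 2, f["y"] + f["h"] / 2)

# The coordinates to hand to simulator_tap:
print(center(find_by_label(snapshot, "Sign In")))  # (195.0, 662.0)
```

No pixels are inspected here: the label and geometry alone identify the target, which is the point of reading the tree first.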
## The recommended workflow

1. **Snapshot the screen.** Call `simulator_snapshot` to read the accessibility tree. This tells you what elements are on screen, their labels, and their positions — with no vision model needed.
2. **Interact using coordinates from the snapshot.** Use the positions reported by the snapshot to call `simulator_tap`, `simulator_swipe`, or `simulator_long_press`. You know exactly where to tap without guessing.
3. **Screenshot for visual verification.** When you need to verify how something looks — a color change, an image, a layout — call
`simulator_screenshot`. The image returns inline in chat, compressed for minimal token usage.

## Touch injection pipeline
Preflight injects touch events without moving your Mac cursor; when `simulator_tap` is called, the event is delivered directly to the simulator.
Install idb for the best experience: `brew tap facebook/fb && brew install idb-companion && pip3 install fb-idb`. See idb setup for details.

## No disk clutter
Preflight never writes files to disk unless you explicitly ask it to.

- Screenshots return as base64-encoded JPEG data inline in the chat response — no files saved to your Desktop or Downloads folder.
- Video recordings extract key frames as inline images when you call `simulator_stop_recording`. The raw video file is deleted unless you pass an optional `savePath`.
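If you do want a file, you can decode the inline payload yourself. A minimal sketch, assuming the tool returns base64-encoded JPEG bytes in a `data` field (the response shape here is an assumption, not Preflight's documented schema):

```python
import base64

# Hypothetical response shape: base64-encoded JPEG bytes in "data".
response = {
    "format": "jpeg",
    "data": base64.b64encode(b"\xff\xd8\xff" + b"fake jpeg body").decode(),
}

jpeg_bytes = base64.b64decode(response["data"])
assert jpeg_bytes.startswith(b"\xff\xd8\xff")  # JPEG magic number

# Only now does anything touch disk, and only because you chose to:
# pathlib.Path("screenshot.jpg").write_bytes(jpeg_bytes)
```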
## AI token optimization
Preflight compresses screenshots to JPEG at a quality level optimized for AI consumption — typically 200–400 KB per image. This is small enough to keep token usage manageable across long sessions while retaining enough detail for visual verification. For video, extracting key frames instead of returning the full video file serves two purposes:

- Most AI models cannot process video files directly.
- Key frames use far fewer tokens than a raw video stream.
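The 200–400 KB figure can be sanity-checked: base64, the inline encoding, maps every 3 bytes to 4 ASCII characters, so a 300 KB JPEG arrives as roughly 400 KB of text. A quick back-of-envelope check (the payload is a stand-in, not a real screenshot):

```python
import base64

jpeg_size = 300 * 1024          # a typical compressed screenshot, per the text
payload = b"\x00" * jpeg_size   # stand-in for real JPEG bytes
encoded = base64.b64encode(payload)

# Base64 encodes every 3 bytes as 4 characters, so the inline
# payload is about 4/3 the size of the raw JPEG.
print(len(encoded), len(encoded) / jpeg_size)  # 409600 1.3333...
```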
## Comparison to Playwright MCP
| Concept | Playwright MCP (web) | Preflight MCP (iOS) |
|---|---|---|
| Structured snapshot | `browser_snapshot` | `simulator_snapshot` |
| Wait for element | `browser_wait_for` | `simulator_wait_for_element` |
| Screenshot | `browser_screenshot` | `simulator_screenshot` |
| Touch / click | `browser_click` | `simulator_tap` |
| Type text | `browser_type` | `simulator_type_text` |
| Navigate back | `browser_navigate_back` | `simulator_navigate_back` |
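Because the mapping is one-to-one, porting an existing Playwright MCP flow is mostly a rename. A small sketch (tool names are taken verbatim from the table; the `port_tool_call` helper itself is hypothetical):

```python
# The comparison table as a rename map, web tool -> iOS tool.
PLAYWRIGHT_TO_PREFLIGHT = {
    "browser_snapshot": "simulator_snapshot",
    "browser_wait_for": "simulator_wait_for_element",
    "browser_screenshot": "simulator_screenshot",
    "browser_click": "simulator_tap",
    "browser_type": "simulator_type_text",
    "browser_navigate_back": "simulator_navigate_back",
}

def port_tool_call(name):
    """Translate a web tool name to its iOS equivalent, if one exists."""
    return PLAYWRIGHT_TO_PREFLIGHT.get(name, name)

print(port_tool_call("browser_click"))  # simulator_tap
```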
## Example prompts

These prompts show the philosophy in action:

### Accessibility-first workflow

“Take a snapshot of the current screen to see what elements are available. Then tap the button labeled ‘Sign In’ and wait for the email text field to appear.”

### QA testing session

“Boot the iPhone 16 Pro simulator, install my app at `./build/MyApp.app`, launch it, and take a screenshot of the home screen. Then tap the login button, type `test@email.com` in the email field, and verify the form validation works.”

### Debugging a crash

“My app is crashing on launch. Check the crash logs for MyApp, then get the last 5 minutes of device logs filtered to the MyApp process.”

### Visual regression

“Switch to dark mode, take a screenshot, then switch to light mode and screenshot again.”
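Once the two screenshots are in hand, the visual-regression check can also be done programmatically. A minimal sketch using a content hash as a cheap first-pass difference detector (the image bytes below are stand-ins, not real captures):

```python
import hashlib

# Stand-ins for two simulator_screenshot results (dark vs. light mode).
dark_jpeg = b"dark-mode-screenshot-bytes"
light_jpeg = b"light-mode-screenshot-bytes"

def digest(image_bytes):
    """Identical bytes mean no visual change at all; differing hashes
    mean the renders differ and deserve a closer look."""
    return hashlib.sha256(image_bytes).hexdigest()

changed = digest(dark_jpeg) != digest(light_jpeg)
print(changed)  # True: the two renders differ
```

A byte-level hash is deliberately strict — any pixel change trips it — so treat it as a trigger for human or model review, not a verdict.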
