Core capabilities
Libretto is organized around four capabilities that cover the full lifecycle of a browser automation:
- Inspect live pages: Take a snapshot of any open page and let a vision model extract selectors, identify interactive elements, and summarize the visible state. Snapshot analysis runs in a separate process, so the results are token-efficient summaries rather than raw DOM dumps.
- Capture network traffic: Record every HTTP request and response made by the browser. Libretto writes these to a structured JSONL log that you can query with jq or pass directly to your agent to reverse-engineer the site’s API.
- Record user actions: As you interact with a page manually, Libretto captures each DOM event with a semantic selector, nearby text, and coordinates. Your agent can read these recorded actions and reconstruct the workflow as a typed Playwright script.
- Debug broken workflows: When a workflow fails, Libretto keeps the browser open. Your agent can inspect the live page state with snapshot and exec, identify the broken selector or unexpected page change, patch the code, and re-run, all without restarting from scratch.
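To make the network-capture capability more concrete, here is a minimal sketch of how an agent or script might filter a JSONL log for likely API calls. The entry shape (method, url, status, contentType) and the sample log lines are assumptions made for illustration; the fields Libretto actually writes may differ, so check a real log before relying on these names.

```typescript
// Hypothetical shape of one log line; the real field names may differ.
interface NetworkEntry {
  method: string;
  url: string;
  status: number;
  contentType?: string;
}

// JSONL means one JSON object per non-empty line.
function parseJsonl(text: string): NetworkEntry[] {
  return text
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as NetworkEntry);
}

// Keep successful responses whose content type looks like a JSON API.
function findApiCalls(entries: NetworkEntry[]): NetworkEntry[] {
  return entries.filter(
    (e) =>
      e.status >= 200 &&
      e.status < 300 &&
      (e.contentType ?? "").includes("application/json"),
  );
}

// Sample log content, invented for this example.
const sampleLog = [
  '{"method":"GET","url":"https://example.com/","status":200,"contentType":"text/html"}',
  '{"method":"GET","url":"https://example.com/api/posts?page=1","status":200,"contentType":"application/json"}',
  '{"method":"POST","url":"https://example.com/api/login","status":401,"contentType":"application/json"}',
].join("\n");

const apiCalls = findApiCalls(parseJsonl(sampleLog));
for (const call of apiCalls) {
  console.log(`${call.method} ${call.url}`);
}
// prints "GET https://example.com/api/posts?page=1"
```

Filtering on status and content type is only one heuristic. For some sites it may be more useful to group entries by URL path or to look only at POST requests.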
The skill concept
Libretto is designed to be loaded as a skill in your coding agent. A skill is a set of instructions that tells your agent when and how to use a tool. When you give your agent the Libretto skill, it knows to:
- Open a browser before guessing at page structure
- Use snapshot as the primary observation tool instead of reading raw HTML
- Prefer network request approaches over UI automation when the site allows it
- Validate a finished workflow with a headless run before declaring it done
The skill is defined in skills/libretto/SKILL.md and is automatically copied into .agents/skills/libretto and .claude/skills/libretto when you run npx libretto init.
You can invoke Libretto skills with natural language prompts like:
Use the Libretto skill. Go to LinkedIn and scrape the first 10 posts — content, author, reaction count, and first 25 comments.
I’ll show you a workflow in eClinicalWorks to get a patient’s primary insurance ID. Use the Libretto skill to turn it into a Playwright script.
We have a browser script at ./integration.ts that’s throwing a broken selector error. Fix it. Use the Libretto skill.
Architecture overview
CLI vs Library API
Libretto ships two interfaces:
- CLI (npx libretto <command>): the primary interface for both agents and humans. Commands open browsers, take snapshots, execute Playwright code, run workflow files, and manage sessions. Every command accepts --session <name> to target a specific browser session.
- Library API (import { workflow } from "libretto"): used inside workflow files you want to run with npx libretto run. The workflow() function wraps your automation handler and gives it typed access to page, session, logger, and optional application services.
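The idea of a function that wraps a handler and gives it a typed context can be sketched without the library itself. Everything below is a hypothetical stand-in: defineWorkflow, WorkflowContext, PageLike, and the fake page are invented for illustration and are not Libretto's actual workflow() signature, which you should take from the Library API reference.

```typescript
// All names here are hypothetical stand-ins, not Libretto's API.
interface Logger {
  info(message: string): void;
}

// A tiny subset of what a browser page object might offer.
interface PageLike {
  goto(url: string): Promise<void>;
  title(): Promise<string>;
}

interface WorkflowContext {
  page: PageLike;
  session: string;
  logger: Logger;
}

type Handler<I, O> = (ctx: WorkflowContext, input: I) => Promise<O>;

interface Workflow<I, O> {
  name: string;
  run(ctx: WorkflowContext, input: I): Promise<O>;
}

// Wrap a handler so every run logs its start and receives the typed context.
function defineWorkflow<I, O>(name: string, handler: Handler<I, O>): Workflow<I, O> {
  return {
    name,
    async run(ctx, input) {
      ctx.logger.info(`starting ${name} in session ${ctx.session}`);
      return handler(ctx, input);
    },
  };
}

// The handler only sees the typed context and its own input.
const getTitle = defineWorkflow<{ url: string }, string>(
  "get-title",
  async (ctx, input) => {
    await ctx.page.goto(input.url);
    return ctx.page.title();
  },
);

// A fake page and logger so the sketch runs without a browser.
const logLines: string[] = [];
const fakeContext: WorkflowContext = {
  page: {
    async goto() {},
    async title() {
      return "Example Domain";
    },
  },
  session: "my-session",
  logger: { info: (message) => logLines.push(message) },
};

getTitle.run(fakeContext, { url: "https://example.com" }).then((title) => {
  console.log(title);
});
```

Keeping the handler separate from the context it receives is what lets a runner supply a real browser page in production and a fake one in tests.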
Sessions
A session is a named browser context. When you run npx libretto open https://example.com --session my-session, Libretto launches a Chromium instance and registers it under that name. Subsequent commands (snapshot, exec, network, actions) all target that session by name.
If you don’t pass --session, Libretto uses a default session name. Sessions are independent — you can have multiple sessions open at the same time, each pointing to a different browser or URL.
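The name-based lookup described above can be modeled as a small registry. This is a conceptual sketch, not Libretto's implementation: SessionRegistry, SessionInfo, and the default name "default" are all assumptions, since the source does not say what the default session is called or how sessions are stored.

```typescript
// Hypothetical default name; the source does not state what Libretto uses.
const DEFAULT_SESSION = "default";

interface SessionInfo {
  name: string;
  url: string;
}

class SessionRegistry {
  private sessions = new Map<string, SessionInfo>();

  // Register a session under a name, falling back to the default name.
  open(url: string, name: string = DEFAULT_SESSION): SessionInfo {
    const session: SessionInfo = { name, url };
    this.sessions.set(name, session);
    return session;
  }

  // Find the session that a later command should target.
  resolve(name: string = DEFAULT_SESSION): SessionInfo {
    const session = this.sessions.get(name);
    if (!session) {
      throw new Error(`No open session named "${name}"`);
    }
    return session;
  }
}

const registry = new SessionRegistry();
registry.open("https://example.com", "my-session");
registry.open("https://example.org"); // no name given, so the default is used

console.log(registry.resolve("my-session").url); // prints "https://example.com"
console.log(registry.resolve().url); // prints "https://example.org"
```

Because each name maps to its own entry, the two sessions stay independent: resolving one never affects the other.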
The .libretto/ directory
All Libretto state lives in a .libretto/ directory at your project root:
.libretto/.gitignore file that npx libretto init creates.
Next steps
Quick start
Install Libretto and run your first browser automation in minutes.
CLI reference
Full reference for every Libretto command and flag.
Library API
Write workflow files with the typed workflow() API.
Guides
End-to-end walkthroughs: scraping, network capture, debugging.