Computer Use Agent (CUA)
OpenSteer supports Computer Use Agents (CUA) that can autonomously control browsers by interpreting screenshots and executing actions. The agent workflow enables AI models to complete complex browser tasks through natural language instructions.
Supported Providers
OpenSteer supports CUA from three major providers:
- OpenAI - Computer Use Preview models
- Anthropic - Claude models with computer use capabilities
- Google - Gemini models with multimodal capabilities
Agent Workflow
- Initialization: Configure the agent with a model, system prompt, and execution parameters
- Screenshot Capture: Agent captures the current browser viewport as a PNG screenshot
- Reasoning: AI model analyzes the screenshot and decides on the next action
- Action Execution: Agent executes browser actions (click, input, scroll, navigation, etc.)
- Iteration: Process repeats until task completion or max steps reached
- Result: Returns success status, completion flag, message, actions taken, and usage metrics
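The steps above can be sketched as a simple loop. This is only an illustration of the capture, reason, act cycle under stated assumptions: `LoopDeps`, `Decision`, `LoopResult`, and `runAgentLoop` are hypothetical names rather than OpenSteer's API, usage metrics are left out for brevity, and the choice to report a run that hits the step limit as unsuccessful is an assumption.

```typescript
// All names here are hypothetical stand-ins, not OpenSteer's actual API.
interface Decision {
  action: string;    // e.g. "click", "type", "scroll"
  done: boolean;     // true once the model considers the task finished
  message?: string;  // optional final message from the model
}

interface LoopDeps {
  captureScreenshot(): Promise<Uint8Array>;           // PNG bytes of the viewport
  decide(screenshot: Uint8Array): Promise<Decision>;  // model reasoning step
  perform(action: string): Promise<void>;             // browser action execution
}

interface LoopResult {
  success: boolean;
  completed: boolean;
  message: string;
  actions: string[];
}

async function runAgentLoop(deps: LoopDeps, maxSteps: number): Promise<LoopResult> {
  const actions: string[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const screenshot = await deps.captureScreenshot();
    const decision = await deps.decide(screenshot);
    if (decision.done) {
      return {
        success: true,
        completed: true,
        message: decision.message ?? "Task finished",
        actions,
      };
    }
    await deps.perform(decision.action);
    actions.push(decision.action);
  }
  // Assumption: running out of steps is reported as an unsuccessful, incomplete run.
  return {
    success: false,
    completed: false,
    message: `Stopped after ${maxSteps} steps`,
    actions,
  };
}
```

Passing the screenshot, model, and browser functions in as dependencies keeps the loop testable without a real browser or API key.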
Agent Capabilities
The CUA can perform various browser actions:
- Click - Click at specific coordinates or with modifiers
- Type - Enter text into inputs
- Scroll - Scroll viewport or specific elements
- Navigate - Go to URLs
- Wait - Pause execution
- Screenshot - Capture viewport state
- Finish - Complete task execution
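One way to picture the action set above is as a discriminated union. The sketch below is illustrative only: the `BrowserAction` type and every field name in it (`x`, `y`, `modifiers`, `deltaY`, `ms`, and so on) are assumptions, not OpenSteer's actual action schema.

```typescript
// Hypothetical action shapes; field names are assumptions for illustration.
type BrowserAction =
  | { type: "click"; x: number; y: number; modifiers?: string[] }
  | { type: "type"; text: string }
  | { type: "scroll"; deltaX: number; deltaY: number }
  | { type: "navigate"; url: string }
  | { type: "wait"; ms: number }
  | { type: "screenshot" }
  | { type: "finish" };

// Produce a short human-readable description, e.g. for logging the actions taken.
function describeAction(action: BrowserAction): string {
  switch (action.type) {
    case "click": {
      const mods =
        action.modifiers && action.modifiers.length > 0
          ? ` with ${action.modifiers.join("+")}`
          : "";
      return `click at (${action.x}, ${action.y})${mods}`;
    }
    case "type":
      return `type ${JSON.stringify(action.text)}`;
    case "scroll":
      return `scroll by (${action.deltaX}, ${action.deltaY})`;
    case "navigate":
      return `navigate to ${action.url}`;
    case "wait":
      return `wait ${action.ms} ms`;
    case "screenshot":
      return "capture screenshot";
    case "finish":
      return "finish task";
  }
}
```

Because the `switch` covers every member of the union, the compiler can flag any action type that is added later but not handled.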
When to Use Agents
Use Agents When:
- Automating complex multi-step workflows
- Handling dynamic UIs that change frequently
- Exploring unfamiliar websites or applications
- Tasks require visual interpretation (images, layouts, colors)
- You want natural language task definitions
Use Direct Actions When:
- You know exact selectors or element paths
- Performance is critical (agents are slower)
- Deterministic execution is required
- Working with well-structured, stable UIs
- Cost optimization is important (agents use more tokens)
Configuration
Agents are configured through the agent() method with mode, model, and optional parameters:
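A minimal sketch of what such a configuration object might look like, assuming hypothetical option names. Only the mode, the model, a system prompt, and a step limit are mentioned on this page; the `"cua"` mode value, the `systemPrompt` and `maxSteps` field names, and the default of 20 steps are assumptions chosen for illustration.

```typescript
// Hypothetical option names; consult the OpenSteer reference for the real ones.
interface AgentOptions {
  mode: "cua";            // assumed mode identifier for Computer Use Agents
  model: string;          // "provider/model" string
  systemPrompt?: string;  // optional instructions for the model
  maxSteps?: number;      // optional cap on agent iterations
}

// Fill in optional fields and reject obviously invalid values.
function resolveAgentOptions(options: AgentOptions): Required<AgentOptions> {
  const maxSteps = options.maxSteps ?? 20; // arbitrary illustrative default
  if (!Number.isInteger(maxSteps) || maxSteps < 1) {
    throw new Error(`maxSteps must be a positive integer, got ${maxSteps}`);
  }
  return {
    mode: options.mode,
    model: options.model,
    systemPrompt: options.systemPrompt ?? "",
    maxSteps,
  };
}
```

Validating the options up front surfaces mistakes before any model call is made.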
Execution
Execute agent tasks with the execute() method:
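The call pattern might look roughly like the sketch below. It assumes that execute() takes a natural language instruction and resolves to a result carrying the fields listed under Agent Workflow. The `AgentHandle` and `AgentRunResult` interfaces, the `usage` field names, and the `stubAgent` object are hypothetical; the stub stands in for a real agent so the example runs without a browser or API key.

```typescript
// Hypothetical result and agent shapes, based on the fields this page lists.
interface AgentRunResult {
  success: boolean;
  completed: boolean;
  message: string;
  actions: unknown[];
  usage: { inputTokens: number; outputTokens: number }; // assumed metric names
}

interface AgentHandle {
  execute(instruction: string): Promise<AgentRunResult>;
}

// Stub agent used only to make the call pattern runnable in isolation.
const stubAgent: AgentHandle = {
  async execute(instruction) {
    return {
      success: true,
      completed: true,
      message: `Completed: ${instruction}`,
      actions: [],
      usage: { inputTokens: 0, outputTokens: 0 },
    };
  },
};

// Run one task and summarize the outcome in a single line.
async function runTask(agent: AgentHandle, instruction: string): Promise<string> {
  const result = await agent.execute(instruction);
  return `${result.message} (${result.actions.length} actions)`;
}
```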
Model Format
Models are specified using the provider/model format:
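To make the format concrete, here is a small sketch that splits such a string into its two parts. The lowercase provider identifiers (`openai`, `anthropic`, `google`) and the `parseModel` helper are assumptions for illustration, and the model name used in the usage note is only an example string.

```typescript
// Assumed provider identifiers, derived from the three providers named on this page.
const KNOWN_PROVIDERS = ["openai", "anthropic", "google"] as const;
type Provider = (typeof KNOWN_PROVIDERS)[number];

interface ParsedModel {
  provider: Provider;
  model: string;
}

// Split "provider/model" at the first slash and validate both halves.
function parseModel(spec: string): ParsedModel {
  const slash = spec.indexOf("/");
  if (slash <= 0 || slash === spec.length - 1) {
    throw new Error(`Expected "provider/model", got "${spec}"`);
  }
  const provider = spec.slice(0, slash).toLowerCase();
  const model = spec.slice(slash + 1);
  if ((KNOWN_PROVIDERS as readonly string[]).indexOf(provider) === -1) {
    throw new Error(`Unknown provider "${provider}"`);
  }
  return { provider: provider as Provider, model };
}
```

For example, `parseModel("openai/computer-use-preview")` yields the provider `openai` and the model `computer-use-preview`, while a string without a slash is rejected.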
Error Handling
Agent execution can fail for various reasons:
- Model API errors (rate limits, authentication)
- Invalid actions or coordinates
- Max steps reached before completion
- Page navigation failures
Check result.success and handle failures appropriately:
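A hedged sketch of one way to do this. It is not known from this page whether a given failure surfaces as a thrown exception or as a result with success set to false, so the hypothetical `executeSafely` wrapper below handles both. `ResultLike` and `Outcome` are illustrative types, not part of OpenSteer.

```typescript
// Minimal view of the result fields this check relies on.
interface ResultLike {
  success: boolean;
  completed: boolean;
  message: string;
}

type Outcome =
  | { status: "ok"; message: string }
  | { status: "incomplete"; message: string }
  | { status: "failed"; message: string };

// Run an agent call and normalize thrown errors and unsuccessful results.
async function executeSafely(run: () => Promise<ResultLike>): Promise<Outcome> {
  try {
    const result = await run();
    if (!result.success) {
      return { status: "failed", message: result.message };
    }
    if (!result.completed) {
      // e.g. the step limit was hit before the task finished
      return { status: "incomplete", message: result.message };
    }
    return { status: "ok", message: result.message };
  } catch (err) {
    // e.g. rate limits, authentication errors, navigation failures
    const message = err instanceof Error ? err.message : String(err);
    return { status: "failed", message };
  }
}
```

Callers can then branch on `status`, for instance retrying a failed run or raising the step limit for an incomplete one.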