Computer Use Agent (CUA)

Overview

OpenSteer’s Computer Use Agent (CUA) allows you to control browsers using natural language instructions. The agent uses vision-language models to understand page content and execute actions autonomously.

Supported Providers

OpenSteer CUA supports three major AI providers:

OpenAI

openai/computer-use-preview

Anthropic

Claude models with computer use

Google

Gemini models with computer use

Quick Start

Create a CUA agent and run a single instruction:

import { Opensteer } from 'opensteer'

const opensteer = new Opensteer({
  model: 'openai/computer-use-preview'
})

try {
  await opensteer.launch()
  
  const agent = opensteer.agent({ mode: 'cua' })
  
  const result = await agent.execute({
    instruction: 'Go to Hacker News and open the top story.',
    maxSteps: 20
  })
  
  console.log(result.message)
  console.log('Completed:', result.completed)
} finally {
  await opensteer.close()
}

Creating an Agent

Basic Agent Creation

const agent = opensteer.agent({ mode: 'cua' })

Agent with Custom Model

const opensteer = new Opensteer({
  model: 'openai/computer-use-preview'
})

const agent = opensteer.agent({
  mode: 'cua',
  model: 'openai/gpt-5'
})
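The string model IDs in this guide appear to follow a provider/model pattern (for example, openai/computer-use-preview), while the object form uses the bare modelName. The helper below is a minimal sketch of splitting such an ID for validation in your own code. splitModelId is a hypothetical name, not an OpenSteer API, and the provider list is taken from the Supported Providers section.

```typescript
// Providers named in this guide.
type Provider = 'openai' | 'anthropic' | 'google'

const PROVIDERS: readonly Provider[] = ['openai', 'anthropic', 'google']

// Hypothetical helper (not part of OpenSteer): split "provider/model" at the first slash.
function splitModelId(id: string): { provider: Provider; modelName: string } {
  const slash = id.indexOf('/')
  // Reject IDs with no slash, an empty provider, or an empty model name.
  if (slash <= 0 || slash === id.length - 1) {
    throw new Error(`Expected "provider/model", got "${id}"`)
  }
  const prefix = id.slice(0, slash)
  const provider = PROVIDERS.find((p) => p === prefix)
  if (!provider) {
    throw new Error(`Unknown provider "${prefix}"`)
  }
  return { provider, modelName: id.slice(slash + 1) }
}
```

Validating the ID up front can surface a typo before a browser is launched.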

Agent with Full Model Configuration

const agent = opensteer.agent({
  mode: 'cua',
  model: {
    modelName: 'computer-use-preview',
    apiKey: process.env.OPENAI_API_KEY,
    baseUrl: 'https://api.openai.com/v1',
    organization: 'org-123',
    thinkingBudget: 10000
  }
})

Custom System Prompt

const agent = opensteer.agent({
  mode: 'cua',
  systemPrompt: 'You are a helpful shopping assistant. Find the best deals and compare prices.'
})

Wait Between Actions

const agent = opensteer.agent({
  mode: 'cua',
  waitBetweenActionsMs: 1000  // Wait 1 second between actions
})

Model Configuration Options

modelName

string

required

The name of the model to use (e.g., computer-use-preview, claude-3-5-sonnet-20241022)

apiKey

string

API key for the provider. Falls back to environment variables if not provided.

baseUrl

string

Base URL for the API endpoint. Useful for custom endpoints or proxies.

organization

string

Organization ID (OpenAI only)

thinkingBudget

number

Token budget for extended thinking (if supported by model)

environment

string

Environment identifier (Google only)
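The apiKey fallback described above can be pictured with a small sketch. This is an illustration of the documented behavior, not OpenSteer's actual resolution code: resolveApiKey is a hypothetical helper, and the variable names are the ones listed under Provider-Specific Setup. The environment is passed in as a plain object so the function stays easy to test.

```typescript
type Provider = 'openai' | 'anthropic' | 'google'

// Environment variable names as listed under Provider-Specific Setup.
const ENV_KEY: Record<Provider, string> = {
  openai: 'OPENAI_API_KEY',
  anthropic: 'ANTHROPIC_API_KEY',
  google: 'GOOGLE_API_KEY',
}

// Hypothetical helper: an explicit, non-empty apiKey wins; otherwise read the env var.
function resolveApiKey(
  provider: Provider,
  explicitKey: string | undefined,
  env: Record<string, string | undefined>
): string {
  const key = explicitKey || env[ENV_KEY[provider]]
  if (!key) {
    throw new Error(`Missing API key: set ${ENV_KEY[provider]} or pass apiKey`)
  }
  return key
}
```

In a Node.js script, process.env can be passed as the env argument.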

Executing Instructions

Simple Execution

const result = await agent.execute('Go to Google and search for OpenSteer')

Execution with Options

const result = await agent.execute({
  instruction: 'Find the top 3 products on this page and click the first one',
  maxSteps: 30,
  highlightCursor: true
})

Execution Options

instruction

string

required

The natural language instruction for the agent to execute

maxSteps

number

default: 20

Maximum number of actions the agent can take before stopping

highlightCursor

boolean

default: false

Show a visual cursor indicator when the agent performs mouse actions
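Because execute() accepts either a plain string or an options object, it can help to see how the two forms relate once the documented defaults are applied. The sketch below is only a model of that relationship. normalizeExecuteInput is a hypothetical helper, and the empty-instruction check is an added assumption rather than documented behavior.

```typescript
interface ExecuteOptions {
  instruction: string
  maxSteps?: number
  highlightCursor?: boolean
}

// Defaults as documented above: maxSteps 20, highlightCursor false.
const DEFAULTS = { maxSteps: 20, highlightCursor: false }

// Hypothetical helper: accept either call form and return fully populated options.
function normalizeExecuteInput(input: string | ExecuteOptions): Required<ExecuteOptions> {
  const options: ExecuteOptions = typeof input === 'string' ? { instruction: input } : input
  // Assumption for this sketch: an empty instruction is treated as a caller error.
  if (options.instruction.trim() === '') {
    throw new Error('instruction must not be empty')
  }
  return {
    instruction: options.instruction,
    maxSteps: options.maxSteps ?? DEFAULTS.maxSteps,
    highlightCursor: options.highlightCursor ?? DEFAULTS.highlightCursor,
  }
}
```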

Result Structure

The execute() method returns a detailed result object:

interface OpensteerAgentResult {
  success: boolean        // Whether execution completed without errors
  completed: boolean      // Whether the task was completed
  message: string         // Final message from the agent
  actions: OpensteerAgentAction[]  // List of actions taken
  usage?: OpensteerAgentUsage      // Token and time usage
  provider: string        // Provider used (openai, anthropic, google)
  model: string          // Model used
}

Result Properties

success

boolean

Indicates if execution completed without errors. false if the agent encountered an error.

completed

boolean

Indicates if the agent completed the task. false if maxSteps was reached before completion.

message

string

Final message from the agent explaining the result or what it accomplished.

actions

OpensteerAgentAction[]

Array of all actions the agent performed (clicks, typing, scrolling, etc.)

usage

OpensteerAgentUsage

Token usage and inference time statistics

provider

string

The AI provider used: openai, anthropic, or google

model

string

The full model name that was used

Actions

Each action in the actions array contains:

interface OpensteerAgentAction {
  type: string           // Action type: click, type, scroll, etc.
  reasoning?: string     // Agent's reasoning for this action
  button?: string        // Mouse button (for clicks)
  clickCount?: number    // Number of clicks
  x?: number            // X coordinate
  y?: number            // Y coordinate
  text?: string         // Text to type
  keys?: string[]       // Keys pressed
  scrollX?: number      // Horizontal scroll amount
  scrollY?: number      // Vertical scroll amount
  timeMs?: number       // Time taken for action
  url?: string          // URL (for navigation)
  path?: Array<{ x: number; y: number }>  // Mouse movement path
}

Example Actions

const result = await agent.execute({
  instruction: 'Click the login button',
  maxSteps: 10
})

// Inspect actions taken
for (const action of result.actions) {
  console.log(`Action: ${action.type}`)
  if (action.reasoning) {
    console.log(`Reasoning: ${action.reasoning}`)
  }
  if (action.type === 'click') {
    console.log(`Clicked at (${action.x}, ${action.y})`)
  }
}
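For longer runs, a per-type tally is often easier to read than a line per action. The following is a minimal sketch that reads only the type field from the action shape shown above. countActionsByType is a hypothetical helper name.

```typescript
// Only the field this sketch reads; the full action interface has more fields.
interface ActionLike {
  type: string
}

// Hypothetical helper: count how many actions of each type were performed.
function countActionsByType(actions: ActionLike[]): Record<string, number> {
  const counts: Record<string, number> = {}
  for (const action of actions) {
    counts[action.type] = (counts[action.type] ?? 0) + 1
  }
  return counts
}
```

Passing result.actions to this function yields an object such as one keyed by click, type, and scroll, depending on what the agent did.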

Usage Statistics

The usage object provides token and performance metrics:

interface OpensteerAgentUsage {
  inputTokens: number       // Input tokens consumed
  outputTokens: number      // Output tokens generated
  reasoningTokens?: number  // Reasoning tokens (if applicable)
  inferenceTimeMs: number   // Total inference time in milliseconds
}

Example Usage

const result = await agent.execute('Navigate to the pricing page')

if (result.usage) {
  console.log('Input tokens:', result.usage.inputTokens)
  console.log('Output tokens:', result.usage.outputTokens)
  console.log('Inference time:', result.usage.inferenceTimeMs, 'ms')
}
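When a script runs several instructions, the per-run usage objects can be added up to get totals for the whole session. The sketch below assumes the usage shape shown above and skips runs where usage is absent, since the field is optional on the result. sumUsage is a hypothetical helper name.

```typescript
interface Usage {
  inputTokens: number
  outputTokens: number
  reasoningTokens?: number
  inferenceTimeMs: number
}

// Hypothetical helper: total token and time usage across several runs.
function sumUsage(runs: Array<Usage | undefined>): Usage {
  const total = { inputTokens: 0, outputTokens: 0, reasoningTokens: 0, inferenceTimeMs: 0 }
  for (const usage of runs) {
    if (!usage) continue // usage is optional on the result object
    total.inputTokens += usage.inputTokens
    total.outputTokens += usage.outputTokens
    total.reasoningTokens += usage.reasoningTokens ?? 0
    total.inferenceTimeMs += usage.inferenceTimeMs
  }
  return total
}
```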

Provider-Specific Setup

OpenAI

const opensteer = new Opensteer({
  model: 'openai/computer-use-preview'
})

const agent = opensteer.agent({
  mode: 'cua',
  model: {
    modelName: 'computer-use-preview',
    apiKey: process.env.OPENAI_API_KEY,
    organization: 'org-123'  // Optional
  }
})

Environment variables:

OPENAI_API_KEY: Your OpenAI API key

Anthropic

const opensteer = new Opensteer({
  model: 'anthropic/claude-3-5-sonnet-20241022'
})

const agent = opensteer.agent({
  mode: 'cua',
  model: {
    modelName: 'claude-3-5-sonnet-20241022',
    apiKey: process.env.ANTHROPIC_API_KEY,
    thinkingBudget: 10000  // Extended thinking tokens
  }
})

Environment variables:

ANTHROPIC_API_KEY: Your Anthropic API key

Google

const opensteer = new Opensteer({
  model: 'google/gemini-2.0-flash-exp'
})

const agent = opensteer.agent({
  mode: 'cua',
  model: {
    modelName: 'gemini-2.0-flash-exp',
    apiKey: process.env.GOOGLE_API_KEY,
    environment: 'production'  // Optional
  }
})

Environment variables:

GOOGLE_API_KEY: Your Google API key

Complete Example

The following script combines the pieces above: it launches a browser, runs one instruction, and prints the outcome and token usage.

import { Opensteer } from 'opensteer'

async function run() {
  const opensteer = new Opensteer({
    model: 'openai/computer-use-preview',
  })
  
  try {
    await opensteer.launch()
    
    const agent = opensteer.agent({
      mode: 'cua',
    })
    
    const result = await agent.execute({
      instruction: 'Go to docs and summarize the first section',
      maxSteps: 20,
      highlightCursor: true,
    })
    
    console.log('Success:', result.success)
    console.log('Completed:', result.completed)
    console.log('Message:', result.message)
    console.log('\nActions taken:', result.actions.length)
    
    if (result.usage) {
      console.log('\nToken usage:')
      console.log('  Input:', result.usage.inputTokens)
      console.log('  Output:', result.usage.outputTokens)
      console.log('  Time:', result.usage.inferenceTimeMs, 'ms')
    }
  } finally {
    await opensteer.close()
  }
}

run().catch((err) => {
  console.error(err)
  process.exit(1)
})

Cloud Mode Support

CUA agents work in both local and cloud modes:

const opensteer = new Opensteer({
  model: 'openai/computer-use-preview',
  cloud: {
    apiKey: process.env.OPENSTEER_API_KEY
  }
})

await opensteer.launch()

const agent = opensteer.agent({ mode: 'cua' })
const result = await agent.execute('Find and click the sign-up button')

CUA actions execute against the active cloud CDP page when using cloud mode.
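One way to let the same script run locally or in the cloud is to add the cloud block only when a key is present. This is a sketch under two assumptions: that omitting cloud selects local mode (as the earlier examples, which omit it, suggest), and that the option shape matches the example above. buildOptions is a hypothetical helper name.

```typescript
// Option shape as used in the example above; only these two fields are modeled.
interface SteerOptions {
  model: string
  cloud?: { apiKey: string }
}

// Hypothetical helper: include the cloud block only when OPENSTEER_API_KEY is set.
function buildOptions(model: string, env: Record<string, string | undefined>): SteerOptions {
  const apiKey = env['OPENSTEER_API_KEY']
  return apiKey ? { model, cloud: { apiKey } } : { model }
}
```

The returned object could then be passed to new Opensteer(...), with process.env supplied as the env argument in Node.js.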

Error Handling

try {
  const result = await agent.execute({
    instruction: 'Complete the checkout process',
    maxSteps: 50
  })
  
  if (!result.success) {
    console.error('Agent encountered an error:', result.message)
  }
  
  if (!result.completed) {
    console.warn('Agent did not complete the task within maxSteps')
    console.warn('Last action:', result.actions[result.actions.length - 1])
  }
} catch (error) {
  console.error('Execution failed:', error)
}

Best Practices

Set appropriate maxSteps

// Simple task
const loginResult = await agent.execute({
  instruction: 'Click the login button',
  maxSteps: 5
})

// Complex task
const registrationResult = await agent.execute({
  instruction: 'Fill out the entire registration form',
  maxSteps: 30
})

Set maxSteps based on task complexity to prevent runaway executions.
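If the right budget is hard to guess, one option is to start small and retry with a larger maxSteps only when the run ends without completing. The sketch below illustrates that idea under stated assumptions. executeWithEscalation and RunFn are hypothetical names, and the run function stands in for any call that takes an instruction and a step budget.

```typescript
// Only the result fields this sketch reads.
interface RunResult {
  success: boolean
  completed: boolean
  message: string
}

// Hypothetical signature: anything that runs an instruction with a step budget.
type RunFn = (options: { instruction: string; maxSteps: number }) => Promise<RunResult>

// Try each budget in order until the task completes or a run reports an error.
async function executeWithEscalation(
  run: RunFn,
  instruction: string,
  budgets: number[]
): Promise<{ result: RunResult; maxStepsUsed: number }> {
  if (budgets.length === 0) {
    throw new Error('budgets must contain at least one value')
  }
  let last: { result: RunResult; maxStepsUsed: number } | undefined
  for (const maxSteps of budgets) {
    const result = await run({ instruction, maxSteps })
    last = { result, maxStepsUsed: maxSteps }
    // Stop on completion; also stop on errors, where a larger budget is unlikely to help.
    if (result.completed || !result.success) break
  }
  return last as { result: RunResult; maxStepsUsed: number }
}
```

With an agent from this guide, run could be (options) => agent.execute(options). Each retry reissues the instruction from whatever state the page is in, so this pattern suits tasks that are safe to resume or repeat.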

Use highlightCursor for debugging

const result = await agent.execute({
  instruction: 'Navigate through the menu',
  highlightCursor: true  // Shows red cursor indicator
})

Enable cursor highlighting to visually track agent actions during development.

Provide clear instructions

// Good - specific and clear
const result = await agent.execute(
  'Go to the pricing page, find the Enterprise plan, and click the contact sales button'
)

// Bad - vague
const vagueResult = await agent.execute('do something')

Clear instructions help the agent understand and complete tasks efficiently.

Check result status

const result = await agent.execute('Complete the task')

if (result.success && result.completed) {
  console.log('Task completed successfully')
} else if (result.success && !result.completed) {
  console.log('Agent ran out of steps')
  console.log('Consider increasing maxSteps')
} else {
  console.error('Error occurred:', result.message)
}

Always check both success and completed to understand the outcome.
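The three branches above can be folded into a single helper so every call site handles the outcome the same way. This is a minimal sketch. classifyOutcome and the outcome labels are hypothetical names chosen for illustration.

```typescript
type Outcome = 'done' | 'out_of_steps' | 'error'

// Hypothetical helper: map the success and completed flags to one label.
function classifyOutcome(result: { success: boolean; completed: boolean }): Outcome {
  if (!result.success) return 'error'
  return result.completed ? 'done' : 'out_of_steps'
}
```

A switch over the returned label then forces each case to be handled explicitly.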

Customize system prompts for specific domains

const agent = opensteer.agent({
  mode: 'cua',
  systemPrompt: `You are a product research assistant.
    When finding products, always check:
    1. Price
    2. Ratings
    3. Availability
    
    Prioritize highly-rated items.`
})

Custom prompts help the agent follow domain-specific rules.
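A multi-line template literal like the one above keeps the source indentation inside the prompt string. If that matters, the prompt can be assembled from parts instead. The sketch below shows one way to do so. buildSystemPrompt and PromptSpec are hypothetical names, not OpenSteer APIs.

```typescript
interface PromptSpec {
  role: string
  checklistHeading: string
  checklist: string[]
  closing?: string
}

// Hypothetical helper: build a prompt with a numbered checklist and no stray indentation.
function buildSystemPrompt(spec: PromptSpec): string {
  const lines: string[] = [spec.role.trim()]
  if (spec.checklist.length > 0) {
    lines.push(spec.checklistHeading.trim())
    spec.checklist.forEach((item, i) => lines.push(`${i + 1}. ${item.trim()}`))
  }
  if (spec.closing) {
    lines.push('', spec.closing.trim()) // blank line before the closing rule
  }
  return lines.join('\n')
}
```

The resulting string can be passed as systemPrompt when creating the agent.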

Limitations

CUA agents require a launched browser

You must call await opensteer.launch() before creating and using a CUA agent.

Cloud mode restrictions

Some features, such as uploadFile(), exportCookies(), and importCookies(), are not supported in cloud mode.

Next Steps

Browser Automation

Learn manual automation for more precise control

AI Agents

Integrate OpenSteer with AI agent workflows

Cloud Integration

Scale CUA with cloud mode

Data Extraction

Combine CUA with structured data extraction
