Omnigent Architecture: Server, Runners, and Harnesses

Omnigent runs as two cooperating pieces: a server that handles coordination, persistence, and the web UI, and a runner that lives on the user’s machine and executes the actual LLM loop. Separating them lets the server be deployed anywhere — a VPS, Render, Railway, Fly.io — while model credentials and local tools never leave the machine that registers as a host. Sessions stay reachable from any device, including a phone, without code or API keys ever touching the server image.

The Server

The server is a FastAPI application that acts as the coordination hub for everything Omnigent does. It exposes HTTP and SSE routes for clients (the CLI REPL, the web UI, the Python SDK), WebSocket endpoints for terminal attachment, and a WebSocket tunnel endpoint (WS /v1/runner/tunnel) that runners dial into from the user’s machine. The server stores all session state — messages, sub-agents, terminal resources, file resources, policies — in either SQLite (for single-instance or lightweight deploys) or Postgres (for production and multi-instance setups). The web UI is served as a static SPA from the same process. Because the server handles only coordination and persistence — not execution — the Docker image is deliberately small. It ships no harness SDKs, no tmux, and no LLM API keys.

Because harness SDKs and API keys live on the runner’s machine, a deployed server can be shared with your whole team without anyone’s credentials entering the server environment.

Deployment targets for the server include:

Render / Railway

One-click deploys with managed Postgres provisioned automatically.

Docker Compose

Run on any VPS or home server with docker compose up -d.

Fly.io / Modal / HF Spaces

CLI-based deploys with SQLite or bring-your-own Postgres.

Local (background)

Start the server with omnigent server start, or let omnigent run auto-start it on your machine.

Runners (Hosts)

A runner is a Python subprocess that runs on the user’s machine, such as a laptop, a dev container, or a cloud sandbox. Runners are not deployed; each user launches one by running omnigent run or omnigent claude, or by registering their machine with omnigent host. The runner dials into the server over WS /v1/runner/tunnel, authenticates, and waits for work. When a session receives a message, the server dispatches the task to the bound runner. The runner then:

Loads the agent spec and selects the harness.
Invokes the LLM loop locally (using the user’s own API keys or CLI login).
Executes tools in the local environment.
Streams events back through the WebSocket tunnel to the server, which fans them out to all connected clients (web UI, CLI REPL, SDK streams).

This design means that even when a session is shared with teammates or accessed from a phone, the agent’s code runs on the machine that registered as a host, with that machine’s credentials and filesystem access.
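The four runner steps above can be sketched in a few lines of Python. This is a minimal illustration, not Omnigent's actual runner code: the Event class, the echo harness, the HARNESSES registry, and the send callback (standing in for the WebSocket tunnel) are all hypothetical names, and no LLM or network is involved.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, List


@dataclass
class Event:
    kind: str      # e.g. "text_delta", "tool_call", "tool_result"
    payload: str


# Hypothetical harness shape: takes a user message, yields events.
Harness = Callable[[str], Iterator[Event]]


def echo_harness(message: str) -> Iterator[Event]:
    """Stand-in for a real harness; it fakes one tool call and echoes words."""
    yield Event("tool_call", "read_file")
    yield Event("tool_result", "ok")
    for word in message.split():
        yield Event("text_delta", word)


# Hypothetical registry keyed by the name in executor.harness.
HARNESSES: Dict[str, Harness] = {"echo": echo_harness}


def handle_task(agent_spec: dict, message: str,
                send: Callable[[Event], None]) -> int:
    """Run one dispatched task and return the number of events streamed."""
    # Step 1: load the agent spec and select the harness.
    harness = HARNESSES[agent_spec["executor"]["harness"]]
    count = 0
    # Steps 2 and 3: the harness drives the loop and tools locally.
    for event in harness(message):
        # Step 4: stream each event back (here, via a plain callback).
        send(event)
        count += 1
    return count


sent: List[Event] = []
n = handle_task({"executor": {"harness": "echo"}}, "hello world", sent.append)
```

Here sent plays the role of the server side of the tunnel: it receives every event in the order the harness produced it, which is what the server would then fan out to clients.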

Cloud Sandbox Hosts

If you don’t want a laptop to stay online, runners can be launched in Modal or Daytona cloud sandboxes:

omnigent sandbox create --provider modal
omnigent sandbox connect --provider modal --sandbox-id <id> --server https://your-host

The server can also provision sandboxes automatically per session (managed hosts) by setting a sandbox: block in the server config.

Harnesses

A harness is an adapter that connects the runner to a specific agent runtime or SDK. The runner loads the harness declared in the agent’s executor.harness field (or the --harness CLI flag) and delegates all LLM interaction to it. The six supported harnesses are:

Harness	Runtime
`claude-sdk`	Claude Code via the `claude-agent-sdk` Python package
`openai-agents`	OpenAI Agents SDK
`codex`	OpenAI Codex CLI via `@openai/codex` npm package
`pi`	Pi harness (Anthropic Pi)
`claude-native`	Native Claude Code CLI, tmux-based
`codex-native`	Native Codex CLI, tmux-based

See the Harnesses page for credential requirements and gateway configuration per harness.
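As an illustration of how a harness name might be resolved, the sketch below validates a name against the six harnesses in the table. The function name resolve_harness is hypothetical, and the precedence shown (the --harness flag overriding executor.harness) is an assumption; the text above does not say which one wins.

```python
from typing import Optional

# The six harness names listed in the table above.
SUPPORTED_HARNESSES = (
    "claude-sdk",
    "openai-agents",
    "codex",
    "pi",
    "claude-native",
    "codex-native",
)


def resolve_harness(agent_spec: dict, cli_harness: Optional[str] = None) -> str:
    """Pick a harness name and check it against the supported list."""
    # Assumption: an explicit CLI value takes priority over the spec field.
    name = cli_harness or agent_spec.get("executor", {}).get("harness")
    if name is None:
        raise ValueError("no harness given in executor.harness or --harness")
    if name not in SUPPORTED_HARNESSES:
        raise ValueError(
            f"unknown harness {name!r}; expected one of: "
            + ", ".join(SUPPORTED_HARNESSES)
        )
    return name


spec = {"executor": {"harness": "codex"}}
from_spec = resolve_harness(spec)
from_flag = resolve_harness(spec, cli_harness="claude-native")
```

Failing early on an unknown name keeps a typo in the agent spec from surfacing later as a confusing runtime error inside the LLM loop.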

Session Lifecycle

A session is the live context for one agent conversation. Here is how a typical interactive session flows from start to finish:

omnigent claude
      │
      ▼
CLI ensures backend
  • host daemon starts (or reuses existing)
  • daemon spawns local server if none running
  • daemon connects runner → server via WS tunnel
      │
      ▼
CLI creates session on server
  POST /v1/sessions  →  session_id: conv_abc123
      │
      ▼
Server dispatches to runner
  runner loads claude-sdk harness
  harness starts LLM loop locally
      │
      ▼
User types a message
  CLI → POST /v1/sessions/{id}/events
  Server → forwards to bound runner
  Runner → calls Anthropic API (local creds)
  LLM response → streamed back via WS tunnel
  Server → fans out via SSE to all clients
      │
      ▼
Session persists in server DB
  (resumable, shareable, forkable)

Events emitted by the harness — text deltas, tool calls, tool results — are streamed back to the server in real time and forwarded to any connected clients: the terminal REPL, the web UI, SDK stream() callers, and teammates watching a shared session.
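The fan-out described above can be modeled with one queue per connected client. The sketch below is a simplified assumption about the pattern, not the server's actual implementation: the FanOut class and the event dictionaries are hypothetical, and plain asyncio queues stand in for SSE connections.

```python
import asyncio
from typing import List


class FanOut:
    """Copies every published event into each subscriber's own queue."""

    def __init__(self) -> None:
        self._queues: List[asyncio.Queue] = []

    def subscribe(self) -> asyncio.Queue:
        queue: asyncio.Queue = asyncio.Queue()
        self._queues.append(queue)
        return queue

    def publish(self, event: dict) -> None:
        for queue in self._queues:
            queue.put_nowait(event)


async def drain(queue: asyncio.Queue, count: int) -> list:
    """Read exactly `count` events from one subscriber's queue."""
    return [await queue.get() for _ in range(count)]


async def demo():
    hub = FanOut()
    # Two clients watching the same session, e.g. the web UI and the CLI.
    web_ui, cli = hub.subscribe(), hub.subscribe()
    events = [
        {"type": "text_delta", "text": "Hi"},
        {"type": "tool_call", "name": "ls"},
    ]
    for event in events:
        hub.publish(event)
    return await asyncio.gather(drain(web_ui, 2), drain(cli, 2))


web_seen, cli_seen = asyncio.run(demo())
```

Giving each client its own queue means a slow reader only delays itself; every subscriber still sees the full event sequence in order.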

Local vs. Server Mode

Local mode (default)

Running omnigent claude or omnigent run with no --server flag starts everything on your machine in one step. A background daemon:

Auto-starts a local Omnigent server on http://localhost:6767.
Connects a runner to that server via the WebSocket tunnel.

The web UI at http://localhost:6767 shows the same session. Teammates on your LAN can open your machine’s LAN address (e.g. http://192.168.x.x:6767) to watch or co-drive.

omnigent claude            # server + runner start automatically
omnigent server status     # check what's running
omnigent stop              # stop everything

Deployed server mode

Once the server is running at a stable URL, your laptop registers as a host and dials in. The server lives in the cloud; the runner and credentials stay on your machine.

omnigent login https://your-host    # authenticate once
omnigent host https://your-host     # register this machine as a host

# Point a one-off run at the remote server
omnigent run path/to/agent.yaml --server https://your-host

New sessions created in the web UI run on registered host machines. If you would rather not keep a laptop online, use managed hosts (Modal or Daytona sandboxes that the server provisions per session).
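Since clients reach the server over HTTP in both modes, a client mostly needs a different base URL to switch between them. The sketch below only builds the two requests from the session lifecycle (POST /v1/sessions and POST /v1/sessions/{id}/events) without sending them. The JSON bodies are placeholders, because the payload schema is not documented on this page, and authentication headers are omitted.

```python
import json
from urllib.request import Request


def create_session_request(base_url: str) -> Request:
    """Build (but do not send) a session-creation request."""
    return Request(
        f"{base_url.rstrip('/')}/v1/sessions",
        # Placeholder body: the real request schema is an unknown here.
        data=json.dumps({}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_event_request(base_url: str, session_id: str, text: str) -> Request:
    """Build (but do not send) a request that posts a user message."""
    # Hypothetical event shape, for illustration only.
    body = {"type": "message", "text": text}
    return Request(
        f"{base_url.rstrip('/')}/v1/sessions/{session_id}/events",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Local mode and deployed mode differ only in the base URL.
local = create_session_request("http://localhost:6767")
remote = post_event_request("https://your-host", "conv_abc123", "hello")
```

Passing either request to urllib.request.urlopen would send it; that step is left out because it needs a running server and valid credentials.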
