LLM-driven browser automation. Reads page state via accessibility tree, decides actions via LLM, executes in a loop until the goal is done.
96% pass rate on WebBench-50. Default model: gpt-5.4.
pnpm add @tangle-network/browser-agent-driver
pnpm add -D playwright

import { chromium } from 'playwright'
import { PlaywrightDriver, AgentRunner } from '@tangle-network/browser-agent-driver'
const browser = await chromium.launch()
const page = await browser.newPage()
const driver = new PlaywrightDriver(page)
const runner = new AgentRunner({
  driver,
  config: { model: 'gpt-5.4' },
})
const result = await runner.run({
  goal: 'Sign in and navigate to settings',
  startUrl: 'https://app.example.com',
  maxTurns: 30,
})
console.log(result.success, `${result.turns.length} turns`)
await browser.close()

# single task
bad run --goal "Sign up for account" --url http://localhost:3000
# test suite from case file
bad run --cases ./cases.json
# authenticated session
bad run --goal "Open settings" --url https://app.example.com \
  --storage-state ./.auth/session.json
# speed-optimized mode
bad run --cases ./cases.json --mode fast-explore
# evidence-rich mode for signoff
bad run --cases ./cases.json --mode full-evidence

Create browser-agent-driver.config.ts in your project root:
import { defineConfig } from '@tangle-network/browser-agent-driver'
export default defineConfig({
  model: 'gpt-5.4',
  headless: true,
  concurrency: 4,
  maxTurns: 30,
  timeoutMs: 300_000,
  outputDir: './test-results',
  reporters: ['junit', 'html'],
})

The config file is auto-detected by both the CLI and the programmatic API. CLI flags override config values. Supported extensions: .ts, .js, .mjs.
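The precedence rule above (CLI flags win over config file values) can be illustrated with a small, self-contained sketch. This is not the library's actual resolution code: `RunOptions`, `builtInDefaults`, `pickDefined`, and `resolveOptions` are hypothetical names, and the default values are placeholders chosen only for the example.

```typescript
// Hypothetical subset of the options shown in the config example.
type RunOptions = {
  model: string
  headless: boolean
  concurrency: number
  maxTurns: number
  timeoutMs: number
}

// Placeholder defaults for illustration only; the real defaults may differ.
const builtInDefaults: RunOptions = {
  model: 'gpt-5.4',
  headless: true,
  concurrency: 1,
  maxTurns: 20,
  timeoutMs: 60_000,
}

// Drop keys whose value is undefined so an unset flag never clobbers a real value.
function pickDefined<T extends object>(layer: Partial<T>): Partial<T> {
  const out: Partial<T> = {}
  for (const key of Object.keys(layer) as (keyof T)[]) {
    if (layer[key] !== undefined) out[key] = layer[key]
  }
  return out
}

// Later layers win: defaults < config file < CLI flags.
function resolveOptions(
  defaults: RunOptions,
  fileConfig: Partial<RunOptions>,
  cliFlags: Partial<RunOptions>,
): RunOptions {
  return {
    ...defaults,
    ...pickDefined<RunOptions>(fileConfig),
    ...pickDefined<RunOptions>(cliFlags),
  }
}

// Config file sets maxTurns to 30; a CLI flag lowers it to 10 for this run.
const resolved = resolveOptions(
  builtInDefaults,
  { concurrency: 4, maxTurns: 30, timeoutMs: 300_000 },
  { maxTurns: 10, headless: undefined },
)
console.log(resolved.maxTurns, resolved.concurrency, resolved.headless)
```

Filtering out `undefined` before merging is a deliberate choice in this sketch: a flag the user did not pass should leave the config file value untouched.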
import { TestRunner } from '@tangle-network/browser-agent-driver'
// Assumes `runner` is a TestRunner instance; its construction is not shown in this snippet.
const suite = await runner.runSuite([
  {
    id: 'login',
    name: 'User login flow',
    startUrl: 'https://app.example.com/login',
    goal: 'Log in with test credentials',
    successCriteria: [
      { type: 'url-contains', value: '/dashboard' },
      { type: 'element-visible', selector: '[data-testid="user-menu"]' },
    ],
  },
])

The LLM can perform: click, type, press, hover, select, scroll, navigate, wait, evaluate, verifyPreview, complete, abort.
Each turn: observe page (a11y tree + optional screenshot) → LLM decides action → execute → verify effect → repeat.
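The observe → decide → execute → verify cycle can be sketched in a few lines. This is a simplified, synchronous illustration under stated assumptions, not the package's implementation: `LoopDriver`, `Decide`, `runLoop`, and `createFakeDriver` are hypothetical names, only a handful of the listed actions are modelled, a scripted function stands in for the LLM, and a real loop would be asynchronous.

```typescript
// A few of the action kinds listed above, modelled as a discriminated union.
type Action =
  | { kind: 'click'; target: string }
  | { kind: 'navigate'; url: string }
  | { kind: 'complete' }
  | { kind: 'abort'; reason: string }

type Observation = { url: string; a11yTree: string[] }
type Turn = { observation: Observation; action: Action; effectVerified: boolean }

// Hypothetical minimal driver surface for this sketch.
type LoopDriver = {
  observe(): Observation
  execute(action: Action): void
}

// Stand-in for the LLM call: maps goal + observation + history to the next action.
type Decide = (goal: string, observation: Observation, history: Turn[]) => Action

function runLoop(driver: LoopDriver, decide: Decide, goal: string, maxTurns: number) {
  const turns: Turn[] = []
  while (turns.length < maxTurns) {
    const before = driver.observe()                 // 1. observe
    const action = decide(goal, before, turns)      // 2. decide
    if (action.kind === 'complete' || action.kind === 'abort') {
      turns.push({ observation: before, action, effectVerified: true })
      return { success: action.kind === 'complete', turns }
    }
    driver.execute(action)                          // 3. execute
    const after = driver.observe()
    // 4. verify: crude check that the action changed something observable
    const effectVerified = JSON.stringify(after) !== JSON.stringify(before)
    turns.push({ observation: before, action, effectVerified })
  }
  return { success: false, turns }                  // turn budget exhausted
}

// In-memory fake page so the sketch runs without a browser.
function createFakeDriver(startUrl: string): LoopDriver {
  let url = startUrl
  return {
    observe: () => ({ url, a11yTree: ['document ' + url] }),
    execute: (action) => {
      if (action.kind === 'navigate') url = action.url
    },
  }
}

const scriptedDecide: Decide = (_goal, observation) =>
  observation.url.endsWith('/settings')
    ? { kind: 'complete' }
    : { kind: 'navigate', url: 'https://app.example.com/settings' }

const loopResult = runLoop(
  createFakeDriver('https://app.example.com'),
  scriptedDecide,
  'Navigate to settings',
  30,
)
console.log(loopResult.success, loopResult.turns.length + ' turns')
```

Treating `complete` and `abort` as ordinary actions keeps termination inside the same decision step, so the loop needs no separate "is the goal done" check.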
Recovery is automatic: cookie consent, modal blockers, stuck loops (A-B-A-B oscillation), and selector failures are handled before the agent continues.
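As an illustration of the stuck-loop case, an A-B-A-B oscillation can be detected by comparing the most recent action signatures. The sketch below is one possible approach, not the package's actual detector; `isOscillating` and the `kind:target` signature format are assumptions made for the example.

```typescript
// Returns true when the last `cycles * 2` signatures strictly alternate
// between two distinct values (A-B-A-B for the default of two cycles).
function isOscillating(signatures: string[], cycles = 2): boolean {
  const span = cycles * 2
  if (signatures.length < span) return false
  const recent = signatures.slice(-span)
  const a = recent[0]
  const b = recent[1]
  if (a === b) return false // A-A-A-A is a plain repeat, not an oscillation
  return recent.every((signature, index) => signature === (index % 2 === 0 ? a : b))
}

// Hypothetical signatures built from action kind and target.
const history = ['navigate:/home', 'click:#open', 'click:#close', 'click:#open', 'click:#close']
console.log(isOscillating(history))
```

A caller could run this check after each turn and, when it fires, feed a hint to the model or force a different action. Plain repeats (A-A-A-A) are intentionally left to a separate check in this sketch.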
- Configuration Reference — all config options
- CLI Reference — commands, modes, profiles, auth
- Benchmarks & Experiments — tiered gates, AB specs, research cycles
- Wallet Automation — MetaMask, Rabby, extension flows
- Providers — OpenAI, Anthropic, Codex CLI, Claude Code, sandbox backend
- Reporters & Sinks — JUnit, HTML, webhooks, custom sinks
- Custom Drivers — implement the Driver interface
Ships Codex skills under skills/ for test execution discipline and agent-friendly UX conventions.
npm run skills:install

Publishing is tag-triggered via .github/workflows/publish-npm.yml. Push a browser-agent-driver-vX.Y.Z tag to publish.
pnpm build # TypeScript → dist/
pnpm test # vitest
pnpm lint # type-check
pnpm check:boundaries

Dual-licensed under MIT and Apache 2.0. See LICENSE.