Step 6: baseline browser-task generalist #9

Merged: rockfordlhotka merged 1 commit into main from step-6-browser-task on Apr 23, 2026
Summary

First v0.2 milestone (spec §9.1 step 6). Adds browser-task, the LLM-in-the-loop generalist capability built natively on Microsoft.Playwright (no MCP sidecar, no Stagehand port — Appendix A #16). Uses ref-annotated aria snapshots + aria-ref=eN locator resolution.

  • New capability src/Foragent.Capabilities/BrowserTask/ with 7 [AIFunction] tools: snapshot, navigate, click, type, wait_for, done, fail. The budget is enforced tool-side (default maxSteps=60, ceiling 150) together with a wall-clock CancellationTokenSource (default 120 s, ceiling 600 s). Output is structured JSON {status, summary, result, steps, navigations}.
  • Foragent.Browser extended with IBrowserAgentPage for ref-based interactions, plus an allowlist-scoped CreateSessionAsync(Func<Uri,bool>, ...) overload that aborts off-list document/subframe navigations inside the context's RouteAsync — before Playwright issues the request (spec §7.1).
  • HostAllowlist parses exact-host (bsky.app), subdomain-wildcard (*.example.com), and match-all (*) patterns; an empty list rejects every host.
  • Program.cs now wires AddRockBotTieredChatClients (one model aliased across Low/Balanced/High per spec §3.7, Appendix #17); AddRockBotChatClient call removed (the tiered registration already wraps with RockBotFunctionInvokingChatClient).
  • Microsoft.Playwright bumped 1.50 → 1.59 for AriaSnapshotMode.Ai (ref-annotated snapshots). Matching base-image bump deferred to release.
  • browser-task is now listed first in ForagentCapabilities.Skills; the three existing specialists remain (the spec removes post-to-site at step 7, not step 6). deploy/rockbot-seed/ updated.
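
For readers who want the allowlist rules spelled out, here is a minimal sketch in Python (the project itself is C#, so this is an illustration and not the HostAllowlist source). The class and function names below are hypothetical. Two behaviours are assumptions that the PR does not state: that `*.example.com` matches subdomains only and not the bare `example.com`, and that non-navigation subresource requests pass through the route handler untouched.

```python
from urllib.parse import urlsplit


class HostAllowlist:
    """Illustrative allowlist supporting exact hosts, '*.domain', and '*'."""

    def __init__(self, patterns):
        self._allow_all = False
        self._exact = set()
        self._suffixes = []
        for raw in patterns:
            pattern = raw.strip().lower()
            if pattern == "*":
                self._allow_all = True
            elif pattern.startswith("*."):
                # Keep the leading dot so 'notexample.com' cannot match.
                self._suffixes.append(pattern[1:])
            elif pattern:
                self._exact.add(pattern)

    def is_allowed(self, url: str) -> bool:
        host = (urlsplit(url).hostname or "").lower()
        if not host:
            return False
        if self._allow_all:
            return True
        # An empty allowlist has no exact hosts and no suffixes, so it rejects.
        return host in self._exact or any(host.endswith(s) for s in self._suffixes)


def should_abort(allowlist: HostAllowlist, url: str, is_navigation: bool) -> bool:
    """Decision a route handler could make before the request is issued.

    Assumption: only document/subframe navigations are gated.
    """
    return is_navigation and not allowlist.is_allowed(url)


if __name__ == "__main__":
    allow = HostAllowlist(["example.com"])
    print(should_abort(allow, "https://evil.example/", is_navigation=True))
    print(should_abort(allow, "https://example.com/", is_navigation=True))
```

Storing the wildcard as a dot-prefixed suffix keeps the match a plain string comparison while still refusing look-alike hosts.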

Verification

  • dotnet build -c Release is clean; dotnet test -c Release: 65 pass, 4 skipped (the skipped tests are real-LLM integration benchmarks, which do not run when FORAGENT_LLM_* is unset).

  • Real-LLM benchmark (step 6 floor measurement, spec §9.1): 3 / 3 scenarios pass on first attempt against Azure AI Foundry Balanced.

    | Scenario | Result | Wall-clock |
    | --- | --- | --- |
    | Click-through | ✅ done | 5 s |
    | Form submit | ✅ done | 8 s |
    | Multi-page nav | ✅ done | 7 s |

  • HTTP gateway smoke test via docker compose up:

    • /.well-known/agent-card.json advertises all four skills.
    • POST / with skill: browser-task / url: https://example.com / allowedHosts: ["example.com"] returns {"status":"done","result":"Example Domain","steps":1}.
    • Off-allowlist URL (https://evil.example/) is rejected: "Starting URL host 'evil.example' is not in the allowlist."
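
The smoke-test expectation above could be turned into a repeatable check along these lines. This is a sketch in Python under stated assumptions: the helper name `check_smoke_response` is hypothetical, the field set is taken from the structured output listed in the summary, and treating every field as optional is an assumption. The request envelope is deliberately left out because the PR does not show its exact shape.

```python
import json

# Field names come from the PR summary; optionality is an assumption.
KNOWN_FIELDS = {"status", "summary", "result", "steps", "navigations"}


def check_smoke_response(body: str, expected_result: str) -> dict:
    """Parse a browser-task response body and verify the smoke-test outcome."""
    payload = json.loads(body)
    unknown = set(payload) - KNOWN_FIELDS
    if unknown:
        raise ValueError(f"unexpected fields: {sorted(unknown)}")
    if payload.get("status") != "done":
        raise ValueError(f"task did not finish: status={payload.get('status')!r}")
    if payload.get("result") != expected_result:
        raise ValueError(f"unexpected result: {payload.get('result')!r}")
    return payload


# Response body exactly as reported in the smoke test above.
response_body = '{"status":"done","result":"Example Domain","steps":1}'
payload = check_smoke_response(response_body, "Example Domain")
```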

Framework observations surfaced (not worked around — see docs/framework-feedback.md)

  1. AddRockBotTieredChatClients subsumes AddRockBotChatClient but this is undocumented; calling both can silently swap the wrapped registration.
  2. No per-request MaximumIterationsPerRequest path through ChatOptions — Foragent enforces step budget tool-side via BrowserTaskState.BudgetExhausted.
  3. Microsoft.Playwright C# bindings gate ref-annotated aria snapshots behind Mode = AriaSnapshotMode.Ai rather than a boolean Ref option.
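
To make observation 2 concrete, a tool-side budget can be sketched as follows. This is Python for illustration only and is not the BrowserTaskState implementation: `StepBudget`, `clamp`, and `spend` are hypothetical names, a monotonic deadline check stands in for the CancellationTokenSource the PR describes, and clamping out-of-range requests to the ceiling (as opposed to rejecting them) is an assumption. Only the defaults and ceilings come from the PR.

```python
import time

DEFAULT_MAX_STEPS, MAX_STEPS_CEILING = 60, 150
DEFAULT_TIMEOUT_S, TIMEOUT_CEILING_S = 120, 600


def clamp(requested, default, ceiling):
    """Use the default when nothing valid is requested; never exceed the ceiling."""
    if requested is None or requested <= 0:
        return default
    return min(requested, ceiling)


class StepBudget:
    """Step and wall-clock budget checked by each tool before it acts."""

    def __init__(self, max_steps=None, timeout_s=None, clock=time.monotonic):
        self.max_steps = clamp(max_steps, DEFAULT_MAX_STEPS, MAX_STEPS_CEILING)
        self.timeout_s = clamp(timeout_s, DEFAULT_TIMEOUT_S, TIMEOUT_CEILING_S)
        self._clock = clock
        self._deadline = clock() + self.timeout_s
        self.steps = 0

    @property
    def exhausted(self) -> bool:
        return self.steps >= self.max_steps or self._clock() >= self._deadline

    def spend(self) -> bool:
        """Consume one step; False tells the calling tool to refuse the action."""
        if self.exhausted:
            return False
        self.steps += 1
        return True


if __name__ == "__main__":
    budget = StepBudget(max_steps=2)
    print([budget.spend() for _ in range(3)])
```

Because the check lives inside the tools, the cap holds regardless of how many iterations the function-invoking chat client is willing to run.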

Test plan

  • dotnet build -c Release
  • dotnet test -c Release
  • Real-LLM benchmark scenarios (3 / 3 passing, floor established)
  • docker compose up --build; curl smoke test via HTTP gateway
  • Allowlist rejection verified end-to-end

🤖 Generated with Claude Code

First v0.2 capability. LLM-in-the-loop planner over ref-annotated
aria snapshots + aria-ref=eN locator resolution, built on
Microsoft.Playwright 1.59 (bumped from 1.50 for AriaSnapshotMode.Ai).
Registered tiered chat clients via AddRockBotTieredChatClients with
one model aliased across Low/Balanced/High per spec §3.7.

- HostAllowlist with *, *.domain, exact-host patterns; empty rejects.
- Context-wide RouteAsync aborts off-list document/subframe navs
  before Playwright issues the request (spec §7.1).
- [AIFunction] tools: snapshot / navigate / click / type / wait_for /
  done / fail. Budget enforced tool-side (default maxSteps=60, ceiling
  150) + wall-clock CancellationTokenSource (default 120s, ceiling 600).
- Structured JSON output: {status, summary, result, steps, navigations}.
- Unit tests: 14 new (HostAllowlist, BrowserTaskCapability via
  ScriptedChatClient + FakeBrowserAgentPage).
- Real-LLM benchmark: 3 Kestrel scenarios, all pass on first attempt
  against Azure AI Foundry Balanced (5/8/7s). Establishes the unaided
  floor before step 7 adds priming.
- Smoke-tested via docker-compose HTTP gateway (agent-card lists four
  skills; browser-task returns "Example Domain" in 1 step; off-list
  URLs rejected).

Framework observations captured in docs/framework-feedback.md:
AddRockBotTieredChatClients subsumes AddRockBotChatClient
(undocumented); no per-request iteration cap on the function-invoking
chat client; Playwright aria-ref gating via AriaSnapshotMode.Ai rather
than a boolean Ref option.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@rockfordlhotka rockfordlhotka merged commit 041a7c5 into main Apr 23, 2026
1 check passed
@rockfordlhotka rockfordlhotka deleted the step-6-browser-task branch April 23, 2026 01:22