Step 6: baseline browser-task generalist #9
Merged
rockfordlhotka merged 1 commit into main on Apr 23, 2026
First v0.2 capability. LLM-in-the-loop planner over ref-annotated
aria snapshots + aria-ref=eN locator resolution, built on
Microsoft.Playwright 1.59 (bumped from 1.50 for AriaSnapshotMode.Ai).
Registered tiered chat clients via AddRockBotTieredChatClients with
one model aliased across Low/Balanced/High per spec §3.7.
- HostAllowlist with *, *.domain, exact-host patterns; empty rejects.
- Context-wide RouteAsync aborts off-list document/subframe navs
before Playwright issues the request (spec §7.1).
- [AIFunction] tools: snapshot / navigate / click / type / wait_for /
done / fail. Budget enforced tool-side (default maxSteps=60, ceiling
150) + wall-clock CancellationTokenSource (default 120s, ceiling 600).
- Structured JSON output: {status, summary, result, steps, navigations}.
- Unit tests: 14 new (HostAllowlist, BrowserTaskCapability via
ScriptedChatClient + FakeBrowserAgentPage).
- Real-LLM benchmark: 3 Kestrel scenarios, all pass on first attempt
against Azure AI Foundry Balanced (5/8/7s). Establishes the unaided
floor before step 7 adds priming.
- Smoke-tested via docker-compose HTTP gateway (agent-card lists four
skills; browser-task returns "Example Domain" in 1 step; off-list
URLs rejected).
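The allowlist rules listed above (`*`, `*.domain`, exact host, empty rejects) can be illustrated with a minimal sketch. This is not the actual `HostAllowlist` source: the class name `HostAllowlistSketch`, its members, and the choice that `*.example.com` does not match the bare `example.com` are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch only; the real Foragent HostAllowlist may differ.
public sealed class HostAllowlistSketch
{
    private readonly bool _allowAll;
    private readonly HashSet<string> _exactHosts = new(StringComparer.OrdinalIgnoreCase);
    private readonly List<string> _suffixes = new(); // stored as ".example.com"

    public HostAllowlistSketch(IEnumerable<string> patterns)
    {
        foreach (var raw in patterns)
        {
            var pattern = raw.Trim();
            if (pattern.Length == 0) continue;

            if (pattern == "*")
                _allowAll = true;                       // wildcard: every host
            else if (pattern.StartsWith("*.", StringComparison.Ordinal))
                _suffixes.Add(pattern.Substring(1));    // "*.example.com" -> ".example.com"
            else
                _exactHosts.Add(pattern);               // exact host, e.g. "bsky.app"
        }
    }

    // An empty pattern list leaves every collection empty, so all hosts are rejected.
    public bool IsAllowed(Uri uri)
    {
        if (_allowAll) return true;
        var host = uri.Host;
        if (_exactHosts.Contains(host)) return true;
        // Assumption: "*.example.com" matches subdomains only, not "example.com" itself.
        return _suffixes.Any(s => host.EndsWith(s, StringComparison.OrdinalIgnoreCase));
    }
}
```

A method with this shape converts to a `Func<Uri,bool>` predicate, which is the kind of callback a navigation filter inside a route handler could consult before letting a document or subframe request continue.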
Framework observations captured in docs/framework-feedback.md:
AddRockBotTieredChatClients subsumes AddRockBotChatClient
(undocumented); no per-request iteration cap on the function-invoking
chat client; Playwright aria-ref gating via AriaSnapshotMode.Ai rather
than a boolean Ref option.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
First v0.2 milestone (spec §9.1 step 6). Adds `browser-task`, the LLM-in-the-loop generalist capability built natively on `Microsoft.Playwright` (no MCP sidecar, no Stagehand port; see Appendix A #16). Uses ref-annotated aria snapshots + `aria-ref=eN` locator resolution.

- `src/Foragent.Capabilities/BrowserTask/` with 7 `[AIFunction]` tools: `snapshot`, `navigate`, `click`, `type`, `wait_for`, `done`, `fail`. Budget enforced tool-side (default `maxSteps=60`, ceiling 150) + wall-clock `CancellationTokenSource` (default 120s, ceiling 600). Output is structured JSON `{status, summary, result, steps, navigations}`.
- `Foragent.Browser` extended with `IBrowserAgentPage` for ref-based interactions, plus an allowlist-scoped `CreateSessionAsync(Func<Uri,bool>, ...)` overload that aborts off-list document/subframe navigations inside the context's `RouteAsync`, before Playwright issues the request (spec §7.1).
- `HostAllowlist` parses `bsky.app`, `*.example.com`, and `*` patterns; empty lists reject.
- `Program.cs` now wires `AddRockBotTieredChatClients` (one model aliased across Low/Balanced/High per spec §3.7, Appendix #17); `AddRockBotChatClient` call removed (the tiered registration already wraps with `RockBotFunctionInvokingChatClient`).
- `Microsoft.Playwright` bumped 1.50 → 1.59 for `AriaSnapshotMode.Ai` (ref-annotated snapshots). Matching base-image bump deferred to release.
- `browser-task` added to `ForagentCapabilities.Skills` first; existing three specialists remain (spec removes `post-to-site` at step 7, not 6).
- `deploy/rockbot-seed/` updated.

Verification
- `dotnet build -c Release` clean; `dotnet test -c Release` → 65 pass, 4 skipped (skipped are real-LLM integration benchmarks when `FORAGENT_LLM_*` is unset).
- Real-LLM benchmark (step 6 floor measurement, spec §9.1): 3 / 3 scenarios pass on first attempt against Azure AI Foundry Balanced.
- HTTP gateway smoke test via `docker compose up`:
  - `/.well-known/agent-card.json` advertises all four skills.
  - `POST /` with `skill: browser-task` / `url: https://example.com` / `allowedHosts: ["example.com"]` returns `{"status":"done","result":"Example Domain","steps":1}`.
  - Off-list URL (`https://evil.example/`) is rejected: `"Starting URL host 'evil.example' is not in the allowlist."`

Framework observations surfaced (not worked around; see docs/framework-feedback.md)
- `AddRockBotTieredChatClients` subsumes `AddRockBotChatClient`, but this is undocumented; calling both can silently swap the wrapped registration.
- No per-request `MaximumIterationsPerRequest` path through `ChatOptions`; Foragent enforces the step budget tool-side via `BrowserTaskState.BudgetExhausted`.
- `Microsoft.Playwright` C# bindings gate ref-annotated aria snapshots behind `Mode = AriaSnapshotMode.Ai` rather than a boolean `Ref` option.

Test plan
- `dotnet build -c Release`
- `dotnet test -c Release`
- `docker compose up --build`; curl smoke test via HTTP gateway

🤖 Generated with Claude Code
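For readers unfamiliar with tool-side budgeting, the step and wall-clock limits from the Summary (default `maxSteps=60`, ceiling 150; default 120s, ceiling 600) might be enforced along the following lines. This is a sketch under stated assumptions, not the actual `BrowserTaskState`: the class name `StepBudgetSketch`, its members, the reading of "ceiling 600" as seconds, and the decision to clamp over-ceiling requests rather than reject them are all hypothetical.

```csharp
using System;
using System.Threading;

// Hypothetical sketch only; the real BrowserTaskState may be shaped differently.
public sealed class StepBudgetSketch
{
    public const int DefaultMaxSteps = 60;
    public const int MaxStepsCeiling = 150;
    public const int DefaultTimeoutSeconds = 120;
    public const int TimeoutCeilingSeconds = 600; // assumption: the ceiling is in seconds

    private int _stepsTaken;

    public int MaxSteps { get; }
    public TimeSpan Timeout { get; }
    public int StepsTaken => _stepsTaken;
    public bool BudgetExhausted => _stepsTaken >= MaxSteps;

    public StepBudgetSketch(int? requestedMaxSteps = null, int? requestedTimeoutSeconds = null)
    {
        // Assumption: caller-supplied values are clamped to the ceilings, not rejected.
        MaxSteps = Math.Clamp(requestedMaxSteps ?? DefaultMaxSteps, 1, MaxStepsCeiling);
        Timeout = TimeSpan.FromSeconds(
            Math.Clamp(requestedTimeoutSeconds ?? DefaultTimeoutSeconds, 1, TimeoutCeilingSeconds));
    }

    // Each tool would call this before doing work; it returns false once the budget is spent,
    // so the limit holds even if the chat client itself has no per-request iteration cap.
    public bool TryConsumeStep()
    {
        if (BudgetExhausted) return false;
        _stepsTaken++;
        return true;
    }

    // Wall-clock limit: a linked token source that cancels after the configured timeout.
    public CancellationTokenSource CreateWallClockCts(CancellationToken outer = default)
    {
        var cts = CancellationTokenSource.CreateLinkedTokenSource(outer);
        cts.CancelAfter(Timeout);
        return cts;
    }
}
```

Keeping the counter inside the tool layer means a tool can refuse work and return a terminal result once the budget is spent, regardless of how many iterations the function-invoking client is willing to run.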