MarimerLLC · rockfordlhotka · Apr 23, 2026 · Apr 23, 2026
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -4,7 +4,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
 
 ## Status
 
-Foragent is at **milestone 5 shipped, v0.2 spec adopted, step 6 next**. Three capabilities are live (`fetch-page-title`, `extract-structured-data`, `post-to-site`); the A2A loop is wired end-to-end against RockBot via the `docker-compose.yml` harness pinned to `rockylhotka/rockbot-agent:0.8.5`. The governing spec is now `docs/foragent-specification.md` **v0.2** — read it before making non-trivial changes. v0.2 pivots Foragent to an agentic model: one generalist `browser-task` capability (built natively on `Microsoft.Playwright` NuGet — no MCP sidecar, no Stagehand port — see Appendix A #16) plus narrow fast-path specialists, with RockBot's `ISkillStore` + `ILongTermMemory` as the learning substrate. The v0.1 proposal document is archived at `docs/archive/foragent-spec-v0.2-proposal.md`. Storage-state persistence, 2FA input-required flow, k8s-secrets broker, and per-tenant credential namespaces remain deferred — tracked in `docs/framework-feedback.md` step 4. Framework-level observations from each milestone are captured in `docs/framework-feedback.md`.
+Foragent is at **milestone 6 shipped, step 7 next**. Four capabilities are live (`browser-task`, `fetch-page-title`, `extract-structured-data`, `post-to-site`); the A2A loop is wired end-to-end against RockBot via the `docker-compose.yml` harness pinned to `rockylhotka/rockbot-agent:0.8.5`. Step 6 shipped the generalist `browser-task` planner (LLM-in-the-loop over ref-annotated aria snapshots + `aria-ref=eN` locator resolution, built on `Microsoft.Playwright` 1.59 — bumped from 1.50 for the Ai aria-snapshot mode; see Appendix A #16). Tiered chat clients are wired via `AddRockBotTieredChatClients` with one model aliased across Low/Balanced/High per spec §3.7. The governing spec is `docs/foragent-specification.md` **v0.2**. Step 7 wires `ISkillStore` + `ILongTermMemory` priming; `post-to-site` is removed from the advertised skill list once `browser-task` + the learned bsky skill cover it. Storage-state persistence, 2FA input-required flow, k8s-secrets broker, and per-tenant credential namespaces remain deferred — tracked in `docs/framework-feedback.md`. Framework-level observations from each milestone are captured in `docs/framework-feedback.md`.
 
 ## Build / test
 
@@ -69,11 +69,13 @@ Key framework pieces Foragent uses today:
 - `RockBot.A2A.IAgentTaskHandler` — the single per-agent extension point. `ForagentTaskHandler` (in `Foragent.Capabilities`) implements this and dispatches on `request.Skill`.
 - `RockBot.A2A.Gateway.AddA2AHttpGateway` + `MapA2AHttpGateway` — the in-process HTTP surface. Published as NuGet in RockBot 0.8.4 (see `docs/framework-feedback.md`).
 
-Foragent requires an LLM (for `extract-structured-data` and future capabilities). The same `IChatClient` is registered both as a singleton (capabilities inject it directly) and via `AddRockBotChatClient` (satisfies the framework's mandatory registration). Config lives under `ForagentLlm` — separate from any rockbot-side `LLM` config so the two agents can point at different models. Program.cs fails fast at startup if `ForagentLlm:Endpoint`/`ModelId`/`ApiKey` are missing.
+Foragent requires an LLM. Config lives under `ForagentLlm`, separate from any rockbot-side `LLM` config, so the two agents can point at different models. Program.cs fails fast at startup if `ForagentLlm:Endpoint`/`ModelId`/`ApiKey` are missing. Starting with step 6, the single configured model is wired via `AddRockBotTieredChatClients(low, balanced, high)`, with all three tiers aliased to the same inner `IChatClient`; that one call registers both `IChatClient` (wrapped with `RockBotFunctionInvokingChatClient` for automatic tool invocation) and `TieredChatClientRegistry` (per spec §3.7). Don't also call `AddRockBotChatClient`: depending on registration order, it can swap out the wrapped registration. Capabilities that want to escalate/de-escalate per request can resolve `TieredChatClientRegistry` and call `GetClient(ModelTier.Low|Balanced|High)`; none do today.
 
 ## Browser
 
-`Foragent.Browser` wraps Playwright. `AddForagentBrowser()` in `Foragent.Agent/Program.cs` registers `PlaywrightBrowserHost` (`IHostedService` owning one shared Chromium per process) and `IBrowserSessionFactory` (hands out a fresh `IBrowserContext` per A2A task — isolation guarantee from spec §3.5). `IBrowserSession` exposes `FetchPageTitleAsync` / `CapturePageSnapshotAsync` for one-shot reads, plus `OpenPageAsync` → `IBrowserPage` (navigate / fill / click / wait / read) for multi-step flows like login + post. The snapshot uses Chromium's aria-snapshot (via `Locator.AriaSnapshotAsync`) and falls back to `<body>` inner text when the tree is empty. Selectors passed to `IBrowserPage` use Playwright's string-selector dialect (CSS + `role=role[name="..."]`); **regex is not accepted in string form**, use exact attribute matches. `Foragent.Browser` has `InternalsVisibleTo("Foragent.Browser.Tests")` so tests drive the real `PlaywrightBrowserSessionFactory` without promoting its implementation types to public.
+`Foragent.Browser` wraps Playwright. `AddForagentBrowser()` in `Foragent.Agent/Program.cs` registers `PlaywrightBrowserHost` (`IHostedService` owning one shared Chromium per process) and `IBrowserSessionFactory` (hands out a fresh `IBrowserContext` per A2A task — isolation guarantee from spec §3.5). `IBrowserSession` exposes `FetchPageTitleAsync` / `CapturePageSnapshotAsync` for one-shot reads, `OpenPageAsync` → `IBrowserPage` (navigate / fill / click / wait / read) for multi-step flows like login + post, and `OpenAgentPageAsync` → `IBrowserAgentPage` for LLM-in-the-loop planners (ref-annotated aria snapshots + `aria-ref=eN` locator resolution). The snapshot uses Chromium's aria-snapshot (via `Locator.AriaSnapshotAsync`; `Mode = AriaSnapshotMode.Ai` gets the ref-annotated form) and falls back to `<body>` inner text when the tree is empty. Selectors passed to `IBrowserPage` use Playwright's string-selector dialect (CSS + `role=role[name="..."]`); **regex is not accepted in string form**, use exact attribute matches. `Foragent.Browser` has `InternalsVisibleTo("Foragent.Browser.Tests")` so tests drive the real `PlaywrightBrowserSessionFactory` without promoting its implementation types to public.
+
+`CreateSessionAsync(Func<Uri,bool> allowedHost, ...)` is the step-6 entry point for allowlist-scoped sessions. The factory installs a context-wide `RouteAsync("**/*", ...)` that aborts off-list document/subframe navigations before Playwright issues the request (spec §7.1). The no-argument overload accepts any host and stays available for specialists that enforce narrower rules elsewhere (e.g. `post-to-site` where the site id selects the host).
 
 ## Capabilities
 
@@ -84,6 +86,7 @@ Foragent requires an LLM (for `extract-structured-data` and future capabilities)
 - `ForagentCapabilities.Skills` (static array) is the single source of truth for advertised skills — both the bus-side `AgentCard.Skills` and the HTTP gateway's `opts.Skills` read from it.
 - `CapabilityInput.Parse` is the shared URL + description shim used by `fetch-page-title` and `extract-structured-data`. Capabilities with different input shapes (e.g. `post-to-site` needing `site` / `credentialId` / `content`) parse their own input near the capability — see `PostToSiteInput` in `PostToSiteCapability.cs`. Don't overload `CapabilityInput` for unrelated shapes.
 - `post-to-site` dispatches to an `ISitePoster` keyed on `Site` (in `SitePosting/`). `BlueskySitePoster` is the only implementation today; add new sites by registering another `ISitePoster` in `AddForagentCapabilities()`. The capability never echoes exception messages from posters back to callers — they may contain credential material; operators read the full exception in logs.
+- `browser-task` (in `BrowserTask/`) is the generalist planner (spec §5.2). `BrowserTaskInput` parses intent + mandatory `allowedHosts` + optional `url` / `credentialId` / `maxSteps` (default 60, ceiling 150) / `maxSeconds` (default 120, ceiling 600). `BrowserTaskTools` wraps `snapshot` / `navigate` / `click` / `type` / `wait_for` / `done` / `fail` as `AIFunction`s via `AIFunctionFactory.Create` and passes them in `ChatOptions.Tools`; the RockBot-wrapped function-invoking `IChatClient` runs the full model ↔ tool loop inside one `GetResponseAsync` call. Budget is enforced tool-side (each tool checks `BrowserTaskState.BudgetExhausted`) because Microsoft.Extensions.AI does not surface per-request iteration caps through `ChatOptions`; wall-clock is a linked `CancellationTokenSource`. **Never log tool arguments verbatim** — `type` carries user-supplied values that may be sensitive (log length only). Refs from a snapshot are valid only until the next mutating call; the system prompt and tool descriptions both state this, but don't code anything that assumes cross-snapshot ref stability.
 
 ## Credentials
 

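The tool-side step budget described in the `browser-task` bullet above can be illustrated with a small sketch. It is written in Python purely as neutral notation; it is not the Foragent C# code. `TaskState`, `guarded`, `done`, and `final_status` are hypothetical names standing in for `BrowserTaskState` and the tool wrappers, and whether a refused call counts against the budget is an assumption (here it does not).

```python
from dataclasses import dataclass
from typing import Callable, Optional

BUDGET_MESSAGE = "Step budget exhausted: call done or fail."


@dataclass
class TaskState:
    """Hypothetical stand-in for BrowserTaskState."""
    max_steps: int
    steps: int = 0
    status: Optional[str] = None  # set to "done" or "failed" by a terminal tool

    @property
    def budget_exhausted(self) -> bool:
        return self.steps >= self.max_steps


def guarded(state: TaskState, tool: Callable[..., str]) -> Callable[..., str]:
    """Wrap a work tool so it refuses to run once the step budget is spent."""
    def wrapper(*args, **kwargs) -> str:
        if state.budget_exhausted:
            # Assumption: refused calls are not counted as steps.
            return BUDGET_MESSAGE
        state.steps += 1
        return tool(*args, **kwargs)
    return wrapper


def done(state: TaskState, summary: str) -> str:
    """Terminal tool: deliberately unguarded, so it still works at budget."""
    state.status = "done"
    return summary


def final_status(state: TaskState) -> str:
    """If no terminal tool ran before the loop ended, report 'incomplete'."""
    return state.status or "incomplete"
```

Because the check lives in each tool rather than in the chat client, the model still receives a normal tool result when the budget runs out and can respond by calling `done` or `fail`.
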
diff --git a/Directory.Packages.props b/Directory.Packages.props
@@ -4,7 +4,7 @@
     <CentralPackageFloatingVersionsEnabled>true</CentralPackageFloatingVersionsEnabled>
   </PropertyGroup>
   <ItemGroup>
-    <PackageVersion Include="Microsoft.Playwright" Version="1.50.0" />
+    <PackageVersion Include="Microsoft.Playwright" Version="1.59.0" />
     <PackageVersion Include="Microsoft.Extensions.AI" Version="10.*" />
     <PackageVersion Include="Microsoft.Extensions.Configuration.Abstractions" Version="10.0.*" />
     <PackageVersion Include="Microsoft.Extensions.DependencyInjection.Abstractions" Version="10.0.*" />

diff --git a/deploy/rockbot-seed/agent-trust.json b/deploy/rockbot-seed/agent-trust.json
@@ -2,7 +2,7 @@
   {
     "agentId": "Foragent",
     "level": 4,
-    "approvedSkills": ["fetch-page-title", "extract-structured-data", "post-to-site"],
+    "approvedSkills": ["browser-task", "fetch-page-title", "extract-structured-data", "post-to-site"],
     "firstSeen": "2026-04-21T00:00:00+00:00",
     "lastInteraction": "2026-04-21T00:00:00+00:00",
     "interactionCount": 0

diff --git a/deploy/rockbot-seed/well-known-agents.json b/deploy/rockbot-seed/well-known-agents.json
@@ -8,6 +8,11 @@
     "authHeaderName": "X-Api-Key",
     "authHeaderValueBase64": "cm9ja2JvdC1jYWxscy1mb3JhZ2VudA==",
     "skills": [
+      {
+        "id": "browser-task",
+        "name": "Browser Task (generalist)",
+        "description": "Drive a browser with an LLM-in-the-loop planner to accomplish a free-form intent. Input JSON {\"intent\":\"...\",\"allowedHosts\":[\"host\",\"*.host\",\"*\"],\"url\":\"optional start\",\"credentialId\":\"optional\",\"maxSteps\":60,\"maxSeconds\":120}. allowedHosts is required and empty rejects. Returns a structured JSON result with status (done/failed/incomplete), summary, optional result, step count, and navigations."
+      },
       {
         "id": "fetch-page-title",
         "name": "Fetch Page Title",

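As a quick sanity check on the seed file, the `authHeaderValueBase64` value can be decoded and compared with the API key named in the `docker-compose.yml` comment (`X-Api-Key: rockbot-calls-foragent`). This is a standalone Python sketch, not part of the repository.

```python
import base64

# Value copied from deploy/rockbot-seed/well-known-agents.json.
encoded = "cm9ja2JvdC1jYWxscy1mb3JhZ2VudA=="

decoded = base64.b64decode(encoded).decode("utf-8")
print(decoded)  # prints: rockbot-calls-foragent
```
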
diff --git a/docker-compose.yml b/docker-compose.yml
@@ -61,7 +61,7 @@ services:
       RabbitMq__VirtualHost: /
       Gateway__AgentName: Foragent
       Gateway__InternalAgentName: Foragent
-      Gateway__Description: "Browser agent — fetch-page-title, extract-structured-data, post-to-site"
+      Gateway__Description: "Browser agent — browser-task (generalist), fetch-page-title, extract-structured-data, post-to-site"
       # RockBot will call Foragent with header X-Api-Key: rockbot-calls-foragent
       ApiKeys__rockbot-calls-foragent__AgentId: RockBot
       ApiKeys__rockbot-calls-foragent__DisplayName: RockBot

diff --git a/docs/capabilities.md b/docs/capabilities.md
@@ -3,18 +3,84 @@
 Foragent exposes browser operations as discrete A2A capabilities. Callers
 invoke capabilities by name; Foragent handles the browser mechanics.
 
-## Planned initial capability set
+## Advertised capabilities (v0.2)
 
-- [ ] `fetch-page-content` — Navigate to a URL and return the page content
-- [ ] `extract-structured-data` — Extract structured data from a page using
-  an LLM-assisted schema
-- [ ] `fill-form` — Fill and optionally submit an HTML form
-- [ ] `post-to-site` — Perform a multi-step posting action on a target site
-- [ ] `monitor-page` — Poll a page for a condition and notify when met
+- `browser-task` — **generalist**, spec §5.2. LLM-in-the-loop planner that
+  drives a real browser to accomplish a free-form intent. Shipped in
+  step 6.
+- `fetch-page-title` — specialist. Inherited from step 1/2.
+- `extract-structured-data` — specialist. Inherited from step 3.
+- `post-to-site` — specialist, credential-using. Inherited from step 4.
+  Scheduled for removal from the advertised list once step 7 lands
+  (`browser-task` + learned bsky skill subsume it).
+
+## `browser-task` input shape
+
+JSON in the first text part, or field-by-field metadata:
+
+```json
+{
+  "intent": "free-form description of what to accomplish",
+  "allowedHosts": ["bsky.app", "*.example.com", "*"],
+  "url": "optional absolute http(s) starting URL",
+  "credentialId": "optional broker reference",
+  "maxSteps": 60,
+  "maxSeconds": 120
+}
+```
+
+- `intent` — required. Free-form.
+- `allowedHosts` — required, non-empty (spec §7.1). An empty list is rejected.
+  Supports exact hosts, `*.domain` subdomain wildcards, and `*` for
+  unrestricted. Off-list navigations are aborted inside the browser
+  context before Playwright issues the request.
+- `url` — optional. If provided, must match the allowlist.
+- `credentialId` — optional. Resolved but not exposed to the planner in
+  step 6; reserved for a typed login tool in a later step.
+- `maxSteps` — default 60, ceiling 150. Enforced tool-side via
+  `BrowserTaskState.BudgetExhausted`; once exceeded, tools return a
+  "call done or fail" message and refuse further work.
+- `maxSeconds` — default 120, ceiling 600. Enforced via a linked
+  `CancellationTokenSource`.
+
+## `browser-task` output shape
+
+A JSON object in a single text part:
+
+```json
+{
+  "status": "done" | "failed" | "incomplete",
+  "summary": "one-sentence human-readable result",
+  "result": "optional structured result text (e.g. extracted value)",
+  "steps": 7,
+  "navigations": ["https://host/path", "..."]
+}
+```
+
+`incomplete` means the budget was exhausted before `done`/`fail` was
+called.
+
+## `browser-task` tool surface
+
+Exposed to the planner as `AIFunction`s passed in `ChatOptions.Tools`
+(spec Appendix A #16: no MCP sidecar). Refs are Playwright aria-ref ids
+and are valid only within the snapshot they came from.
+
+- `snapshot()` — ref-annotated aria tree of the current page.
+- `navigate(url)` — load a URL; host must be on the allowlist.
+- `click(ref)` — click by ref.
+- `type(ref, text)` — fill by ref.
+- `wait_for(ref, timeoutSeconds?)` — wait for visibility.
+- `done(summary, result?)` — mark complete.
+- `fail(reason)` — mark failed.
 
 ## Design principles
 
-- Capabilities operate at the task level, not at the DOM-operation level
-- Each capability invocation gets an isolated browser context
+- Capabilities operate at the task level, not at the DOM-operation level.
+- Each capability invocation gets an isolated `BrowserContext` (spec §3.5).
+- Per-task host allowlists are mandatory (spec §7.1).
 - Credential references are passed by ID; values are resolved inside
-  Foragent and never cross A2A boundaries
+  Foragent and never cross A2A boundaries (spec §6.1).
+- Prohibited capabilities — account creation, financial transactions,
+  modifying security permissions — are out of scope regardless of
+  implementation ease (spec §7.3).
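
The `browser-task` input rules in `docs/capabilities.md` (required `intent`, non-empty `allowedHosts`, allowlist-checked `url`, defaults and ceilings for the budgets) can be sketched as below. This is Python used as neutral notation, not the `BrowserTaskInput` C# code. Several details are assumptions the docs do not settle: that `*.domain` matches subdomains but not the bare domain, that values above a ceiling are clamped rather than rejected, and the function names `host_allowed` and `parse_input` themselves.

```python
from typing import Any, Dict, Sequence
from urllib.parse import urlsplit

DEFAULT_STEPS, MAX_STEPS = 60, 150
DEFAULT_SECONDS, MAX_SECONDS = 120, 600


def host_allowed(host: str, allowed_hosts: Sequence[str]) -> bool:
    """Match a host against exact entries, '*.domain' wildcards, or '*'."""
    host = host.lower().rstrip(".")
    for pattern in allowed_hosts:
        pattern = pattern.lower()
        if pattern == "*":
            return True
        if pattern.startswith("*."):
            # Assumption: '*.example.com' covers subdomains only,
            # so compare against the suffix '.example.com'.
            if host.endswith(pattern[1:]):
                return True
        elif host == pattern:
            return True
    return False


def parse_input(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Validate a browser-task payload and apply default/ceiling budgets."""
    intent = payload.get("intent")
    if not isinstance(intent, str) or not intent.strip():
        raise ValueError("intent is required")

    allowed = payload.get("allowedHosts")
    if not isinstance(allowed, list) or not allowed:
        raise ValueError("allowedHosts is required and must be non-empty")

    url = payload.get("url")
    if url is not None:
        parts = urlsplit(url)
        if parts.scheme not in ("http", "https") or not parts.hostname:
            raise ValueError("url must be an absolute http(s) URL")
        if not host_allowed(parts.hostname, allowed):
            raise ValueError("url host is not on the allowlist")

    return {
        "intent": intent,
        "allowedHosts": allowed,
        "url": url,
        "credentialId": payload.get("credentialId"),
        # Assumption: over-ceiling values are clamped, not rejected.
        "maxSteps": min(int(payload.get("maxSteps", DEFAULT_STEPS)), MAX_STEPS),
        "maxSeconds": min(int(payload.get("maxSeconds", DEFAULT_SECONDS)), MAX_SECONDS),
    }
```

The suffix comparison includes the leading dot, so `evilexample.com` does not slip through a `*.example.com` entry.
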
diff --git a/docs/framework-feedback.md b/docs/framework-feedback.md
@@ -254,3 +254,88 @@ in `.env`.
   discovery) and `GatewayOptions.Skills` (HTTP agent-card endpoint) are independent. Our
   Program.cs populates both from a single `ForagentCapabilities.Skills` array — a workaround,
   not a fix. The framework should treat one as authoritative and derive the other.
+
+## Step 6 — baseline `browser-task` generalist
+
+### Framework observations
+
+- **`AddRockBotTieredChatClients` obviates `AddRockBotChatClient` but this
+  is undocumented.** Calling `AddRockBotTieredChatClients(low, balanced,
+  high)` registers an `IChatClient` singleton whose factory already wraps
+  the inner client with `RockBotFunctionInvokingChatClient`, plus a
+  `TieredChatClientRegistry` singleton. Callers who previously used
+  `AddRockBotChatClient(client)` no longer need to call it, but that is
+  not spelled out anywhere. If both are called, the later registration
+  silently wins (standard `Microsoft.Extensions.DependencyInjection`
+  behavior), which can swap the wrapped client for an unwrapped one
+  depending on order. Docs gap; candidate framework fix is either a
+  guard that throws or collapsing both methods into one overload shape.
+
+- **No per-request iteration cap surface on the function-invoking chat
+  client.** `FunctionInvokingChatClient.MaximumIterationsPerRequest` is
+  an *instance* property, and the wrapped client is built inside
+  `AddRockBotTieredChatClients` — the caller has no hook to set it per
+  `GetResponseAsync` invocation. `ChatOptions.AdditionalProperties`
+  lookup keys are not honored. `ModelBehavior.MaxToolIterationsOverride`
+  exists on the RockBot side but routes through YAML behavior config,
+  not per-call. Foragent enforces its step budget tool-side (each tool
+  checks `BrowserTaskState.BudgetExhausted`); wall-clock cancellation
+  is the real safety net. Framework candidate: either honor a standard
+  `ChatOptions.AdditionalProperties["MaximumIterationsPerRequest"]`
+  convention or expose the `FunctionInvokingChatClient` instance via DI
+  so consumers can configure it.
+
+- **`Microsoft.Playwright` 1.50 (pinned since step 2) does not expose
+  the Ai aria-snapshot mode.** Step 6 requires ref-annotated snapshots
+  (`[ref=eN]` + `aria-ref=eN` locator resolution). That gating moved
+  from a boolean `Ref` option to `Mode = AriaSnapshotMode.Ai` sometime
+  between 1.52 and the current 1.59 C# bindings. Foragent bumped the
+  pin to 1.59.0; container base image
+  (`mcr.microsoft.com/playwright/dotnet:v1.50.0-noble`) will need the
+  matching bump in the first release that ships browser-task. Not a
+  framework issue per se, but relevant to RockBot's "v1 Foragent" story
+  and to anyone using the framework + Playwright together.
+
+- **Aria-ref lifetime is a contract the planner must respect.** Refs are
+  valid only within the snapshot they came from. The tool surface
+  documents this in the `snapshot` description; if the framework ever
+  ships a "browser task runner" helper of its own (candidate
+  `RockBot.Browser.Planner`?), it should bake the "re-snapshot after
+  mutation" rule into a first-class contract rather than leaving it to
+  prompt text.
+
+- **`AIFunctionFactory.Create(Delegate, name:, description:, …)` sets
+  only the function-level description.** Parameter descriptions must be
+  supplied separately, via `[Description]` on each parameter, which is
+  easy to miss. Worked as expected; noting for anyone building similar
+  tool surfaces.
+
+- **RockBot's `RockBotFunctionInvokingChatClient` auto-invokes tools end
+  to end in a single `GetResponseAsync` call.** This is exactly what the
+  planner wants; no custom loop needed. One quirk: the client keeps
+  iterating as long as the model emits tool calls, with no public
+  step cap (see above). Combined with aria-ref lifetimes, a model that
+  thrashes on stale refs can burn budget fast. Step 7's learning
+  substrate is the intended mitigation.
+
+### Unaided floor measurement (2026-04-22)
+
+First end-to-end benchmark against the operator's Azure AI Foundry
+Balanced model (no learned skills, no priming — the "unaided" floor the
+spec §9.1 step 6 calls for):
+
+| Scenario | Result | Wall-clock |
+|---|---|---|
+| Click-through (home → link → read destination value) | ✅ done | 5 s |
+| Form submit (fill name + textarea → submit → read confirmation) | ✅ done | 8 s |
+| Multi-page nav (index → intro → chapter-2 → read bolded answer) | ✅ done | 7 s |
+
+3 / 3 passed on first attempt. This establishes the baseline Foragent
+must not regress against once step 7 adds priming. Re-run this set
+whenever the planner prompt, tool surface, or model pin changes.
+
+### Not yet exercised
+
+- **`TieredChatClientRegistry.GetClient(ModelTier.Low/High)` is wired
+  but no capability resolves it yet.** All three tiers currently alias
+  to the same model. Tier-aware capability code lands as models diverge.
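
The "re-snapshot after mutation" rule from the aria-ref observation above could, in principle, be enforced in code rather than in prompt text. The sketch below shows one possible shape, using a generation counter. It is written in Python as neutral notation, and `RefTable`, `StaleRefError`, and their methods are all hypothetical; nothing here is an existing RockBot, Foragent, or Playwright API.

```python
from typing import Dict, Iterable


class StaleRefError(Exception):
    """Raised when a ref from an earlier snapshot is used after a mutation."""


class RefTable:
    """Hypothetical tracker tying each ref to the snapshot that produced it."""

    def __init__(self) -> None:
        self._generation = 0
        self._refs: Dict[str, int] = {}  # ref id -> generation it belongs to

    def snapshot(self, ref_ids: Iterable[str]) -> int:
        """Record a fresh snapshot; only its refs are valid afterwards."""
        self._generation += 1
        self._refs = {ref: self._generation for ref in ref_ids}
        return self._generation

    def mutate(self) -> None:
        """Called after click/type/navigate: invalidates outstanding refs."""
        self._generation += 1

    def resolve(self, ref: str) -> str:
        """Return a locator string, or fail loudly if the ref is stale."""
        if self._refs.get(ref) != self._generation:
            raise StaleRefError(f"{ref} is stale; take a new snapshot")
        return f"aria-ref={ref}"
```

With a guard like this, a planner that reuses a ref after a mutating call gets an immediate, explicit error instead of acting on whatever element the ref now happens to resolve to.
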