Merged
9 changes: 6 additions & 3 deletions CLAUDE.md
@@ -4,7 +4,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co

## Status

Foragent is at **milestone 5 shipped, v0.2 spec adopted, step 6 next**. Three capabilities are live (`fetch-page-title`, `extract-structured-data`, `post-to-site`); the A2A loop is wired end-to-end against RockBot via the `docker-compose.yml` harness pinned to `rockylhotka/rockbot-agent:0.8.5`. The governing spec is now `docs/foragent-specification.md` **v0.2** — read it before making non-trivial changes. v0.2 pivots Foragent to an agentic model: one generalist `browser-task` capability (built natively on `Microsoft.Playwright` NuGet — no MCP sidecar, no Stagehand port — see Appendix A #16) plus narrow fast-path specialists, with RockBot's `ISkillStore` + `ILongTermMemory` as the learning substrate. The v0.1 proposal document is archived at `docs/archive/foragent-spec-v0.2-proposal.md`. Storage-state persistence, 2FA input-required flow, k8s-secrets broker, and per-tenant credential namespaces remain deferred — tracked in `docs/framework-feedback.md` step 4. Framework-level observations from each milestone are captured in `docs/framework-feedback.md`.
Foragent is at **milestone 6 shipped, step 7 next**. Four capabilities are live (`browser-task`, `fetch-page-title`, `extract-structured-data`, `post-to-site`); the A2A loop is wired end-to-end against RockBot via the `docker-compose.yml` harness pinned to `rockylhotka/rockbot-agent:0.8.5`. Step 6 shipped the generalist `browser-task` planner (LLM-in-the-loop over ref-annotated aria snapshots + `aria-ref=eN` locator resolution, built on `Microsoft.Playwright` 1.59 — bumped from 1.50 for the Ai aria-snapshot mode; see Appendix A #16). Tiered chat clients are wired via `AddRockBotTieredChatClients` with one model aliased across Low/Balanced/High per spec §3.7. The governing spec is `docs/foragent-specification.md` **v0.2**. Step 7 wires `ISkillStore` + `ILongTermMemory` priming; `post-to-site` is removed from the advertised skill list once `browser-task` + the learned bsky skill cover it. Storage-state persistence, 2FA input-required flow, k8s-secrets broker, and per-tenant credential namespaces remain deferred — tracked in `docs/framework-feedback.md`. Framework-level observations from each milestone are captured in `docs/framework-feedback.md`.

## Build / test

@@ -69,11 +69,13 @@ Key framework pieces Foragent uses today:
- `RockBot.A2A.IAgentTaskHandler` — the single per-agent extension point. `ForagentTaskHandler` (in `Foragent.Capabilities`) implements this and dispatches on `request.Skill`.
- `RockBot.A2A.Gateway.AddA2AHttpGateway` + `MapA2AHttpGateway` — the in-process HTTP surface. Published as NuGet in RockBot 0.8.4 (see `docs/framework-feedback.md`).

Foragent requires an LLM (for `extract-structured-data` and future capabilities). The same `IChatClient` is registered both as a singleton (capabilities inject it directly) and via `AddRockBotChatClient` (satisfies the framework's mandatory registration). Config lives under `ForagentLlm` — separate from any rockbot-side `LLM` config so the two agents can point at different models. Program.cs fails fast at startup if `ForagentLlm:Endpoint`/`ModelId`/`ApiKey` are missing.
Foragent requires an LLM. Config lives under `ForagentLlm`, separate from any rockbot-side `LLM` config, so the two agents can point at different models. `Program.cs` fails fast at startup if `ForagentLlm:Endpoint`/`ModelId`/`ApiKey` are missing. Starting with step 6, the single configured model is wired via `AddRockBotTieredChatClients(low, balanced, high)`, with all three tiers aliased to the same inner `IChatClient`; that one call registers both `IChatClient` (wrapped with `RockBotFunctionInvokingChatClient` for automatic tool invocation) and `TieredChatClientRegistry` (per spec §3.7). Don't also call `AddRockBotChatClient`: the later registration wins, so depending on call order it can replace the wrapped client with an unwrapped one. Capabilities that want to escalate/de-escalate per request can resolve `TieredChatClientRegistry` and call `GetClient(ModelTier.Low|Balanced|High)`; none do today.

## Browser

`Foragent.Browser` wraps Playwright. `AddForagentBrowser()` in `Foragent.Agent/Program.cs` registers `PlaywrightBrowserHost` (`IHostedService` owning one shared Chromium per process) and `IBrowserSessionFactory` (hands out a fresh `IBrowserContext` per A2A task — isolation guarantee from spec §3.5). `IBrowserSession` exposes `FetchPageTitleAsync` / `CapturePageSnapshotAsync` for one-shot reads, plus `OpenPageAsync` → `IBrowserPage` (navigate / fill / click / wait / read) for multi-step flows like login + post. The snapshot uses Chromium's aria-snapshot (via `Locator.AriaSnapshotAsync`) and falls back to `<body>` inner text when the tree is empty. Selectors passed to `IBrowserPage` use Playwright's string-selector dialect (CSS + `role=role[name="..."]`); **regex is not accepted in string form**, use exact attribute matches. `Foragent.Browser` has `InternalsVisibleTo("Foragent.Browser.Tests")` so tests drive the real `PlaywrightBrowserSessionFactory` without promoting its implementation types to public.
`Foragent.Browser` wraps Playwright. `AddForagentBrowser()` in `Foragent.Agent/Program.cs` registers `PlaywrightBrowserHost` (`IHostedService` owning one shared Chromium per process) and `IBrowserSessionFactory` (hands out a fresh `IBrowserContext` per A2A task — isolation guarantee from spec §3.5). `IBrowserSession` exposes `FetchPageTitleAsync` / `CapturePageSnapshotAsync` for one-shot reads, `OpenPageAsync` → `IBrowserPage` (navigate / fill / click / wait / read) for multi-step flows like login + post, and `OpenAgentPageAsync` → `IBrowserAgentPage` for LLM-in-the-loop planners (ref-annotated aria snapshots + `aria-ref=eN` locator resolution). The snapshot uses Chromium's aria-snapshot (via `Locator.AriaSnapshotAsync`; `Mode = AriaSnapshotMode.Ai` gets the ref-annotated form) and falls back to `<body>` inner text when the tree is empty. Selectors passed to `IBrowserPage` use Playwright's string-selector dialect (CSS + `role=role[name="..."]`); **regex is not accepted in string form**, use exact attribute matches. `Foragent.Browser` has `InternalsVisibleTo("Foragent.Browser.Tests")` so tests drive the real `PlaywrightBrowserSessionFactory` without promoting its implementation types to public.

`CreateSessionAsync(Func<Uri,bool> allowedHost, ...)` is the step-6 entry point for allowlist-scoped sessions. The factory installs a context-wide `RouteAsync("**/*", ...)` that aborts off-list document/subframe navigations before Playwright issues the request (spec §7.1). The no-argument overload accepts any host and stays available for specialists that enforce narrower rules elsewhere (e.g. `post-to-site` where the site id selects the host).

## Capabilities

@@ -84,6 +86,7 @@ Foragent requires an LLM (for `extract-structured-data` and future capabilities)
- `ForagentCapabilities.Skills` (static array) is the single source of truth for advertised skills — both the bus-side `AgentCard.Skills` and the HTTP gateway's `opts.Skills` read from it.
- `CapabilityInput.Parse` is the shared URL + description shim used by `fetch-page-title` and `extract-structured-data`. Capabilities with different input shapes (e.g. `post-to-site` needing `site` / `credentialId` / `content`) parse their own input near the capability — see `PostToSiteInput` in `PostToSiteCapability.cs`. Don't overload `CapabilityInput` for unrelated shapes.
- `post-to-site` dispatches to an `ISitePoster` keyed on `Site` (in `SitePosting/`). `BlueskySitePoster` is the only implementation today; add new sites by registering another `ISitePoster` in `AddForagentCapabilities()`. The capability never echoes exception messages from posters back to callers — they may contain credential material; operators read the full exception in logs.
- `browser-task` (in `BrowserTask/`) is the generalist planner (spec §5.2). `BrowserTaskInput` parses intent + mandatory `allowedHosts` + optional `url` / `credentialId` / `maxSteps` (default 60, ceiling 150) / `maxSeconds` (default 120, ceiling 600). `BrowserTaskTools` wraps `snapshot` / `navigate` / `click` / `type` / `wait_for` / `done` / `fail` as `AIFunction`s via `AIFunctionFactory.Create` and passes them in `ChatOptions.Tools`; the RockBot-wrapped function-invoking `IChatClient` runs the full model ↔ tool loop inside one `GetResponseAsync` call. Budget is enforced tool-side (each tool checks `BrowserTaskState.BudgetExhausted`) because Microsoft.Extensions.AI does not surface per-request iteration caps through `ChatOptions`; wall-clock is a linked `CancellationTokenSource`. **Never log tool arguments verbatim** — `type` carries user-supplied values that may be sensitive (log length only). Refs from a snapshot are valid only until the next mutating call; the system prompt and tool descriptions both state this, but don't code anything that assumes cross-snapshot ref stability.

## Credentials

2 changes: 1 addition & 1 deletion Directory.Packages.props
@@ -4,7 +4,7 @@
<CentralPackageFloatingVersionsEnabled>true</CentralPackageFloatingVersionsEnabled>
</PropertyGroup>
<ItemGroup>
<PackageVersion Include="Microsoft.Playwright" Version="1.50.0" />
<PackageVersion Include="Microsoft.Playwright" Version="1.59.0" />
<PackageVersion Include="Microsoft.Extensions.AI" Version="10.*" />
<PackageVersion Include="Microsoft.Extensions.Configuration.Abstractions" Version="10.0.*" />
<PackageVersion Include="Microsoft.Extensions.DependencyInjection.Abstractions" Version="10.0.*" />
2 changes: 1 addition & 1 deletion deploy/rockbot-seed/agent-trust.json
@@ -2,7 +2,7 @@
{
"agentId": "Foragent",
"level": 4,
"approvedSkills": ["fetch-page-title", "extract-structured-data", "post-to-site"],
"approvedSkills": ["browser-task", "fetch-page-title", "extract-structured-data", "post-to-site"],
"firstSeen": "2026-04-21T00:00:00+00:00",
"lastInteraction": "2026-04-21T00:00:00+00:00",
"interactionCount": 0
5 changes: 5 additions & 0 deletions deploy/rockbot-seed/well-known-agents.json
@@ -8,6 +8,11 @@
"authHeaderName": "X-Api-Key",
"authHeaderValueBase64": "cm9ja2JvdC1jYWxscy1mb3JhZ2VudA==",
"skills": [
{
"id": "browser-task",
"name": "Browser Task (generalist)",
"description": "Drive a browser with an LLM-in-the-loop planner to accomplish a free-form intent. Input JSON {\"intent\":\"...\",\"allowedHosts\":[\"host\",\"*.host\",\"*\"],\"url\":\"optional start\",\"credentialId\":\"optional\",\"maxSteps\":60,\"maxSeconds\":120}. allowedHosts is required and empty rejects. Returns a structured JSON result with status (done/failed/incomplete), summary, optional result, step count, and navigations."
},
{
"id": "fetch-page-title",
"name": "Fetch Page Title",
2 changes: 1 addition & 1 deletion docker-compose.yml
@@ -61,7 +61,7 @@ services:
RabbitMq__VirtualHost: /
Gateway__AgentName: Foragent
Gateway__InternalAgentName: Foragent
Gateway__Description: "Browser agent — fetch-page-title, extract-structured-data, post-to-site"
Gateway__Description: "Browser agent — browser-task (generalist), fetch-page-title, extract-structured-data, post-to-site"
# RockBot will call Foragent with header X-Api-Key: rockbot-calls-foragent
ApiKeys__rockbot-calls-foragent__AgentId: RockBot
ApiKeys__rockbot-calls-foragent__DisplayName: RockBot
86 changes: 76 additions & 10 deletions docs/capabilities.md
@@ -3,18 +3,84 @@
Foragent exposes browser operations as discrete A2A capabilities. Callers
invoke capabilities by name; Foragent handles the browser mechanics.

## Planned initial capability set
## Advertised capabilities (v0.2)

- [ ] `fetch-page-content` — Navigate to a URL and return the page content
- [ ] `extract-structured-data` — Extract structured data from a page using
an LLM-assisted schema
- [ ] `fill-form` — Fill and optionally submit an HTML form
- [ ] `post-to-site` — Perform a multi-step posting action on a target site
- [ ] `monitor-page` — Poll a page for a condition and notify when met
- `browser-task` — **generalist**, spec §5.2. LLM-in-the-loop planner that
drives a real browser to accomplish a free-form intent. Shipped in
step 6.
- `fetch-page-title` — specialist. Inherited from step 1/2.
- `extract-structured-data` — specialist. Inherited from step 3.
- `post-to-site` — specialist, credential-using. Inherited from step 4.
Scheduled for removal from the advertised list once step 7 lands
(`browser-task` + learned bsky skill subsume it).

## `browser-task` input shape

JSON in the first text part, or field-by-field metadata:

```json
{
"intent": "free-form description of what to accomplish",
"allowedHosts": ["bsky.app", "*.example.com", "*"],
"url": "optional absolute http(s) starting URL",
"credentialId": "optional broker reference",
"maxSteps": 60,
"maxSeconds": 120
}
```

- `intent` — required. Free-form.
- `allowedHosts` — required, non-empty (spec §7.1). An empty list is rejected.
Supports exact hosts, `*.domain` subdomain wildcards, and `*` for
unrestricted. Off-list navigations are aborted inside the browser
context before Playwright issues the request.
- `url` — optional. If provided, must match the allowlist.
- `credentialId` — optional. Resolved but not exposed to the planner in
step 6; reserved for a typed login tool in a later step.
- `maxSteps` — default 60, ceiling 150. Enforced tool-side via
`BrowserTaskState.BudgetExhausted`; once exceeded, tools return a
"call done or fail" message and refuse further work.
- `maxSeconds` — default 120, ceiling 600. Enforced via a linked
`CancellationTokenSource`.
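
The `allowedHosts` matching rules above could be turned into the `Func<Uri,bool>` predicate that an allowlist-scoped session expects roughly as follows. This is a minimal sketch, not Foragent's implementation: `HostAllowlist` and `Build` are hypothetical names, and the choice that `*.domain` matches subdomains but not the apex host is an assumption the spec text here does not settle.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: the type name, method name, and exact matching rules
// are assumptions made for illustration.
public static class HostAllowlist
{
    // Builds a host predicate from exact hosts, "*.domain" wildcards, or "*".
    public static Func<Uri, bool> Build(IReadOnlyCollection<string> patterns)
    {
        if (patterns is null || patterns.Count == 0)
            throw new ArgumentException("allowedHosts must be non-empty.", nameof(patterns));

        var normalized = patterns.Select(p => p.Trim().ToLowerInvariant()).ToArray();
        return uri => normalized.Any(p => Matches(p, uri.Host.ToLowerInvariant()));
    }

    private static bool Matches(string pattern, string host)
    {
        if (pattern == "*")
            return true; // unrestricted

        if (pattern.StartsWith("*.", StringComparison.Ordinal))
        {
            // Assumption: "*.example.com" covers subdomains only, so the
            // comparison keeps the leading dot and "example.com" does not match.
            return host.EndsWith(pattern.Substring(1), StringComparison.Ordinal);
        }

        return host == pattern; // exact host
    }
}
```

Keeping the leading dot in the suffix comparison is what stops a lookalike such as `evilexample.com` from matching `*.example.com`.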

## `browser-task` output shape

A JSON object in a single text part:

```json
{
"status": "done" | "failed" | "incomplete",
"summary": "one-sentence human-readable result",
"result": "optional structured result text (e.g. extracted value)",
"steps": 7,
"navigations": ["https://host/path", "..."]
}
```

`incomplete` means the budget was exhausted before `done`/`fail` was
called.
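
A caller might model the output shape above with a small DTO and `System.Text.Json`. The sketch below assumes the field names shown in the JSON; `BrowserTaskResultDto` and `BrowserTaskResultJson` are hypothetical names, not types from Foragent or RockBot.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical DTO mirroring the documented output fields.
public sealed record BrowserTaskResultDto(
    string Status,
    string Summary,
    string? Result,
    int Steps,
    IReadOnlyList<string> Navigations);

public static class BrowserTaskResultJson
{
    // Web defaults give camelCase property names and case-insensitive reads;
    // null values (the optional "result") are omitted when writing.
    private static readonly JsonSerializerOptions Options = new(JsonSerializerDefaults.Web)
    {
        DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull
    };

    public static string Serialize(BrowserTaskResultDto dto) =>
        JsonSerializer.Serialize(dto, Options);

    public static BrowserTaskResultDto Parse(string json)
    {
        var dto = JsonSerializer.Deserialize<BrowserTaskResultDto>(json, Options)
                  ?? throw new JsonException("Empty result payload.");

        // Only the three documented status values are accepted.
        if (dto.Status is not ("done" or "failed" or "incomplete"))
            throw new JsonException($"Unexpected status '{dto.Status}'.");

        return dto;
    }
}
```

Rejecting unknown status values at parse time means a caller can switch on `Status` without a default branch for values the document does not define.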

## `browser-task` tool surface

Exposed to the planner as `AIFunction` wrappers (built with
`AIFunctionFactory.Create`) passed to `IChatClient` via `ChatOptions.Tools`
(spec Appendix A #16, no MCP sidecar). Refs are Playwright aria-ref ids
and are valid only within the snapshot they came from.

- `snapshot()` — ref-annotated aria tree of the current page.
- `navigate(url)` — load a URL; host must be on the allowlist.
- `click(ref)` — click by ref.
- `type(ref, text)` — fill by ref.
- `wait_for(ref, timeoutSeconds?)` — wait for visibility.
- `done(summary, result?)` — mark complete.
- `fail(reason)` — mark failed.
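
One way to make the ref-lifetime rule explicit on the tool side is to track which refs the latest snapshot produced and drop them after any mutating call. The sketch below is purely illustrative: `SnapshotRefTracker` is a hypothetical type, and nothing here claims Foragent performs this bookkeeping.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical bookkeeping for aria-ref lifetimes. Refs become live when a
// snapshot returns them and stale as soon as the page is mutated.
public sealed class SnapshotRefTracker
{
    private readonly HashSet<string> _liveRefs = new(StringComparer.Ordinal);

    // Called by the snapshot tool with the refs found in the new aria tree.
    public void OnSnapshot(IEnumerable<string> refs)
    {
        _liveRefs.Clear();
        _liveRefs.UnionWith(refs);
    }

    // Called after navigate / click / type: earlier refs are no longer trusted.
    public void OnMutation() => _liveRefs.Clear();

    // Tools can check this before resolving a ref and ask for a re-snapshot.
    public bool IsLive(string reference) => _liveRefs.Contains(reference);
}
```

A tool such as `click(ref)` could consult `IsLive` first and, on a stale ref, return a message asking the planner to call `snapshot()` again rather than attempting the action.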

## Design principles

- Capabilities operate at the task level, not at the DOM-operation level
- Each capability invocation gets an isolated browser context
- Capabilities operate at the task level, not at the DOM-operation level.
- Each capability invocation gets an isolated `BrowserContext` (spec §3.5).
- Per-task host allowlists are mandatory (spec §7.1).
- Credential references are passed by ID; values are resolved inside
Foragent and never cross A2A boundaries
Foragent and never cross A2A boundaries (spec §6.1).
- Prohibited capabilities — account creation, financial transactions,
modifying security permissions — are out of scope regardless of
implementation ease (spec §7.3).
85 changes: 85 additions & 0 deletions docs/framework-feedback.md
@@ -254,3 +254,88 @@ in `.env`.
discovery) and `GatewayOptions.Skills` (HTTP agent-card endpoint) are independent. Our
Program.cs populates both from a single `ForagentCapabilities.Skills` array — a workaround,
not a fix. The framework should treat one as authoritative and derive the other.

## Step 6 — baseline `browser-task` generalist

### Framework observations

- **`AddRockBotTieredChatClients` obviates `AddRockBotChatClient` but this
is undocumented.** Calling `AddRockBotTieredChatClients(low, balanced,
high)` registers an `IChatClient` singleton whose factory already wraps
the inner client with `RockBotFunctionInvokingChatClient`, plus a
`TieredChatClientRegistry` singleton. Callers who previously used
`AddRockBotChatClient(client)` don't need to call both — but that's
not spelled out anywhere. If both are called, the second registration
silently wins (standard MEDI behavior), which can swap the wrapped
client for an unwrapped one depending on order. Docs gap; candidate
framework fix is either a guard throw or collapsing both methods into
one overload shape.

- **No per-request iteration cap surface on the function-invoking chat
client.** `FunctionInvokingChatClient.MaximumIterationsPerRequest` is
an *instance* property, and the wrapped client is built inside
`AddRockBotTieredChatClients` — the caller has no hook to set it per
`GetResponseAsync` invocation. `ChatOptions.AdditionalProperties`
lookup keys are not honored. `ModelBehavior.MaxToolIterationsOverride`
exists on the RockBot side but routes through YAML behavior config,
not per-call. Foragent enforces its step budget tool-side (each tool
checks `BrowserTaskState.BudgetExhausted`); wall-clock cancellation
is the real safety net. Framework candidate: either honor a standard
`ChatOptions.AdditionalProperties["MaximumIterationsPerRequest"]`
convention or expose the FICC instance via DI so consumers can
configure it.

- **`Microsoft.Playwright` 1.50 (pinned since step 2) does not expose
the Ai aria-snapshot mode.** Step 6 requires ref-annotated snapshots
(`[ref=eN]` + `aria-ref=eN` locator resolution). That gating moved
from a boolean `Ref` option to `Mode = AriaSnapshotMode.Ai` sometime
between 1.52 and the current 1.59 C# bindings. Foragent bumped the
  pin to 1.59.0; the container base image
  (`mcr.microsoft.com/playwright/dotnet:v1.50.0-noble`) will need the
  matching bump in the first release that ships browser-task. Not a
  framework issue per se, but relevant to RockBot's "v1 Foragent" story
  and to anyone using the framework + Playwright together.

- **Aria-ref lifetime is a contract the planner must respect.** Refs are
valid only within the snapshot they came from. The tool surface
documents this in the `snapshot` description; if the framework ever
ships a "browser task runner" helper of its own (candidate
`RockBot.Browser.Planner`?), it should bake the "re-snapshot after
mutation" rule into a first-class contract rather than leaving it to
prompt text.

- **`AIFunctionFactory.Create(Delegate, name:, description:, …)`
descriptions only surface the method-level `[Description]`.** Parameter
descriptions must be on parameters via `[Description]` — easy to miss
without the reminder. Worked as expected; noting for anyone building
similar tool surfaces.

- **RockBot's `RockBotFunctionInvokingChatClient` auto-invokes tools end
to end in a single `GetResponseAsync` call.** This is exactly what the
planner wants; no custom loop needed. One quirk: the FICC keeps
iterating as long as the model emits tool calls, with no public
step cap (see above). Combined with aria-ref lifetimes, a model that
thrashes on stale refs can burn budget fast. Step 7's learning
substrate is the intended mitigation.
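
The tool-side step budget and linked wall-clock cancellation described in the observations above can be sketched as follows. This is an illustration under stated assumptions, not Foragent's `BrowserTaskState`: `StepBudget`, `TryConsumeStep`, and `CreateWallClock` are hypothetical names, and falling back to the default for missing or non-positive requests is an assumed policy. The defaults and ceilings (60/150 steps, 120/600 seconds) are the documented ones.

```csharp
#nullable enable
using System;
using System.Threading;

// Hypothetical sketch of tool-side budget enforcement; names are illustrative.
public sealed class StepBudget
{
    private readonly int _maxSteps;
    private int _used;

    public StepBudget(int? requestedSteps, int defaultSteps = 60, int ceiling = 150)
    {
        // Assumption: a missing or non-positive request falls back to the
        // default, and anything above the ceiling is clamped down to it.
        var requested = requestedSteps.GetValueOrDefault();
        if (requested <= 0) requested = defaultSteps;
        _maxSteps = Math.Min(requested, ceiling);
    }

    public int StepsUsed => _used;
    public bool BudgetExhausted => _used >= _maxSteps;

    // Every tool calls this before doing work. It returns null when a step
    // was consumed, or a refusal message once the budget is spent.
    public string? TryConsumeStep()
    {
        if (BudgetExhausted)
            return "Step budget exhausted; call done or fail.";
        _used++;
        return null;
    }

    // Wall-clock limit as a token source linked to the caller's token.
    // The caller owns the returned source and should dispose it.
    public static CancellationTokenSource CreateWallClock(
        CancellationToken outer, int? requestedSeconds,
        int defaultSeconds = 120, int ceiling = 600)
    {
        var seconds = requestedSeconds.GetValueOrDefault();
        if (seconds <= 0) seconds = defaultSeconds;
        seconds = Math.Min(seconds, ceiling);

        var cts = CancellationTokenSource.CreateLinkedTokenSource(outer);
        cts.CancelAfter(TimeSpan.FromSeconds(seconds));
        return cts;
    }
}
```

Because the function-invoking client exposes no per-request iteration cap, the refusal message is the only signal the model gets from the step budget; the linked token is what actually stops a run that ignores it.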

### Unaided floor measurement (2026-04-22)

First end-to-end benchmark against the operator's Azure AI Foundry
Balanced model (no learned skills, no priming — the "unaided" floor the
spec §9.1 step 6 calls for):

| Scenario | Result | Wall-clock |
|---|---|---|
| Click-through (home → link → read destination value) | ✅ done | 5 s |
| Form submit (fill name + textarea → submit → read confirmation) | ✅ done | 8 s |
| Multi-page nav (index → intro → chapter-2 → read bolded answer) | ✅ done | 7 s |

3 / 3 passed on the first attempt. This establishes the baseline that
Foragent must not regress against once step 7 adds priming. Re-run this
set whenever the planner prompt, tool surface, or model pin changes.

### Not yet exercised

- **`TieredChatClientRegistry.GetClient(ModelTier.Low/High)` is wired
but no capability resolves it yet.** All three tiers currently alias
to the same model. Tier-aware capability code lands as models diverge.