From e0648bc9ac5a64669c0d128ad77db4904b8a55a9 Mon Sep 17 00:00:00 2001
From: Rockford Lhotka <rocky@lhotka.net>
Date: Tue, 21 Apr 2026 22:08:59 -0500
Subject: [PATCH] Adopt spec v0.2: agentic generalist direction
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Pivots Foragent from five narrow verbs to one generalist browser-task
capability plus a small set of fast-path specialists. Step 5 showed
hand-written site-specific code doesn't scale and that structured
typed skills are hostile to the natural-language callers (mostly other
LLM agents) Foragent actually has.

§5 wholesale rewrite: two-tier capability model (§5.1), v0.2 initial
set with browser-task as the generalist (§5.2), multi-phase flows
with returned artifacts (§5.5), learning substrate on RockBot's
ISkillStore + ILongTermMemory (§5.6), human-in-the-loop explicitly
caller-side (§5.7).

§3.7 adds LLM tier routing via RockBot's TieredChatClientRegistry.
§7.1 makes allowlists mandatory with wildcard support.
§9.1 adds steps 6-9; §9.2 drops the Stagehand exclusion.
§12 closes Q1/Q2; adds Q6/Q7/Q8 for the step 6-8 work.
Appendix A gains decisions #16-#20: direct-SDK (no MCP/Stagehand),
tier routing, mandatory wildcarded allowlists, framework persistence
for learned knowledge, multi-phase as separate tasks.

Working doc from the direction-setting discussion archived to
docs/archive/foragent-spec-v0.2-proposal.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
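Reviewer note on §7.1 (illustrative only, not part of this patch): a
minimal C# sketch of how the mandatory allowlist check could behave.
The `HostAllowlist` class and its member names are hypothetical. The
wildcard semantics (`*.example.com` matches subdomains but not the apex
host) and the http/https-only rule are assumptions, since the text in
this patch does not define them. Only the "empty list rejects" rule is
taken from the proposal.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration; not a Foragent or RockBot type.
public static class HostAllowlist
{
    // Returns true only when the target host matches an allowlist entry.
    public static bool IsAllowed(Uri target, IReadOnlyCollection<string> allowedHosts)
    {
        // From the proposal: an empty allowlist means "reject".
        if (allowedHosts.Count == 0) return false;

        // Assumption: only absolute http/https URLs are navigable.
        if (!target.IsAbsoluteUri) return false;
        if (target.Scheme != Uri.UriSchemeHttp && target.Scheme != Uri.UriSchemeHttps)
            return false;

        return allowedHosts.Any(pattern => Matches(target.Host, pattern));
    }

    private static bool Matches(string host, string pattern)
    {
        // Assumed wildcard form: "*.example.com" matches any subdomain,
        // but not "example.com" itself.
        if (pattern.StartsWith("*.", StringComparison.Ordinal))
        {
            var suffix = pattern.Substring(1); // ".example.com"
            return host.Length > suffix.Length
                && host.EndsWith(suffix, StringComparison.OrdinalIgnoreCase);
        }

        // Otherwise require an exact, case-insensitive host match.
        return string.Equals(host, pattern, StringComparison.OrdinalIgnoreCase);
    }
}
```

Under these assumptions a check like this would run before every
`navigate` reaches Playwright, so an off-list URL is refused without
the browser ever seeing it.
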
 CLAUDE.md                                   |   2 +-
 docs/archive/foragent-spec-v0.2-proposal.md | 538 ++++++++++++++++++++
 docs/foragent-specification.md              | 391 ++++++++++----
 3 files changed, 839 insertions(+), 92 deletions(-)
 create mode 100644 docs/archive/foragent-spec-v0.2-proposal.md

diff --git a/CLAUDE.md b/CLAUDE.md
index 2b5d1c7..1159923 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -4,7 +4,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
 
 ## Status
 
-Foragent is at **milestone 5** (spec §9.1): the A2A surface is wired end-to-end against RockBot as the first real user via the `docker-compose.yml` harness, pinned to `rockylhotka/rockbot-agent:0.8.5`. Three capabilities are exercised — `fetch-page-title` (step 2, Playwright), `extract-structured-data` (step 3, Playwright + LLM), and `post-to-site` (step 4, Playwright + credential broker). Validation was scoped to "poster dispatches" — real Bluesky posting requires populating `FORAGENT_BLUESKY_*` in `.env` and is not yet covered by the milestone. Storage-state persistence, 2FA input-required flow, k8s-secrets broker, and per-tenant credential namespaces are still deferred — tracked in `docs/framework-feedback.md` step 4. The authoritative design document is `docs/foragent-specification.md` — read it before making non-trivial changes. Framework-level observations from each milestone are captured in `docs/framework-feedback.md`.
+Foragent is at **milestone 5 shipped, v0.2 spec adopted, step 6 next**. Three capabilities are live (`fetch-page-title`, `extract-structured-data`, `post-to-site`); the A2A loop is wired end-to-end against RockBot via the `docker-compose.yml` harness pinned to `rockylhotka/rockbot-agent:0.8.5`. The governing spec is now `docs/foragent-specification.md` **v0.2**; read it before making non-trivial changes. v0.2 pivots Foragent to an agentic model: one generalist `browser-task` capability (built natively on `Microsoft.Playwright` NuGet, with no MCP sidecar and no Stagehand port; see Appendix A #16) plus narrow fast-path specialists, with RockBot's `ISkillStore` + `ILongTermMemory` as the learning substrate. The v0.2 proposal document is archived at `docs/archive/foragent-spec-v0.2-proposal.md`. Storage-state persistence, 2FA input-required flow, k8s-secrets broker, and per-tenant credential namespaces remain deferred, tracked in `docs/framework-feedback.md` step 4. Framework-level observations from each milestone are captured in `docs/framework-feedback.md`.
 
 ## Build / test
 
diff --git a/docs/archive/foragent-spec-v0.2-proposal.md b/docs/archive/foragent-spec-v0.2-proposal.md
new file mode 100644
index 0000000..ed246d5
--- /dev/null
+++ b/docs/archive/foragent-spec-v0.2-proposal.md
@@ -0,0 +1,538 @@
+# Foragent Specification v0.2 — Proposal
+
+> **Status:** Proposed revision to `foragent-specification.md`. Not yet merged.
+> **Date:** April 2026
+> **Author:** Rocky Lhotka / session notes from step 5 retrospective
+
+This document proposes a direction change for Foragent after completing
+milestone 5. It captures the revised product vision, the implied spec
+edits, and a step-by-step implementation plan for milestones 6 through 9.
+
+The v0.1 spec is still the governing document. Once this proposal is
+reviewed and approved, the changes below are folded into
+`foragent-specification.md` and this file is archived.
+
+---
+
+## Part 1 — What's changing and why
+
+### What we learned in v0.1 (milestones 1–5)
+
+Milestones 1–5 shipped three narrow capabilities (`fetch-page-title`,
+`extract-structured-data`, `post-to-site`) and validated the RockBot
+framework boundary end-to-end. They also surfaced a design question the
+spec didn't anticipate: **how does Foragent scale to N websites without
+N site-specific capability implementations?**
+
+The step-4 `BlueskySitePoster` was a deliberate probe — ship one
+hand-written site poster, learn what it costs to add the second. The
+answer: it costs a full `ISitePoster` subclass, a CSS selector audit, a
+fake-server integration test, and a re-verification every time the site
+redesigns. That doesn't scale, and it isn't the product.
+
+Step 5 also exposed a second mismatch. RockBot's `invoke_agent` tool
+passes a single free-text `message` argument to the called agent. Its
+LLM tried to invoke `post-to-site` with `message="Create a post on
+Bluesky with the text: ..."` — not a structured `{site, credentialId,
+content}` object. Narrow typed skills are hostile to natural-language
+callers.
+
+### What v0.2 is
+
+Foragent becomes an **agentic browser agent**: given a free-form intent
+and a target URL, it plans and drives the browser to fulfill the intent,
+using internal LLM reasoning to resolve selectors, form structure, and
+retry strategy. Site-specific code is the exception, not the rule.
+
+This is what v0.1 §9.2 called "Stagehand-style natural-language-to-action
+layers" and flagged as "may be revisited later, v1 selector-resolution
+is sufficient." It wasn't. v0.2 revisits.
+
+### What v0.2 is *not*
+
+v0.2 is not a .NET port of Stagehand, and does not run Stagehand or
+`@playwright/mcp` as a Node sidecar. Direct integration was evaluated
+against the `Microsoft.Playwright` NuGet path already in use since
+milestone 2, and the NuGet path won on every relevant axis:
+
+- **Ref-annotated aria snapshots are a Playwright feature, not an MCP
+  feature.** `Page.AriaSnapshotAsync()` already emits stable `[ref=e42]`
+  markers, and `Page.Locator("aria-ref=e42")` resolves them. The LLM
+  picks refs the same way it would through MCP, with no process hop.
+- **Tool-schema wrapping is a trivial amount of C#.** Exposing
+  `snapshot`/`click`/`type`/`navigate`/`wait_for` as `[AIFunction]`
+  methods on an injected planner surface gives `IChatClient` the same
+  auto-discovered tool-calling experience MCP would, without the
+  JSON-RPC protocol overhead.
+- **Session state already lives in `Foragent.Browser`.** `IBrowserSession`
+  / `IBrowserPage` own the shared browser and per-task `BrowserContext`
+  per spec §3.5. Moving to MCP would rebuild that management on the
+  far side of a process boundary.
+- **Spec §6's credential boundary stays clean.** A Node sidecar handling
+  browser actions would also handle credential material (login flows,
+  form values). Keeping the inner layer in-process means credentials
+  never cross a process boundary — the §6.1 blast-radius guarantee
+  holds as written.
+- **Spec §3.4 Decision #1 survives unchanged.** The v0.1 "Playwright via
+  NuGet, not via MCP server container" decision was made for the same
+  reasons; v0.2's agentic model does not invalidate any of them.
+
+This closes v0.1 §12 open question #2 ("Stagehand-equivalent for .NET")
+as **build natively, not integrate or port.**
+
+### What stays
+
+- **A2A-native, RockBot-framework-hosted, self-hosted.** Unchanged.
+- **Credentials by reference via `ICredentialBroker`.** Unchanged; still
+  the design v0.1 got right.
+- **One shared browser, fresh `BrowserContext` per task.** Unchanged.
+- **Prohibited-capability list in §7.3** (no account creation, no
+  financial transactions, no security-permission changes). Unchanged;
+  arguably more load-bearing under the broader model.
+
+### What changes
+
+- **§5 Capability surface.** The initial five-verb list becomes a
+  two-tier model: one generalist capability plus a small set of narrow
+  fast-path specialists. `BlueskySitePoster` becomes a regression test,
+  not a template.
+- **New §5.5 Multi-phase flows.** First-class support for "learn then
+  execute" patterns with typed intermediate artifacts.
+- **New §5.6 Learning substrate.** Foragent uses RockBot framework's
+  `ISkillStore` + `ILongTermMemory` to persist learned site knowledge
+  and retrieve it on subsequent tasks.
+- **New §5.7 Human-in-the-loop.** Explicit statement that review gates
+  are the caller's responsibility; Foragent returns structured state.
+- **§7 Security.** Tighter: a generalist capability needs per-task
+  domain allowlist + intent policy enforcement, not just "refuse to
+  navigate off-allowlist."
+- **§9 Sequencing.** Steps 6–9 added.
+- **§9.2 Out of scope.** Stagehand-style exclusion removed.
+
+---
+
+## Part 2 — Proposed revised sections
+
+### §5 Capability surface (replacement for current §5.1–§5.4)
+
+#### §5.1 Capability model
+
+Foragent exposes capabilities at two tiers:
+
+1. **Generalist.** One capability (`browser-task`) that accepts
+   free-form intent plus optional URL and credential hints. Runs an
+   LLM-in-the-loop planner over the browser primitives, using any
+   learned site knowledge from the skills / memory store as priming.
+   This is the default surface — the thing most callers should invoke.
+
+2. **Fast-path specialists.** A small set of narrow, structured
+   capabilities that do one well-defined thing cheaply and
+   deterministically. `fetch-page-title` and `extract-structured-data`
+   are specialists. New specialists are added only when usage shows a
+   consistent, high-volume pattern that benefits from a typed interface
+   (e.g. "get the product price from an e-commerce page" if that
+   genuinely becomes the 10%-of-all-calls shape — which it probably
+   won't).
+
+Most real callers are themselves LLM agents. They'll default to the
+generalist. Specialists exist to keep deterministic, programmatic
+callers cheap — not to proliferate.
+
+#### §5.2 Initial capability set (v0.2)
+
+| Capability | Tier | Description |
+|------------|------|-------------|
+| `browser-task` | Generalist | Given intent + optional URL/credential, plan and drive the browser to fulfill the intent. Uses RockBot skills + memory as priming knowledge. Returns structured result or intermediate artifact. |
+| `learn-form-schema` | Specialist (phase-1) | Given a URL (and optional credential to log in first), introspect a form and return its schema — fields, types, dropdown dependencies, validation rules. Persists the schema as a skill for later reuse. Returns the schema to the caller. |
+| `execute-form-batch` | Specialist (phase-2) | Given a previously-learned schema (by id or inline) and a batch of row data, submit the form once per row. Streams progress. Handles partial failure. |
+| `fetch-page-title` | Specialist | (Existing, milestone 2) Return the `<title>` of a URL. |
+| `extract-structured-data` | Specialist | (Existing, milestone 3) Extract structured data from a page matching a natural-language description. |
+
+The v0.1 `post-to-site` capability ships in the main codebase as a
+regression test for step-4 credential handling. It is not advertised in
+new agent-card skill lists after step 9; `browser-task` subsumes its
+function.
+
+`monitor-page` and `fill-form` from v0.1 §5.1 fold into `browser-task`.
+
+#### §5.3 Capabilities explicitly out of scope (v1)
+
+- Test automation (Playwright already does this).
+- Raw browser primitive exposure (Microsoft's `@playwright/mcp` does this).
+- Visual regression testing.
+- Form-filling for sensitive financial transactions, account creation,
+  or modifying security permissions (see §7.3).
+- Multi-tab orchestration as a primary feature (may be used internally
+  but not advertised).
+- Code generation from browser traces (e.g. "generate a Playwright
+  script that reproduces this"). Traces stay inside the learning
+  substrate.
+
+> ~~Stagehand-style natural-language-to-action layers~~ — removed.
+> `browser-task` is that layer.
+
+#### §5.4 Capability design principles
+
+- **Task-level, not action-level.** Unchanged from v0.1.
+- **Clear contracts even for the generalist.** `browser-task`'s input
+  shape is typed (intent, url?, credentialId?, allowlist?, budget?);
+  only the *plan* inside is LLM-generated.
+- **Return structured state, not narrative, when the caller needs to
+  act on it.** A learned form schema is JSON, not prose. A submit-batch
+  progress report is a typed status update, not a sentence.
+- **Delegate to the learning substrate, don't reinvent it.** Site
+  knowledge lives in RockBot skills + memory; the capability reads and
+  writes, it doesn't own its own cache.
+- **Credentials by reference.** Unchanged.
+
+### §5.5 Multi-phase flows (new)
+
+Many real browser tasks are multi-phase with human review between
+phases. The canonical example (motivating this revision):
+
+1. **Phase 1 — Learn.** Navigate to a form; introspect its fields and
+   dynamic dependencies; return a schema to the caller.
+2. **Review.** The caller (human via Claude Code, or another agent)
+   inspects the schema, decides whether to proceed, assembles input
+   data, validates.
+3. **Phase 2 — Execute.** Submit the form N times against the learned
+   schema, streaming progress.
+
+Foragent's role is Phase 1 and Phase 2. The review step between them is
+the caller's responsibility; Foragent is not in the review loop.
+
+To make this work:
+
+- Phase-1 capabilities **return structured artifacts** (form schemas,
+  extracted data, observed flow traces), not just status text.
+- Phase-1 artifacts are **persisted in the learning substrate** (§5.6)
+  so Phase 2 doesn't re-learn. They get an id the caller can reference.
+- Phase-2 capabilities **accept a learned-artifact reference or inline
+  artifact** as input, alongside the per-invocation data.
+- Phase-2 capabilities **stream progress and handle partial failure**
+  over A2A rather than being batch-atomic.
+
+This is not an A2A protocol change. A2A 1.0 already supports structured
+response parts, streaming status updates, and task-id references. v0.2
+makes explicit use of all three.
+
+### §5.6 Learning substrate (new)
+
+Foragent uses the RockBot framework's existing persistence for learned
+site knowledge, rather than building a Foragent-local store.
+
+**What's used:**
+
+- **`ISkillStore` (file-backed, BM25 + optional semantic retrieval).**
+  Stores site knowledge as markdown skills. Two origin categories:
+  - **Human-authored skills** — operator-written primers for a site
+    (e.g. `sites/bsky.app/overview`). Treated as priming hints for the
+    generalist planner.
+  - **Agent-learned skills** — written by the generalist on successful
+    task completion (e.g. `sites/bsky.app/learned/login-flow`). Tagged
+    with `metadata.source = "agent-learned"` and an importance score.
+- **`ILongTermMemory` (file-backed, BM25 + semantic).** Declarative
+  observations that don't fit the procedural skill shape: failed
+  attempts, site-version notes, ambient facts ("bsky.app's home feed
+  heading is the login success signal").
+
+**What's stored (skill shape):**
+
+- **Content:** markdown body describing the site, the flow, selectors,
+  success signals, known pitfalls.
+- **Name convention:** `sites/{host}/{phase-or-intent}` — e.g.
+  `sites/bsky.app/login`, `sites/bsky.app/compose-post`. Hierarchical
+  `/` nesting supported by the store.
+- **`seeAlso` links** across skills for the same site, so retrieval
+  surfaces a small knowledge cluster rather than one skill at a time.
+
+**What's stored (memory shape for non-procedural facts):**
+
+- **Category:** `sites/{host}` — so all site observations are
+  retrievable together.
+- **Tags:** freeform (`selector`, `flow`, `failure`, `version`, etc.).
+- **Importance:** ranked 0–1. Confirmed-working patterns get high
+  importance; one-off observations start low and drift with reuse.
+
+**Retrieval pattern at plan time:**
+
+1. Generalist capability computes a search query from the task intent
+   + target URL host.
+2. Queries skill store and memory store in parallel, top-K by relevance.
+3. Retrieved content becomes priming context for the LLM planner.
+4. Planner proceeds; any new observation surfaces as a write after the
+   task completes.
+
+**Structured artifacts (the form-schema case):**
+
+Learned form schemas are typed JSON, not markdown. RockBot's skill store
+holds markdown content. Two options, decision deferred to step 8:
+
+- **(A)** Embed the JSON in a fenced code block inside the skill
+  content. Loose — re-parse on retrieval.
+- **(B)** Store the schema in an adjacent Foragent-local typed store,
+  reference it by id from a skill. Tighter but duplicates infrastructure.
+
+The framework-feedback log records (A) vs (B) as a candidate
+`ISkillStore.AttachedArtifacts` extension for RockBot, if we hit the
+shape often enough.
+
+### §5.7 Human-in-the-loop (new)
+
+Review gates are the **caller's** responsibility, not Foragent's.
+
+- Foragent returns structured state at phase boundaries (§5.5).
+- The caller decides whether to proceed. Human callers use their own UI
+  (Claude Code, Blazor proxy, bespoke dashboards). Agent callers make
+  the decision programmatically.
+- Foragent does **not** block waiting for review. Each phase is a
+  separate A2A task.
+
+A2A's `input-required` state is still used for credential 2FA prompts
+(§6.6). It is **not** used as a general "stop and let the human review"
+mechanism — that coupling would force Foragent to hold browser state
+across potentially-long human delays, which conflicts with the
+one-context-per-task model (§3.5).
+
+### §7.1 Domain allowlists (augmented)
+
+Under v0.2, allowlists become more load-bearing because the generalist
+can navigate anywhere the LLM plans to navigate. Every `browser-task`
+invocation:
+
+- **MUST** accept an explicit `allowedHosts` list (empty = reject).
+- **MUST** refuse any navigation, fetch, or subframe load outside the
+  list.
+- **SHOULD** have per-tenant defaults (future: §7.5) so individual tasks
+  can inherit rather than list everything.
+
+Ad-hoc "navigate to whatever looks relevant" is explicitly not
+supported. The generalist is powerful but bounded.
+
+### §9.1 Milestones (extended)
+
+Existing milestones 1–5: unchanged (shipped).
+
+6. **Baseline `browser-task` generalist.** LLM-in-the-loop planner over
+   existing browser primitives. No learning substrate yet. Measure
+   unaided success rate on a small curated benchmark (e.g. 10 varied
+   sites). Goal: establish the floor before investing in priming.
+
+7. **Wire RockBot skills + memory as priming.** Register
+   `ISkillStore` + `ILongTermMemory` in Foragent's host. Retrieve
+   relevant skills into planner context. Write agent-learned skills on
+   success. Goal: prove the framework's persistence surface is the
+   right substrate; file issues if it isn't.
+
+8. **`learn-form-schema` + `execute-form-batch`.** First explicit
+   multi-phase capability pair. Structured JSON schema returned from
+   phase 1, batch execution streaming progress in phase 2.
+
+9. **Deprecate narrow specialists that `browser-task` covers.** Remove
+   `post-to-site` from the advertised skill list (keep as regression
+   test). Review whether `fetch-page-title` / `extract-structured-data`
+   still pay their way or fold into `browser-task` with equivalent
+   cost. Goal: land on the minimum capability set that v0.2 actually
+   needs.
+
+### §9.2 Out of scope (v1, revised)
+
+Unchanged except:
+
+- ~~Stagehand-style natural-language-to-action layers~~ — **removed.**
+  `browser-task` is that layer.
+
+### §12 Open questions (revised)
+
+1. **Internal LLM selection and tier routing.** (Unchanged from v0.1.)
+2. ~~Stagehand-equivalent for .NET.~~ — **closed.** v0.2 builds it
+   natively on `Microsoft.Playwright` NuGet, not via Stagehand port or
+   `@playwright/mcp` sidecar. See Part 1 "What v0.2 is not."
+3. **Storage state encryption at rest.** (Unchanged.)
+4. **Capability versioning.** (Unchanged.)
+5. **Tenant identity model.** (Unchanged.)
+6. **(New) Structured artifacts in `ISkillStore`.** Do we stretch the
+   skill-as-markdown shape to carry typed JSON (fenced code blocks,
+   parse on retrieval), or add a parallel Foragent-local typed store?
+   Decide at step 8 based on how ugly (A) feels in practice.
+7. **(New) Per-task budget.** How do we cap an LLM-in-the-loop task —
+   max steps, max tokens, wall-clock, cost? Caller-specified, agent-
+   enforced, or both? Needed by step 6.
+8. **(New) Retry and failure semantics for batches.** In
+   `execute-form-batch`, is a row failure fatal or per-row? Does the
+   caller get per-row errors streamed, or a final report? Needed by
+   step 8.
+
+---
+
+## Part 3 — Implementation plan
+
+### Step 6 — Baseline `browser-task` generalist
+
+**Goal:** prove the LLM-in-the-loop-over-browser-primitives baseline
+works on real sites without learned priming. Establish the floor.
+
+**Deliverables:**
+- New `BrowserTaskCapability : ICapability` with skill id `browser-task`.
+- Typed input: `{intent: string, url: string?, credentialId: string?,
+  allowedHosts: string[], maxSteps: int?}`.
+- Pure .NET planner, no Node sidecar, no MCP transport. Built on
+  `Microsoft.Playwright` NuGet directly via a new `Foragent.Planner`
+  project that consumes `IBrowserPage` from `Foragent.Browser`.
+- Snapshot/action bridge: extend `IBrowserPage` (or add a sibling
+  `IBrowserPlannerPage`) with `AriaSnapshotAsync()` returning
+  ref-annotated aria text, plus `ResolveRefAsync("e42")` returning an
+  `ILocator`. Playwright already emits `[ref=eN]` markers in aria
+  snapshots and accepts `aria-ref=eN` in the selector engine — we're
+  exposing that, not reimplementing it.
+- Planner loop: snapshot → LLM selects next action → dispatch via ref →
+  repeat until the planner emits a terminal action or max-steps is hit.
+- LLM contract: a small `[AIFunction]` tool set — `snapshot`, `click`,
+  `type`, `navigate`, `wait_for`, `done`, `fail` — surfaced through
+  `IChatClient`'s native function-calling. No MCP JSON-RPC layer.
+- Per-task allowlist enforced on every `navigate` before Playwright
+  sees the URL.
+- Integration test: real Kestrel host with a fixed form; `browser-task`
+  fills it via free-text intent.
+- Framework-feedback update: what the planner loop wanted from the
+  framework that wasn't there.
+
+**Out of scope for step 6:**
+- Any learning / persistence.
+- Multi-phase / returned artifacts.
+- Credentials beyond what `IBrowserSession` already supports.
+
+**Exit criteria:**
+- `browser-task` completes the step-4 Bluesky poster flow end-to-end
+  against the step-4 fake Kestrel server, *without* any
+  Bluesky-specific code in Foragent's codebase (only the shared browser
+  primitives and the LLM planner).
+- Runs on 3+ more varied form shapes in tests.
+- `BlueskySitePoster` still passes its existing regression tests —
+  v0.2 does not break v0.1.
+
+### Step 7 — Learning substrate wired
+
+**Goal:** prove the framework's skills + memory are the right substrate
+for site knowledge, and that retrieval-primed generalist runs beat
+unaided runs.
+
+**Deliverables:**
+- `builder.WithSkills()` + `builder.WithLongTermMemory()` added to
+  `Foragent.Agent/Program.cs`.
+- `/data/foragent` volume in `docker-compose.yml` with
+  `AgentProfile__BasePath=/data/foragent`.
+- `BrowserTaskCapability` queries both stores pre-plan; retrieved
+  content primes the planner. Query shape: intent + target host +
+  top-K.
+- On task success, planner writes one skill per distinguishable flow
+  (login / action / success signal) keyed by host, tagged
+  `metadata.source=agent-learned`.
+- `IEmbeddingGenerator` wired (Azure OpenAI text-embedding-3-small or
+  similar) for semantic retrieval. Falls back to BM25-only if not
+  configured.
+- Seed one human-authored skill for `bsky.app` as a priming example;
+  check in as `deploy/skills-seed/sites/bsky.app/overview.md`.
+- Integration test: cold run vs. primed run; assert primed run uses
+  fewer LLM steps.
+
+**Framework observations to capture:**
+- Does `ISkillStore`'s markdown-content shape fit procedural site
+  knowledge, or does it strain?
+- Does memory's category/tag/importance model fit site observations?
+- Any gaps in retrieval (e.g. no host-prefix query shape) → file
+  rockbot issues.
+
+**Exit criteria:**
+- Primed `browser-task` runs on the same task consistently use ≥30%
+  fewer planner LLM calls than the unprimed baseline from step 6.
+- Agent-learned skills are readable and actionable when inspected by a
+  human.
+
+### Step 8 — Multi-phase: form learn + batch execute
+
+**Goal:** first-class support for the motivating scenario — introspect
+a form, return a reviewable schema, later submit a batch against it.
+
+**Deliverables:**
+- New `LearnFormSchemaCapability` with skill id `learn-form-schema`.
+  - Input: `{url, credentialId?, allowedHosts}`.
+  - Output: typed JSON schema — fields (name, type, visibility
+    rules), dropdown options, dependency graph, submit button locator.
+  - Persists the schema alongside a skill (open question #6 — decide
+    at step start).
+- New `ExecuteFormBatchCapability` with skill id `execute-form-batch`.
+  - Input: `{url, credentialId?, schemaId | schema, rows[],
+    allowedHosts, onError: "abort"|"continue"}`.
+  - Streams A2A status updates per row.
+  - Returns a per-row result array on completion.
+- Integration test: Kestrel-hosted form with a dynamic dropdown
+  (e.g. `category=alpha` reveals fields A/B, `category=beta` reveals
+  fields C/D). Schema round-trips; batch of 20 mixed rows submits.
+- Open question #8 decided (in the deliverable, not as prose):
+  per-row continue-vs-abort, progress shape.
+
+**Exit criteria:**
+- Schema learned in one task, batch submitted in a separate task
+  (different process invocations) against the persisted schema.
+- Schema is human-reviewable: a developer can read it and understand
+  what Foragent will submit before consenting.
+
+### Step 9 — Deprecate subsumed specialists
+
+**Goal:** land on v0.2's actual advertised capability set.
+
+**Deliverables:**
+- `post-to-site` removed from advertised `ForagentCapabilities.Skills`
+  and from `deploy/rockbot-seed/well-known-agents.json` and
+  `agent-trust.json` `approvedSkills`. Implementation stays in the
+  codebase as a regression test for credential handling; integration
+  tests remain green.
+- `monitor-page` and `fill-form` from v0.1 §5.1 never shipped; remove
+  from spec.
+- Review: do `fetch-page-title` and `extract-structured-data` still
+  pay their way? Measure runtime cost vs. equivalent `browser-task`
+  calls. Remove if `browser-task` is competitive; keep if they're 10×+
+  cheaper on the hot path.
+- Spec v0.2 merged into `foragent-specification.md`; this proposal
+  file archived to `docs/archive/`.
+
+**Exit criteria:**
+- Advertised capability list matches §5.2 of spec v0.2.
+- No codepath is exercised only by deprecated specialists — every
+  line has a live caller or a live test.
+
+---
+
+## Part 4 — Open questions for you before step 6 starts
+
+1. **Generalist action set.** Start with `{snapshot, navigate, click,
+   type, wait_for, done, fail}` (aligning with what `@playwright/mcp`
+   exposes and what Playwright's ref-resolver supports natively), or
+   broader (`hover, select, keyboard_shortcut, file_upload`)? I'd start
+   narrow and grow on demand. **Your call?**
+
+2. **Planner LLM.** Use the same `IChatClient` as
+   `extract-structured-data` (Azure AI Foundry `gpt-5.3-chat`), or
+   wire a separate one? Separate would let us route planner ≠
+   extraction cost-optimally. I'd start same, split if cost forces it.
+
+3. **Per-task budget default.** Propose: `maxSteps=30`,
+   `maxSeconds=120`, caller can raise within bounds. **OK, or do you
+   want these higher/lower?**
+
+4. **Allowlist default.** Refuse navigation if `allowedHosts` is empty,
+   or treat empty as "same-origin as `url`"? I lean refuse — forces
+   callers to be explicit, cheap to construct.
+
+5. **RockBot side.** Foragent's `invoke_agent` experience at step 5
+   showed RockBot's tool only passes free-text. Does step 6's
+   `browser-task` fit that shape naturally (intent *is* free text), so
+   the problem dissolves? I think yes — worth confirming before
+   building.
+
+6. **Spec merge timing.** Merge this proposal into the main spec now
+   (it becomes the v0.2 spec and the project operates under it), or
+   keep it as a proposal until step 6 validates the core approach?
+   I lean: merge §5 + §9 now, leave §5.6 / §5.7 as "proposed" until
+   step 7 actually exercises them.
diff --git a/docs/foragent-specification.md b/docs/foragent-specification.md
index f67100e..4d105d2 100644
--- a/docs/foragent-specification.md
+++ b/docs/foragent-specification.md
@@ -1,30 +1,39 @@
 # Foragent — Project Specification
 
-> **Status:** Design specification, pre-implementation.
+> **Status:** Governing specification, v0.2.
 > **Date:** April 2026
 > **Author:** Rocky Lhotka, Marimer LLC
-> **Repository (planned):** https://github.com/MarimerLLC/Foragent
+> **Repository:** https://github.com/MarimerLLC/foragent
 
 ---
 
 ## 1. Summary
 
-**Foragent** is an A2A-native browser agent for .NET. It exposes browser
-automation capabilities — navigate, extract, fill forms, post to sites,
-monitor pages — over the Agent2Agent (A2A) protocol. Other agents delegate
-browser work to Foragent rather than reasoning about DOM selectors, session
-state, or 2FA flows themselves.
+**Foragent** is an A2A-native, self-hosted **agentic browser agent** for
+.NET. Callers delegate free-form browser intent to Foragent — "submit
+these rows to this form," "post this content on this site," "extract
+this data from these pages" — and Foragent plans and drives the browser
+to fulfill it, using internal LLM reasoning to resolve selectors,
+interpret dynamic form structure, and recover from failure. Callers
+do not reason about DOM, selectors, session state, or 2FA.
 
 Foragent is built on the **RockBot framework** (the NuGet packages
-maintained at https://github.com/MarimerLLC/rockbot) and uses the official
-**Microsoft.Playwright** NuGet package for browser automation. It is the
-second consumer of the RockBot framework, after the RockBot personal agent
-itself.
+maintained at https://github.com/MarimerLLC/rockbot) and uses the
+official **Microsoft.Playwright** NuGet package for browser automation,
+driven directly in-process (no MCP sidecar, no Stagehand port — see
+Appendix A decision #16). It is the second consumer of the RockBot
+framework, after the RockBot personal agent itself.
+
+Foragent's product is **one generalist capability** (`browser-task`)
+that handles the long tail of browser work, complemented by a small set
+of narrow fast-path specialists where a structured typed interface pays
+for itself. Site-specific code is the exception, not the scaling path
+(see §5).
 
 Foragent is a standalone open-source project under Marimer LLC. RockBot
-is its first user, but the project is designed to be generally useful to
-anyone building agentic systems on .NET that need a self-hosted browser
-worker.
+is its first user, but the project is designed to be generally useful
+to anyone building agentic systems on .NET that need a self-hosted
+browser worker.
 
 ---
 
@@ -168,6 +177,31 @@ as the base image. The agent process calls Microsoft.Playwright directly.
 - Concurrency-within-pod can be added later if profiling shows it's
   needed.
 
+### 3.7 LLM tier routing
+
+Foragent uses the RockBot framework's `TieredChatClientRegistry`
+(`RockBot.Llm`, exposed via `AddRockBotTieredChatClients`). The registry
+provides three `IChatClient` instances — `Low`, `Balanced`, `High` —
+and registers the `Balanced` client as the default `IChatClient`
+singleton for consumers that inject without a tier hint.
+
+- **Capabilities request a tier appropriate to the work.** The generalist
+  planner loop (§5.2) targets `Balanced` for planning steps and may
+  request `High` for recovery from ambiguous states or complex reasoning.
+  Cheap structural operations (aria-snapshot summarization, extraction
+  shaping) target `Low` when a Low model is meaningfully cheaper.
+- **For v0.2 Foragent ships with one configured model.** Operators wire
+  the same `IChatClient` into all three tiers; future cost-optimization
+  upgrades swap models per-tier without touching capability code.
+- **Consumers that inject `IChatClient` directly continue to work.** They
+  transparently receive the `Balanced` client — the framework guarantees
+  this. No capability is required to be tier-aware; it's an opt-in
+  optimization surface.
+
+Direct injection of a single `IChatClient` (as used by v0.1's
+`ExtractStructuredDataCapability`) remains supported and backwards-
+compatible with the tiered registration.
+
 ---
 
 ## 4. Project structure
@@ -250,40 +284,162 @@ MIT license. Matches CSLA and the broader .NET OSS ecosystem.
 The capability list is the product. Foragent's value is what verbs it
 exposes via A2A, not what's inside.
 
-### 5.1 Initial capability set (v0.x)
-
-Start narrow. Add only when usage demands it.
-
-| Capability | Description |
-|------------|-------------|
-| `fetch-page-content` | Navigate to a URL, return rendered text and structured page metadata. |
-| `extract-structured-data` | Navigate to a URL, extract data matching a description (e.g. "the product price and availability"). |
-| `fill-form` | Navigate to a URL, fill out a form given a description of the values, submit. |
-| `post-to-site` | Authenticate against a configured site (using credential broker) and post content. First targets: Bluesky, Mastodon. |
-| `monitor-page` | Periodically check a page for changes matching a description; emit A2A progress updates when changes occur. |
-
-### 5.2 Capabilities explicitly out of scope (v1)
-
-- Test automation (Playwright already does this)
-- Raw browser primitive exposure (Microsoft's playwright/mcp does this)
-- Visual regression testing
-- Form-filling for sensitive financial transactions or account creation
-  (see Section 7.3)
-- Multi-tab orchestration as a primary feature (may be supported
-  internally but not advertised as a capability)
-
-### 5.3 Capability design principles
-
-- Each capability has a **clear, named contract** — inputs, outputs,
-  error modes documented.
-- Capabilities are **task-level, not action-level**. "Post to site" is
-  a capability; "click button" is not.
-- Capabilities **may delegate to internal LLM reasoning** for
-  selector resolution, intent translation, and retry logic. This is
-  what makes Foragent an *agent* rather than a wrapper.
-- Capabilities **respect the credential broker contract**. They
-  reference credentials by ID; they never receive raw values from
-  callers.
+### 5.1 Capability model
+
+Foragent exposes capabilities at two tiers:
+
+1. **Generalist.** One capability — `browser-task` — that accepts
+   free-form intent plus optional URL and credential hints. Runs an
+   LLM-in-the-loop planner over the browser primitives, using any
+   learned site knowledge from the skills and memory stores (§5.6) as
+   priming. This is the default surface — the thing most callers should
+   invoke.
+2. **Fast-path specialists.** A small set of narrow, structured
+   capabilities that do one well-defined thing cheaply and
+   deterministically. `fetch-page-title` and `extract-structured-data`
+   are specialists. New specialists are added only when usage shows a
+   consistent, high-volume pattern that benefits from a typed interface.
+
+Most real callers are themselves LLM agents. They default to the
+generalist. Specialists exist to keep deterministic, programmatic
+callers cheap — not to proliferate.
+
+### 5.2 Initial capability set (v0.2)
+
+| Capability | Tier | Description |
+|------------|------|-------------|
+| `browser-task` | Generalist | Given an intent, a mandatory allowed-hosts list (§7.1), and an optional URL and credential id, plan and drive the browser to fulfill the intent. Uses RockBot skills + memory as priming. Returns a result or a structured intermediate artifact (e.g. a learned form schema). |
+| `learn-form-schema` | Specialist (phase-1) | Given a URL and optional credential, introspect a form and return its schema — fields, types, dropdown dependencies, validation rules. Persists the schema as a skill (§5.6). Returns the schema to the caller for review. |
+| `execute-form-batch` | Specialist (phase-2) | Given a learned schema (by id or inline) and a batch of row data, submit the form once per row. Streams A2A progress updates. Handles partial failure. |
+| `fetch-page-title` | Specialist | Return the `<title>` of a URL. Inherited from milestone 2. |
+| `extract-structured-data` | Specialist | Extract structured data from a page matching a natural-language description. Inherited from milestone 3. |
+
+The v0.1 `post-to-site` capability ships in the main codebase as a
+regression test for credential handling. After step 7 it is removed
+from the advertised skill list; `browser-task` subsumes its function.
+
+The v0.1 `monitor-page` and `fill-form` capabilities fold into
+`browser-task` and do not ship as separate advertised skills.
+
+### 5.3 Capabilities explicitly out of scope (v1)
+
+- Test automation (Playwright already does this).
+- Raw browser primitive exposure (Microsoft's `@playwright/mcp` does
+  this; Foragent operates one level up — task-shaped, not tool-shaped).
+- Visual regression testing.
+- Form-filling for sensitive financial transactions, account creation,
+  or modifying security permissions (see §7.3).
+- Multi-tab orchestration as a primary feature (may be used internally
+  but not advertised).
+- Code generation from browser traces (e.g. "generate a Playwright
+  script that reproduces this"). Traces stay inside the learning
+  substrate.
+
+### 5.4 Capability design principles
+
+- **Task-level, not action-level.** "Submit these rows to that form"
+  is a capability; "click button" is not.
+- **Clear contracts even for the generalist.** `browser-task`'s input
+  shape is typed (intent, url?, credentialId?, allowedHosts, maxSteps?);
+  only the *plan* inside is LLM-generated.
+- **Return structured state, not narrative, when the caller needs to
+  act on it.** A learned form schema is typed JSON, not prose. A
+  submit-batch progress report is a typed status update, not a sentence.
+- **Delegate to the learning substrate, don't reinvent it.** Site
+  knowledge lives in RockBot skills + memory; the capability reads and
+  writes, it does not own its own cache.
+- **Credentials by reference.** Capabilities receive a credential id;
+  the broker (§6) resolves inside the Foragent process.
+
+### 5.5 Multi-phase flows
+
+Many real browser tasks are multi-phase with human or caller-side
+review between phases. The motivating example:
+
+1. **Phase 1 — Learn.** Navigate to a form; introspect its fields and
+   dynamic dependencies; return a schema to the caller.
+2. **Review.** The caller (human via their own UI, or another agent)
+   inspects the schema, decides whether to proceed, assembles input
+   data, validates.
+3. **Phase 2 — Execute.** Submit the form N times against the learned
+   schema, streaming progress.
+
+Foragent's role is Phase 1 and Phase 2. The review step between them is
+the caller's responsibility; Foragent is not in the review loop.
+
+To make this work:
+
+- Phase-1 capabilities **return structured artifacts** (form schemas,
+  extracted data, observed flow traces), not just status text.
+- Phase-1 artifacts are **persisted in the learning substrate** (§5.6)
+  and get an id the caller can reference in Phase 2.
+- Phase-2 capabilities **accept a learned-artifact reference or inline
+  artifact** as input, alongside per-invocation data.
+- Phase-2 capabilities **stream progress and handle partial failure**
+  over A2A; a batch is not atomic.
+
+This is not an A2A protocol change. A2A 1.0 already supports structured
+response parts, streaming status updates, and task-id references; v0.2
+makes explicit use of all three.
+
+### 5.6 Learning substrate
+
+Foragent uses the RockBot framework's existing persistence for learned
+site knowledge, rather than building a Foragent-local store.
+
+**What's used:**
+
+- **`ISkillStore`** (file-backed, BM25 + optional semantic retrieval —
+  `RockBot.Host.Abstractions` + `RockBot.Host.AgentMemoryExtensions.WithSkills()`).
+  Stores site knowledge as markdown skills. Two origin categories:
+  - **Human-authored skills** — operator-written primers for a site
+    (e.g. `sites/bsky.app/overview`). Treated as priming hints for the
+    generalist planner.
+  - **Agent-learned skills** — written by the generalist on successful
+    task completion (e.g. `sites/bsky.app/learned/login-flow`). Tagged
+    with `metadata.source = "agent-learned"` and an importance score.
+- **`ILongTermMemory`** (file-backed, BM25 + semantic —
+  `WithLongTermMemory()`). Declarative observations that don't fit the
+  procedural skill shape: failed attempts, site-version notes, ambient
+  facts.
+
+**Skill naming:** `sites/{host}/{phase-or-intent}` — e.g.
+`sites/bsky.app/login`, `sites/bsky.app/compose-post`. Hierarchical `/`
+nesting is supported by the store. `seeAlso` links cross-reference
+skills for the same site so retrieval surfaces a small knowledge
+cluster, not one skill at a time.
+
+**Retrieval at plan time:**
+
+1. Capability computes a search query from task intent + target URL host.
+2. Queries skill store and memory store in parallel, top-K by relevance.
+3. Retrieved content becomes priming context for the LLM planner.
+4. New observations surface as writes after the task completes.
+
+**Structured artifacts (the form-schema case):**
+
+Learned form schemas are typed JSON, but the skill store holds
+markdown content. Resolution is deferred to step 8; current options are
+(A) embed JSON in a fenced code block inside a skill, re-parse on
+retrieval, or (B) add a parallel Foragent-local typed store keyed by
+skill id. Framework-feedback tracks this as a candidate
+`ISkillStore.AttachedArtifacts` extension if the shape recurs.
+
+### 5.7 Human-in-the-loop
+
+Review gates are the **caller's** responsibility, not Foragent's.
+
+- Foragent returns structured state at phase boundaries (§5.5).
+- The caller decides whether to proceed. Human callers use their own
+  UI; agent callers make the decision programmatically.
+- Foragent does **not** block waiting for review. Each phase is a
+  separate A2A task.
+
+A2A's `input-required` state is used only for mid-task credential
+flows (2FA, §6.6). It is not used as a general "stop and let the human
+review" mechanism — that coupling would force Foragent to hold browser
+state across potentially-long human delays, which conflicts with the
+one-context-per-task model (§3.5).
 
 ---
 
@@ -385,9 +541,24 @@ flow is the recommended pattern.
 
 ### 7.1 Domain allowlists
 
-Per-task allowlists for navigable domains. The calling agent can
-constrain a task to specific origins; Foragent refuses navigation
-outside the allowlist. Default is restrictive, not permissive.
+Every capability invocation, especially the generalist `browser-task`
+(§5.2), **must** carry an explicit allowed-hosts list. An empty list
+**rejects** the task; there is no default-permissive mode.
+
+Wildcards are supported to keep callers from having to enumerate every
+subdomain:
+
+- Exact host: `bsky.app`
+- Subdomain wildcard: `*.example.com` (matches `foo.example.com`,
+  `foo.bar.example.com`; does not match `example.com` itself — list
+  both if both are desired).
+- Fully unrestricted: `*` (explicit only; still callable, still logged).
+
+Foragent refuses any navigation, fetch, or subframe load outside the
+list before Playwright sees the URL. Per-tenant defaults (future, §7.5)
+will let individual tasks inherit rather than list everything on every
+call. Ad-hoc "navigate to whatever looks relevant" is explicitly not
+supported — the generalist is powerful but bounded.
 
 ### 7.2 Network egress policies
 
@@ -505,42 +676,65 @@ hard design questions until usage forces them.
 
 ### 9.1 Milestones
 
-1. **Empty agent on RockBot framework.** Stand up Foragent.Agent that
-   registers itself as an A2A server with one trivial capability
-   (`fetch-page-title`). No Playwright yet. Goal: feel the bootstrap
-   cost of building a new agent on RockBot.
-
-2. **Real Playwright integration for that capability.** Add
-   Microsoft.Playwright NuGet, implement `fetch-page-title` for real
-   against actual web pages. Goal: feel the integration story between
-   RockBot's agent loop and the Playwright library.
-
-3. **Add a second capability** (`extract-structured-data`). Goal: feel
-   how the framework supports growing the capability surface.
-
-4. **Add credentials and a third capability that needs them**
-   (`post-to-site` for Bluesky). Goal: end-to-end credential broker
-   story including ICredentialBroker abstraction and at least one
-   real implementation.
-
-5. **Wire RockBot the agent up to call Foragent via A2A.** Goal:
-   validate the full loop. RockBot becomes Foragent's first real user.
-
-Each milestone produces framework feedback. Capture it. Some will be
-small ergonomic fixes; some may be "the framework should really have a
-concept of X."
+**Steps 1–5 — shipped (v0.1):**
+
+1. **Empty agent on RockBot framework.** `fetch-page-title` with no
+   Playwright.
+2. **Real Playwright integration for that capability.**
+3. **Second capability** — `extract-structured-data` (Playwright + LLM).
+4. **Credentials and `post-to-site` for Bluesky.** `ICredentialBroker`
+   + `InMemoryCredentialBroker` + `BlueskySitePoster`.
+5. **RockBot wired to Foragent via A2A.** Validation loop; RockBot
+   becomes Foragent's first real user.
+
+**Steps 6–9 — v0.2 sequence:**
+
+6. **Baseline `browser-task` generalist.** LLM-in-the-loop planner built
+   directly on `Microsoft.Playwright` NuGet (no MCP sidecar, no
+   Stagehand — see Appendix A #16). Exposes a small `[AIFunction]`
+   tool set — `snapshot`, `click`, `type`, `navigate`, `wait_for`,
+   `done`, `fail` — through `IChatClient`. Uses `Page.AriaSnapshotAsync()`
+   ref-annotated snapshots and `Page.Locator("aria-ref=eN")` for ref
+   resolution. No learning substrate yet. Measure unaided success rate
+   on a small curated benchmark. Goal: establish the floor before
+   investing in priming.
+
+7. **Wire RockBot skills + memory as priming.** Register `ISkillStore`
+   + `ILongTermMemory` in Foragent's host. Retrieve relevant skills
+   into planner context; write agent-learned skills on success. Seed
+   one human-authored skill for `bsky.app`. Wire `IEmbeddingGenerator`
+   for semantic retrieval. Remove `post-to-site` from the advertised
+   skill list once `browser-task` + the learned bsky skill cover it.
+   Goal: prove the framework's persistence is the right substrate;
+   file issues if it isn't.
+
+8. **`learn-form-schema` + `execute-form-batch`.** First explicit
+   multi-phase capability pair. Structured JSON schema returned from
+   phase 1, batch execution with streaming per-row progress in phase 2.
+   Resolve open question #6 (how to persist typed JSON alongside
+   markdown skills) in the deliverable.
+
+9. **Deprecate subsumed specialists.** Review whether `fetch-page-title`
+   / `extract-structured-data` still pay their way or fold into
+   `browser-task` with equivalent cost. Land on the minimum advertised
+   capability set v0.2 actually needs.
+
+Each milestone produces framework feedback. Capture it in
+`docs/framework-feedback.md` — some will be small ergonomic fixes; some
+may be "the framework should really have a concept of X."
 
 ### 9.2 What is explicitly out of scope for v1
 
-- Container packaging beyond a single working Dockerfile
-- Helm charts and production k8s manifests
-- KEDA autoscaling integration
-- Multi-tenant credential broker UIs
-- Agent self-improvement / learning
-- Browser pool management
-- Stagehand-style natural-language-to-action layers (may be revisited
-  later; the internal LLM-based selector resolution is sufficient for
-  v1)
+- Container packaging beyond a single working Dockerfile.
+- Helm charts and production k8s manifests.
+- KEDA autoscaling integration.
+- Multi-tenant credential broker UIs.
+- Browser pool management (single shared Chromium per pod — §3.5).
+- Non-browser automation (desktop, mobile, API-only flows).
+
+(The v0.1 "no Stagehand-style natural-language-to-action layers" item
+is deliberately removed. v0.2's `browser-task` *is* that layer, built
+natively on Playwright NuGet — see Appendix A #16.)
 
 ---
 
@@ -600,13 +794,12 @@ identifier for .NET.
 
 These are real design questions deferred until usage forces an answer.
 
-1. **Internal LLM selection and tier routing.** Foragent will use
-   Microsoft.Extensions.AI for internal reasoning. Which tier routing
-   patterns from RockBot apply directly, and which are RockBot-specific?
-2. **Stagehand-equivalent for .NET.** Stagehand is Node-only. Should
-   Foragent build an equivalent natural-language `page.act()` layer in
-   C# using its internal LLM? Defer to v2 unless v1 selector-resolution
-   proves insufficient.
+1. ~~Internal LLM selection and tier routing.~~ **Closed** in v0.2
+   §3.7: Foragent uses RockBot's `TieredChatClientRegistry` and ships
+   with one model aliased across tiers; tier awareness is opt-in.
+2. ~~Stagehand-equivalent for .NET.~~ **Closed** in v0.2 — built
+   natively on `Microsoft.Playwright` NuGet; no Stagehand port, no
+   `@playwright/mcp` sidecar. See Appendix A #16.
 3. **Storage state encryption at rest.** Storage state is sensitive but
    not as sensitive as raw credentials. Does it need stronger protection
    than the credential broker provides, or is broker-level fine?
@@ -615,6 +808,17 @@ These are real design questions deferred until usage forces an answer.
    Defer until a capability actually needs to change shape.
 5. **Tenant identity model.** A2A 1.0-preview's identity model is still
    evolving. Lock in the tenant identity story once A2A 1.0 stabilizes.
+6. **Structured artifacts in `ISkillStore`.** Learned form schemas
+   (§5.6) are typed JSON; skills store markdown. Stretch the skill
+   shape (fenced JSON, re-parse on retrieval) or add a parallel
+   Foragent-local typed store keyed by skill id? Decide at step 8.
+7. **Per-task budget.** How do we cap an LLM-in-the-loop task — max
+   steps, max tokens, wall-clock, cost? Proposed defaults:
+   `maxSteps=30`, `maxSeconds=120`, caller can raise within bounds.
+   Needed by step 6.
+8. **Retry and failure semantics for batches.** In `execute-form-batch`,
+   is a row failure fatal to the batch or isolated to that row? How are
+   partial results streamed? Needed by step 8.
 
 ---
 
@@ -640,3 +844,8 @@ need to be revisited.
 | 13 | MIT license | Matches CSLA and the broader .NET OSS ecosystem. |
 | 14 | .NET 10, C# latest | Current stable .NET as of project start. |
 | 15 | Name: Foragent | Distinctive, self-explaining, available domains, no dev-tools collision. |
+| 16 | Build generalist `browser-task` on Microsoft.Playwright NuGet directly — no Stagehand port, no `@playwright/mcp` sidecar | Ref-annotated aria snapshots and `aria-ref=eN` locator resolution are Playwright features, not MCP-exclusive. `[AIFunction]` tool wrapping over `IChatClient` gives MCP-equivalent function-calling in-process. Keeps credential boundary (§6.1) clean and preserves v0.1 decision #1. |
+| 17 | Use RockBot's `TieredChatClientRegistry` (Low/Balanced/High) with Balanced as the injected default | Future cost-optimization can route cheaper classes of work (extraction, snapshot summarization) to Low without capability rewrites. v0.2 ships with one model aliased across tiers. |
+| 18 | Allowlists are mandatory per-task with wildcard support (`*.example.com`, `*`) | The generalist LLM-in-the-loop planner has a much wider blast radius than fixed-flow specialists, so an empty list must reject. Wildcards keep callers from enumerating subdomains. |
+| 19 | Learned site knowledge lives in RockBot's `ISkillStore` + `ILongTermMemory`, not a Foragent-local store | Framework-owned persistence is already packable, DI-registerable, and has BM25+semantic hybrid retrieval with importance weighting. Building parallel infrastructure would be duplicate work and would miss the framework-validation goal (§8). |
+| 20 | Multi-phase flows (learn → review → execute) are expressed as separate A2A tasks, not one long-running task with `input-required` | Review gates are the caller's concern; Foragent would otherwise have to hold browser state across arbitrary human delays, breaking the one-context-per-task isolation model (§3.5). |