Step 9: deprecate subsumed specialists (v0.2 complete) by rockfordlhotka · Pull Request #14 · MarimerLLC/foragent

rockfordlhotka · 2026-04-23T04:57:54Z

Summary

Remove fetch-page-title and extract-structured-data from the advertised capability set. Both are reachable via browser-task — page-title as a trivial intent, structured extraction via a "return JSON: {…}" instruction carried in the planner's done(result=…) channel. Cost delta is ~2–3× tokens per call, acceptable given zero deterministic high-volume callers today. extract-structured-data was also out of spec on §7.1 — it called the no-argument CreateSessionAsync overload and accepted any host; the generalist enforces allowlists by design.
Land on the minimum v0.2 surface: three skills — browser-task, learn-form-schema, execute-form-batch (spec §5.2 updated, §9.1 step 9 marked shipped).
Prune orphaned Browser surface. IBrowserSession.FetchPageTitleAsync / CapturePageSnapshotAsync / PageSnapshot / PageSnapshotSource had no remaining callers; deleted. CapabilityInput.Parse shared URL/description shim had no remaining callers; deleted (BrowserTaskInput and Forms/*Input.cs handle their own shapes). Trim StubBrowserSessionFactory + FakeAgentBrowserSession to match. Version bumped 0.2.0-alpha.8 → 0.2.0-alpha.9.
Framework-feedback step-9 section captures the observation that capability-surface evolution was painless on the current IAgentTaskHandler + DI-resolved capabilities shape — confirming foragent#5 / rockbot#283 (per-skill handler registration) is quality-of-life, not a blocker.

Test plan

dotnet build --configuration Release — clean, 0 warnings, 0 errors
dotnet test --configuration Release — 48 passed / 3 LLM-gated skipped (46 Agent unit tests + 1 FormCapabilitiesIntegrationTests + 1 Foragent.Integration.Tests placeholder; skipped are the 3 BrowserTaskIntegrationTests that require FORAGENT_LLM_*)
agent-card skill list verified in deploy/rockbot-seed/well-known-agents.json + ForagentCapabilities.Skills — both now list the three v0.2 skills
docker-compose + curl smoke against the new browser-task + example.com smoke example in the comment block (operator run — not gated)
real-LLM check that browser-task emits valid JSON in done.result when the intent asks for it (operator run — deferred, would invalidate the decision if it fails)

Known limitations

browser-task's done.result channel is not schema-enforced the way extract-structured-data's ResponseFormat = Json was. Callers asking for structured extraction should include the target JSON shape verbatim in the intent. If high-volume deterministic extraction callers ever appear, resurrect a specialist with the benefit of actual usage data.
Spec open-questions #3, #4, #5, #7 remain as written (storage-state encryption, capability versioning, tenant identity, per-task budget tuning). #1, #2, #6, #8 closed earlier in v0.2.
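As a caller-side illustration of the first limitation, the sketch below pastes the target JSON shape verbatim into the intent and then checks the reply itself, since done.result is not schema-enforced. The field names intent and allowedHosts come from the parser descriptions in this PR; the title/heading shape, the prompt wording, and the MatchesShape helper are assumptions made up for the example.

```csharp
#nullable enable
using System;
using System.Text.Json;

// Hypothetical target shape the caller wants back in done.result.
const string targetShape = "{\"title\": \"string\", \"heading\": \"string\"}";

// The shape goes into the intent verbatim; nothing else enforces it.
var input = new
{
    intent = $"Open https://example.com and return JSON: {targetShape}",
    allowedHosts = new[] { "example.com" },
};
Console.WriteLine(JsonSerializer.Serialize(input));

// Caller-side check of a reply, standing in for the schema enforcement
// that ResponseFormat = Json used to provide.
Console.WriteLine(MatchesShape("{\"title\":\"Example Domain\",\"heading\":\"Example Domain\"}")); // True
Console.WriteLine(MatchesShape("Sorry, I could not find the page."));                            // False

static bool MatchesShape(string doneResult)
{
    try
    {
        using var doc = JsonDocument.Parse(doneResult);
        var root = doc.RootElement;
        return root.ValueKind == JsonValueKind.Object
            && root.TryGetProperty("title", out _)
            && root.TryGetProperty("heading", out _);
    }
    catch (JsonException)
    {
        // Prose or malformed JSON: treat as a failed extraction.
        return false;
    }
}
```

A caller that needs stronger guarantees than a property-presence check would have to validate types as well; this only shows where that validation now has to live.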

🤖 Generated with Claude Code

Remove fetch-page-title and extract-structured-data from the advertised skill set. Both are reachable via browser-task: page-title as a trivial intent, structured extraction via a "return JSON: {...}" instruction carried in the planner's done(result=...) channel. Cost delta is 2-3x tokens per call, acceptable given zero deterministic high-volume callers today.

extract-structured-data was also out of spec on §7.1: it called the no-argument CreateSessionAsync overload and accepted any host. The generalist enforces allowlists by design.

Advertised v0.2 surface lands at three skills: browser-task, learn-form-schema, execute-form-batch.

- Delete FetchPageTitleCapability, ExtractStructuredDataCapability, and the shared CapabilityInput URL/description parser (no other consumers). browser-task has its own BrowserTaskInput; form capabilities have their own input classes.
- Delete the session-level one-shot helpers that only the removed specialists used: IBrowserSession.FetchPageTitleAsync, CapturePageSnapshotAsync, PageSnapshot, PageSnapshotSource.
- Delete the corresponding tests: 7 unit tests for the capabilities, the PlaywrightBrowserSessionTests + PageSnapshotTests integration suites, and the ExtractStructuredDataIntegrationTests real-LLM benchmark. BrowserTaskIntegrationTests remains the real-LLM surface.
- Trim StubBrowserSessionFactory + FakeAgentBrowserSession to match the pruned IBrowserSession.

Update metadata: deploy/rockbot-seed/*.json, docker-compose.yml description + curl smoke example, .env.example comments, Program.cs comment, docs/capabilities.md, spec §5.2 capability table and §9.1 step-9 description, CLAUDE.md Status + Browser + Capabilities sections, framework-feedback step-9 section.

Version bumped 0.2.0-alpha.8 → 0.2.0-alpha.9.

Tests: 48 passed (46 Agent unit + 1 Forms integration + 1 placeholder), 3 real-LLM BrowserTaskIntegrationTests skipped as expected. Build clean on Release.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Running RockBot → Foragent with a real browser-task (MacBook-price search across apple.com + bestbuy.com) surfaced three pre-existing issues that blocked step 9's claimed end-to-end validation. Fixing them here on the step-9 branch keeps the PR's test plan honest.

1. BrowserTaskPriming required IEmbeddingGenerator (DI resolution bug). The primary-constructor parameter was annotated nullable (IEmbeddingGenerator<string, Embedding<float>>?), but MSDI ignores C# nullable annotations; it only honors default parameter values. Reordered to put embeddingGenerator last with = null so MSDI treats it as optional. Spec §5.6 says missing embeddings should downgrade to BM25-only retrieval; that claim is now actually true. Two test callers updated to drop the explicit embeddingGenerator: null arg.

2. Skill names with dotted hosts failed silently. RockBot 0.9's FileSkillStore.ValidateName rejects '.', so every real host (bsky.app, apple.com, example.com) threw ArgumentException on save. BskySeedSkillService swallowed the throw as a startup warning, TryWriteLearnedSkillAsync swallowed it on the error path, and form schemas just never persisted. Added SkillNaming.SanitizeHost that replaces '.' → '-' (bsky.app → bsky-app) and applied it at three call sites: BskySeedSkillService, BrowserTaskCapability.TryWriteLearnedSkillAsync, LearnFormSchemaCapability.DeriveSkillName. Allowlist matching and memory-search categories keep the original dotted host; only skill names need sanitization. Test assertions (BrowserTaskCapabilityTests, BskySeedSkillServiceTests, LearnFormSchemaCapabilityTests) updated to the sanitized names; skill-optimize.md directive examples updated so the dream loop produces valid names.

3. Fresh named volume masks Dockerfile chown. The Foragent Dockerfile chowns /data to the non-root foragent user (uid 1655) at image-build time, but Docker mounts a fresh named volume root-owned, masking the build-time chown. Added a foragent-init busybox one-shot (mirroring rockbot-init) that runs chmod -R 777 /data/foragent on volume creation.

Docs updated: CLAUDE.md Status + Learning-substrate sections, docs/capabilities.md, spec §5.6 skill-naming paragraph (calls out the sanitization rule), framework-feedback step-9 follow-up section with three framework observations (MSDI nullable footgun, validator's dot rejection making real hosts fail, named-volume permissions pattern).

Tests: 48 passed / 3 LLM-gated skipped.

End-to-end smoke: RockBot dispatches browser-task to Foragent over the bus; Foragent plans 2 steps (navigate + snapshot), emits done with JSON result, reply lands on user.response.RockBot.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
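Fixes 1 and 2 in the commit above can be illustrated with a small standalone sketch. It uses reflection rather than MSDI itself to show that a nullable annotation does not give a constructor parameter a default value (the signal the commit says MSDI honors), and it shows the '.' → '-' replacement described for SkillNaming.SanitizeHost. The class names AnnotatedOnly, WithDefault, and SkillNamingSketch are hypothetical stand-ins, not the types in the repository.

```csharp
#nullable enable
using System;
using System.Reflection;

// True only when the last constructor parameter declares a default value.
static bool LastParameterHasDefault(Type type) =>
    type.GetConstructors()[0].GetParameters()[^1].HasDefaultValue;

Console.WriteLine(LastParameterHasDefault(typeof(AnnotatedOnly))); // False
Console.WriteLine(LastParameterHasDefault(typeof(WithDefault)));   // True

Console.WriteLine(SkillNamingSketch.SanitizeHost("bsky.app"));     // bsky-app

// The '?' is compiler metadata only; at runtime the parameter is still required.
sealed class AnnotatedOnly
{
    public AnnotatedOnly(object? embeddingGenerator) { }
}

// A default value is visible to reflection, so a container can treat it as optional.
sealed class WithDefault
{
    public WithDefault(object? embeddingGenerator = null) { }
}

static class SkillNamingSketch
{
    // Only skill names are sanitized; allowlist matching keeps the dotted host.
    public static string SanitizeHost(string host)
    {
        if (string.IsNullOrWhiteSpace(host))
            throw new ArgumentException("Host must be non-empty.", nameof(host));
        return host.Replace('.', '-');
    }
}
```

Note that this simple replacement maps a.b and a-b to the same skill name; whether that collision matters for real hosts is not addressed in the PR.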

When RockBot's LLM called invoke_agent with free-form prose (message= "...allowedHosts: ['*']..."), Foragent's three input parsers kept rejecting with 'Missing allowedHosts' because they only consumed text parts and expected a JSON object. RockBot 0.9.11+ supports structured input via an A2A DataPart (AgentMessagePart{Kind="data", Data=<json>}), but Foragent never advertised that it consumed data parts and the invoke_agent tool description steered the LLM to omit 'data' unless the target "is known to consume data." Result: loop.

Fix spans three surfaces:

1. Parsers accept DataPart. BrowserTaskInput, LearnFormSchemaInput, and ExecuteFormBatchInput now look for a Kind="data" part first and use its Data string as the JSON source. Text-JSON fallback stays (curl callers), and for browser-task, a prose text part serves as the intent fallback when the data part doesn't supply one. Metadata overrides remain.

2. Skill descriptions explicitly direct callers to use the data parameter. Each SkillDefinition.Description now leads with "PASS INPUT AS AN A2A DATA PART (a structured JSON object), not as prose inside the text message. When calling via RockBot's invoke_agent, populate the 'data' parameter with this object." Matching entries in deploy/rockbot-seed/well-known-agents.json updated so the LLM sees the same guidance through list_known_agents.

3. Tests. Four new unit tests: one per input parser verifying a DataPart with JSON is consumed; one for browser-task's text-as-intent fallback when the data part omits intent. TestContext gained RequestWithData(...) to build the dual-part shape RockBot's invoke_agent produces.

Image bumped to rockylhotka/rockbot-agent:0.9.14, which softens the invoke_agent 'data' tool description upstream, complementing the skill-description hints on the Foragent side. CLAUDE.md Status paragraph updated.

Docs: CLAUDE.md Capabilities section gains a note on the DataPart contract. framework-feedback step-9 follow-up section extended with the three-surface lesson (sender tool description ↔ target skill description ↔ target parser all need to agree on the canonical shape).

Tests: 52 passed / 3 LLM-gated skipped. Build clean. Curl smoke (text-JSON path) returns valid JSON via browser-task unchanged. Live Blazor end-to-end test is next, against the updated 0.9.14 rockbot image.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
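The parser precedence described in the commit above (data part first, text-JSON fallback, prose text as the intent fallback) can be sketched roughly as follows. Part is a hypothetical stand-in for RockBot's AgentMessagePart that models only Kind, Text, and Data; BrowserTaskInputSketch, the error wording, and the omission of metadata overrides are all simplifications assumed for the example.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Dual-part shape: prose in the text part, structured input in the data part.
var parts = new List<Part>
{
    new Part("text", Text: "Find the page title"),
    new Part("data", Data: "{\"allowedHosts\":[\"example.com\"]}"),
};
var input = BrowserTaskInputSketch.Parse(parts);
Console.WriteLine($"{input.Intent} | {string.Join(",", input.AllowedHosts)}");
// Find the page title | example.com

// Hypothetical stand-in for AgentMessagePart.
record Part(string Kind, string? Text = null, string? Data = null);

record BrowserTaskInputSketch(string Intent, IReadOnlyList<string> AllowedHosts)
{
    public static BrowserTaskInputSketch Parse(IReadOnlyList<Part> parts)
    {
        var data = parts.FirstOrDefault(p => p.Kind == "data")?.Data;
        var text = parts.FirstOrDefault(p => p.Kind == "text")?.Text;

        // 1) Prefer the data part. 2) Otherwise accept text that looks like JSON (curl callers).
        var json = data ?? (LooksLikeJsonObject(text) ? text : null);
        if (json is null)
            throw new ArgumentException("Missing allowedHosts. Pass inputs as a JSON object on the data part.");

        using var doc = JsonDocument.Parse(json);
        var root = doc.RootElement;
        if (root.ValueKind != JsonValueKind.Object
            || !root.TryGetProperty("allowedHosts", out var hostsEl)
            || hostsEl.ValueKind != JsonValueKind.Array)
            throw new ArgumentException("Missing allowedHosts. Pass inputs as a JSON object on the data part.");

        var hosts = hostsEl.EnumerateArray().Select(h => h.GetString() ?? "").ToList();

        string? intent = root.TryGetProperty("intent", out var intentEl)
                         && intentEl.ValueKind == JsonValueKind.String
            ? intentEl.GetString()
            : null;

        // 3) Prose text is the intent fallback only when the JSON came from a data part.
        if (string.IsNullOrWhiteSpace(intent) && data is not null)
            intent = text;
        if (string.IsNullOrWhiteSpace(intent))
            throw new ArgumentException("Missing intent.");

        return new BrowserTaskInputSketch(intent!, hosts);
    }

    private static bool LooksLikeJsonObject(string? s) =>
        s is not null && s.TrimStart().StartsWith('{');
}
```

Throwing with a message that tells the caller how to fix the call mirrors the idea of replies an LLM can act on, though the exact text here is invented.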

RockBot 0.9.15 now publishes agent.task.cancel.{agentName} messages when a wisp's local state fails after dispatching an A2A task (the duplicate-dispatch scenario observed during step-9 validation). Foragent's previous behavior inherited the framework's default AgentTaskCancelHandler, which always replies TaskNotCancelable because it assumes stateless agents. Foragent is stateful (a browser task is potentially minutes long), so leaving the default would orphan browser runs.

Implementation:

- InFlightTaskRegistry (singleton): ConcurrentDictionary<taskId, CancellationTokenSource> with Register/TryCancel/Remove. Register returns a linked CT that fires on either external cancel or the parent message CT. Redelivered task ids cancel the prior registration before replacing it, so stale work unwinds.
- ForagentTaskHandler wraps the capability's AgentTaskContext so the CT observed via context.MessageContext.CancellationToken is the linked one from the registry. Capabilities observe cancellation without any signature change.
- ForagentCancelHandler (IMessageHandler<AgentTaskCancelRequest>): on match calls TryCancel and publishes nothing (the running task's own terminal reply is the acknowledgment); on miss publishes AgentTaskError{Code=TaskNotFound}. Registered via agent.HandleMessage<AgentTaskCancelRequest, ForagentCancelHandler>() after AddA2A; last AddScoped wins, overriding the default.
- 11 new unit tests across registry, cancel handler, and task-handler integration (parent-cancel → linked CT fires, external cancel → linked CT fires, register/remove ties to finally, Remove drops registration even on thrown capability).

Also in this commit, incorporating earlier step-9 follow-ups for the same RockBot 0.9.x interop round:

- Self-teaching errors. When a parser rejects for a missing required field, the response now tells the LLM exactly how to fix the call: "Pass inputs as a JSON object on the A2A DataPart — in RockBot's invoke_agent tool, that means filling the 'data' parameter, NOT adding fields to the 'message' text. Example data: {...}." Observed behavior: LLMs that ignore skill descriptions do read error replies and adjust subsequent calls. Applied to all three parsers.
- Docker image bumped to rockylhotka/rockbot-agent:0.9.15, which brings (a) invoke_agent's structured 'data' parameter (0.9.11), (b) softened tool description encouraging DataPart usage (0.9.14), and (c) the cancel-publisher that this commit consumes (0.9.15). CLAUDE.md Status section updated accordingly.

Framework-feedback step-9 follow-up section extended with the cancel-handler-override pattern as a candidate for upstream WithTaskCancellation ergonomics (non-blocking; ~50 LOC across consumers isn't unbearable).

Tests: 63 passed / 3 LLM-gated skipped. Build clean. Foragent starts cleanly on fresh volumes; agent.task.cancel.Foragent subscription verified active at boot.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
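A minimal sketch of the registry behavior described in the commit above, assuming string task ids and only the three operations the message names (Register, TryCancel, Remove). InFlightTaskRegistrySketch is a hypothetical reconstruction from that description, not the repository's implementation; in particular, the retry loop and the SafeCancel helper are choices made for this example.

```csharp
#nullable enable
using System;
using System.Collections.Concurrent;
using System.Threading;

var registry = new InFlightTaskRegistrySketch();
using var parent = new CancellationTokenSource();

// External cancel fires the linked token handed to the capability.
CancellationToken linked = registry.Register("task-1", parent.Token);
Console.WriteLine(registry.TryCancel("task-1"));     // True
Console.WriteLine(linked.IsCancellationRequested);   // True
registry.Remove("task-1");
Console.WriteLine(registry.TryCancel("task-1"));     // False (a cancel handler would reply TaskNotFound)

// Redelivery of the same task id cancels the prior registration.
CancellationToken first = registry.Register("task-2", parent.Token);
CancellationToken second = registry.Register("task-2", parent.Token);
Console.WriteLine(first.IsCancellationRequested);    // True
Console.WriteLine(second.IsCancellationRequested);   // False

sealed class InFlightTaskRegistrySketch
{
    private readonly ConcurrentDictionary<string, CancellationTokenSource> _inFlight = new();

    // Returns a token that fires on external cancel OR when the parent message token fires.
    public CancellationToken Register(string taskId, CancellationToken parent)
    {
        var cts = CancellationTokenSource.CreateLinkedTokenSource(parent);
        var token = cts.Token; // capture before publishing; Token throws once the source is disposed
        while (true)
        {
            if (_inFlight.TryAdd(taskId, cts)) return token;
            if (_inFlight.TryGetValue(taskId, out var prior) && _inFlight.TryUpdate(taskId, cts, prior))
            {
                SafeCancel(prior); // stale work for a redelivered id unwinds
                return token;
            }
        }
    }

    public bool TryCancel(string taskId) =>
        _inFlight.TryGetValue(taskId, out var cts) && SafeCancel(cts);

    // Intended to be called from the task handler's finally block.
    public void Remove(string taskId)
    {
        if (_inFlight.TryRemove(taskId, out var cts)) cts.Dispose();
    }

    private static bool SafeCancel(CancellationTokenSource cts)
    {
        try { cts.Cancel(); return true; }
        catch (ObjectDisposedException) { return false; } // lost a race with Remove
    }
}
```

One caveat of keying Remove by task id alone: after a redelivery, the superseded run's finally block would drop the newer registration. Guarding the removal on the registered source is one possible way to avoid that; the commit message does not say how the real registry handles it.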

rockfordlhotka and others added 4 commits April 22, 2026 23:57

rockfordlhotka merged commit 5892713 into main Apr 23, 2026
1 check passed

rockfordlhotka deleted the step-9-deprecate-specialists branch April 23, 2026 15:03
