Step 9: deprecate subsumed specialists (v0.2 complete) #14
Merged
rockfordlhotka merged 4 commits into main on Apr 23, 2026
Remove fetch-page-title and extract-structured-data from the advertised
skill set. Both are reachable via browser-task — page-title as a trivial
intent, structured extraction via a "return JSON: {...}" instruction
carried in the planner's done(result=...) channel. Cost delta is 2-3x
tokens per call, acceptable given zero deterministic high-volume
callers today. extract-structured-data was also out of spec on §7.1 —
it called the no-argument CreateSessionAsync overload and accepted any
host. The generalist enforces allowlists by design.
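
The structured-extraction path above relies on the caller spelling out the JSON shape in the intent, and the PR's known-limitations note says done.result is not schema-enforced. A caller-side check could look like the following sketch. ResultChecker and the sample strings are hypothetical; nothing here is Foragent's API.

```csharp
#nullable enable
using System;
using System.Linq;
using System.Text.Json;

// The target shape goes into the intent verbatim.
var shape = """{"title": "string", "price": "string"}""";
var intent = $"Read the product page and return JSON: {shape}";
Console.WriteLine(intent);

// Stand-in for the text that would come back in done.result.
var reply = """{"title": "MacBook Air", "price": "unknown"}""";
Console.WriteLine(ResultChecker.HasKeys(reply, "title", "price"));   // True
Console.WriteLine(ResultChecker.HasKeys("not json", "title"));       // False

// Hypothetical helper: accepts a reply only if it is a JSON object
// that carries every expected top-level key.
public static class ResultChecker
{
    public static bool HasKeys(string result, params string[] keys)
    {
        try
        {
            using var doc = JsonDocument.Parse(result);
            var root = doc.RootElement;
            return root.ValueKind == JsonValueKind.Object
                && keys.All(k => root.TryGetProperty(k, out _));
        }
        catch (JsonException)
        {
            // Prose or malformed JSON: treat as a failed extraction.
            return false;
        }
    }
}
```
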
Advertised v0.2 surface lands at three skills: browser-task,
learn-form-schema, execute-form-batch.
- Delete FetchPageTitleCapability, ExtractStructuredDataCapability,
and the shared CapabilityInput URL/description parser (no other
consumers). browser-task has its own BrowserTaskInput; form
capabilities have their own input classes.
- Delete the session-level one-shot helpers that only the removed
specialists used: IBrowserSession.FetchPageTitleAsync,
CapturePageSnapshotAsync, PageSnapshot, PageSnapshotSource.
- Delete the corresponding tests — 7 unit tests for the capabilities,
the PlaywrightBrowserSessionTests + PageSnapshotTests integration
suites, and the ExtractStructuredDataIntegrationTests real-LLM
benchmark. BrowserTaskIntegrationTests remains the real-LLM surface.
- Trim StubBrowserSessionFactory + FakeAgentBrowserSession to match
the pruned IBrowserSession.
Update metadata: deploy/rockbot-seed/*.json, docker-compose.yml
description + curl smoke example, .env.example comments, Program.cs
comment, docs/capabilities.md, spec §5.2 capability table and §9.1
step-9 description, CLAUDE.md Status + Browser + Capabilities
sections, framework-feedback step-9 section. Version bumped
0.2.0-alpha.8 → 0.2.0-alpha.9.
Tests: 48 passed (46 Agent unit + 1 Forms integration + 1 placeholder),
3 real-LLM BrowserTaskIntegrationTests skipped as expected. Build
clean on Release.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Running RockBot → Foragent with a real browser-task (MacBook-price
search across apple.com + bestbuy.com) surfaced three pre-existing
issues that blocked step 9's claimed end-to-end validation. Fixing
them here on the step-9 branch keeps the PR's test plan honest.

1. BrowserTaskPriming required IEmbeddingGenerator (DI resolution bug)

The primary-constructor parameter was annotated nullable
(IEmbeddingGenerator<string, Embedding<float>>?), but MSDI ignores C#
nullable annotations; it only honors default parameter values.
Reordered to put embeddingGenerator last with = null so MSDI treats it
as optional. Spec §5.6 says missing embeddings should downgrade to
BM25-only retrieval; that claim is now actually true. Two test callers
updated to drop the explicit embeddingGenerator: null arg.

2. Skill names with dotted hosts failed silently

RockBot 0.9's FileSkillStore.ValidateName rejects '.', so every real
host (bsky.app, apple.com, example.com) threw ArgumentException on
save. BskySeedSkillService swallowed the throw as a startup warning,
TryWriteLearnedSkillAsync swallowed it on the error path, and form
schemas just never persisted. Added SkillNaming.SanitizeHost that
replaces '.' → '-' (bsky.app → bsky-app) and applied it at three call
sites: BskySeedSkillService,
BrowserTaskCapability.TryWriteLearnedSkillAsync,
LearnFormSchemaCapability.DeriveSkillName. Allowlist matching and
memory-search categories keep the original dotted host; only skill
names need sanitization. Test assertions (BrowserTaskCapabilityTests,
BskySeedSkillServiceTests, LearnFormSchemaCapabilityTests) updated to
the sanitized names; skill-optimize.md directive examples updated so
the dream loop produces valid names.

3. Fresh named volume masks Dockerfile chown

The Foragent Dockerfile chowns /data to the non-root foragent user
(uid 1655) at image-build time, but Docker mounts a fresh named volume
root-owned, masking the build-time chown. Added a foragent-init busybox
one-shot (mirroring rockbot-init) that runs chmod -R 777 /data/foragent
on volume creation.

Docs updated: CLAUDE.md Status + Learning-substrate sections,
docs/capabilities.md, spec §5.6 skill-naming paragraph (calls out the
sanitization rule), framework-feedback step-9 follow-up section with
three framework observations (MSDI nullable footgun, validator's dot
rejection making real hosts fail, named-volume permissions pattern).

Tests: 48 passed / 3 LLM-gated skipped.

End-to-end smoke: RockBot dispatches browser-task to Foragent over the
bus; Foragent plans 2 steps (navigate + snapshot), emits done with JSON
result, reply lands on user.response.RockBot.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
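
Item 1 above hinges on how Microsoft.Extensions.DependencyInjection chooses constructor arguments. The sketch below illustrates that behavior in isolation, assuming the Microsoft.Extensions.DependencyInjection NuGet package is referenced. IEmbedder, AnnotatedOnly, and WithDefault are hypothetical stand-ins, not types from this PR.

```csharp
#nullable enable
// Assumes a reference to the Microsoft.Extensions.DependencyInjection package.
using System;
using Microsoft.Extensions.DependencyInjection;

var services = new ServiceCollection();
// IEmbedder is deliberately NOT registered, like a deployment without embeddings.
services.AddSingleton<AnnotatedOnly>();
services.AddSingleton<WithDefault>();
using var provider = services.BuildServiceProvider();

try
{
    provider.GetRequiredService<AnnotatedOnly>();
}
catch (InvalidOperationException ex)
{
    // The '?' annotation alone does not make the parameter optional to the container.
    Console.WriteLine($"AnnotatedOnly failed: {ex.Message}");
}

var ok = provider.GetRequiredService<WithDefault>();
Console.WriteLine($"WithDefault resolved; embeddings available: {ok.HasEmbeddings}");

// Hypothetical stand-in for IEmbeddingGenerator<string, Embedding<float>>.
public interface IEmbedder { }

// Nullable annotation only: resolution throws when IEmbedder is unregistered.
public sealed class AnnotatedOnly(IEmbedder? embedder)
{
    public bool HasEmbeddings => embedder is not null;
}

// Default value: the container falls back to null when IEmbedder is unregistered.
public sealed class WithDefault(IEmbedder? embedder = null)
{
    public bool HasEmbeddings => embedder is not null;
}
```

The second class mirrors the shape of the fix: the optional dependency sits last and carries = null, so a missing registration degrades the feature instead of failing activation.
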
When RockBot's LLM called invoke_agent with free-form prose (message=
"...allowedHosts: ['*']..."), Foragent's three input parsers kept
rejecting with 'Missing allowedHosts' — they only consumed text parts
and expected a JSON object. RockBot 0.9.11+ supports structured input
via an A2A DataPart (AgentMessagePart{Kind="data", Data=<json>}), but
Foragent never advertised that it consumed data parts and the invoke_agent
tool description steered the LLM to omit 'data' unless the target
"is known to consume data." Result: loop.
Fix spans three surfaces:
1. Parsers accept DataPart. BrowserTaskInput, LearnFormSchemaInput, and
ExecuteFormBatchInput now look for a Kind="data" part first and use
its Data string as the JSON source. Text-JSON fallback stays (curl
callers), and for browser-task, a prose text part serves as the
intent fallback when the data part doesn't supply one. Metadata
overrides remain.
2. Skill descriptions explicitly direct callers to use the data
parameter. Each SkillDefinition.Description now leads with "PASS
INPUT AS AN A2A DATA PART (a structured JSON object), not as prose
inside the text message. When calling via RockBot's invoke_agent,
populate the 'data' parameter with this object." Matching entries
in deploy/rockbot-seed/well-known-agents.json updated so the LLM
sees the same guidance through list_known_agents.
3. Tests. Four new unit tests: one per input parser verifying a
DataPart with JSON is consumed; one for browser-task's text-as-
intent fallback when the data part omits intent. TestContext
gained RequestWithData(...) to build the dual-part shape RockBot's
invoke_agent produces.
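
The parser precedence in point 1 (data part first, text-JSON fallback, prose text as the intent fallback) can be sketched with System.Text.Json alone. MessagePart, TaskInput, and TaskInputParser are hypothetical stand-ins rather than the RockBot or Foragent types, and the "intent" and "allowedHosts" field handling is an assumption based on the error text quoted above.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Dual-part shape: prose in the text part, structured input in the data part.
var parts = new List<MessagePart>
{
    new("text", Text: "Find the current MacBook price"),
    new("data", Data: """{"allowedHosts":["apple.com","bestbuy.com"]}"""),
};

var input = TaskInputParser.Parse(parts);
// prints "Find the current MacBook price | apple.com,bestbuy.com"
Console.WriteLine($"{input.Intent} | {string.Join(",", input.AllowedHosts)}");

// Hypothetical stand-in for an A2A message part.
public sealed record MessagePart(string Kind, string? Text = null, string? Data = null);

public sealed record TaskInput(string Intent, IReadOnlyList<string> AllowedHosts);

public static class TaskInputParser
{
    public static TaskInput Parse(IReadOnlyList<MessagePart> parts)
    {
        var text = parts.FirstOrDefault(p => p.Kind == "text")?.Text;
        var data = parts.FirstOrDefault(p => p.Kind == "data")?.Data;

        // Prefer the data part; otherwise treat the text part as JSON (curl callers).
        // Prose in the text part with no data part fails here with a JsonException.
        var json = data ?? text ?? throw new ArgumentException("Missing input");
        using var doc = JsonDocument.Parse(json);
        var root = doc.RootElement;

        if (root.ValueKind != JsonValueKind.Object
            || !root.TryGetProperty("allowedHosts", out var hostsEl)
            || hostsEl.ValueKind != JsonValueKind.Array)
        {
            throw new ArgumentException("Missing allowedHosts");
        }
        var hosts = hostsEl.EnumerateArray().Select(h => h.GetString() ?? "").ToList();

        // Intent comes from the JSON when present; prose text is the fallback,
        // but only when the JSON itself came from the data part.
        string? intent = root.TryGetProperty("intent", out var intentEl)
            ? intentEl.GetString()
            : null;
        if (intent is null && data is not null) intent = text;

        return new TaskInput(intent ?? throw new ArgumentException("Missing intent"), hosts);
    }
}
```
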
Image bumped to rockylhotka/rockbot-agent:0.9.14 — softens the
invoke_agent 'data' tool description upstream, complementing the
skill-description hints on the Foragent side. CLAUDE.md Status
paragraph updated.
Docs: CLAUDE.md Capabilities section gains a note on the DataPart
contract. framework-feedback step-9 follow-up section extended with
the three-surface lesson (sender tool description ↔ target skill
description ↔ target parser all need to agree on the canonical shape).
Tests: 52 passed / 3 LLM-gated skipped. Build clean. Curl smoke
(text-JSON path) returns valid JSON via browser-task unchanged.
Live Blazor end-to-end test is next, against the updated 0.9.14
rockbot image.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
RockBot 0.9.15 now publishes agent.task.cancel.{agentName} messages when
a wisp's local state fails after dispatching an A2A task (the duplicate-
dispatch scenario observed during step-9 validation). Foragent's
previous behavior inherited the framework's default
AgentTaskCancelHandler, which always replies TaskNotCancelable because
it assumes stateless agents. Foragent is stateful — a browser task is
potentially minutes long — so leaving the default would orphan browser
runs.
Implementation:
- InFlightTaskRegistry (singleton): ConcurrentDictionary<taskId,
CancellationTokenSource> with Register/TryCancel/Remove. Register
returns a linked CT that fires on either external cancel or the
parent message CT. Redelivered task ids cancel the prior
registration before replacing it, so stale work unwinds.
- ForagentTaskHandler wraps the capability's AgentTaskContext so the
CT observed via context.MessageContext.CancellationToken is the
linked one from the registry. Capabilities observe cancellation
without any signature change.
- ForagentCancelHandler (IMessageHandler<AgentTaskCancelRequest>):
on match calls TryCancel and publishes nothing (the running task's
own terminal reply is the acknowledgment); on miss publishes
AgentTaskError{Code=TaskNotFound}. Registered via
agent.HandleMessage<AgentTaskCancelRequest, ForagentCancelHandler>()
after AddA2A — last AddScoped wins, overriding the default.
- 11 new unit tests across registry, cancel handler, and task-handler
integration (parent-cancel → linked CT fires, external cancel →
linked CT fires, register/remove ties to finally, Remove drops
registration even on thrown capability).
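
The registry in the first bullet can be sketched with BCL types only. This is an illustration under assumptions, not the PR's code: only the Register/TryCancel/Remove names, the linked-token behavior, and the cancel-on-redelivery rule come from the text above. The signatures, the string task id, and the dispose-on-remove policy are guesses.

```csharp
#nullable enable
using System;
using System.Collections.Concurrent;
using System.Threading;

var registry = new InFlightTaskRegistry();
using var parent = new CancellationTokenSource();

// External cancel: the token handed to the capability fires.
var first = registry.Register("task-1", parent.Token);
Console.WriteLine(registry.TryCancel("task-1"));         // True
Console.WriteLine(first.IsCancellationRequested);        // True

// Redelivery of the same id cancels the prior registration.
var second = registry.Register("task-2", parent.Token);
var redelivered = registry.Register("task-2", parent.Token);
Console.WriteLine(second.IsCancellationRequested);       // True
Console.WriteLine(redelivered.IsCancellationRequested);  // False

registry.Remove("task-2");
Console.WriteLine(registry.TryCancel("task-2"));         // False: nothing in flight

public sealed class InFlightTaskRegistry
{
    private readonly ConcurrentDictionary<string, CancellationTokenSource> _inFlight = new();

    // Returns a token that fires on external cancel or when the parent token fires.
    public CancellationToken Register(string taskId, CancellationToken parentToken)
    {
        var linked = CancellationTokenSource.CreateLinkedTokenSource(parentToken);
        while (true)
        {
            if (_inFlight.TryGetValue(taskId, out var previous))
            {
                // Swap atomically, then cancel the stale registration exactly once.
                if (_inFlight.TryUpdate(taskId, linked, previous))
                {
                    previous.Cancel();
                    break;
                }
            }
            else if (_inFlight.TryAdd(taskId, linked))
            {
                break;
            }
        }
        return linked.Token;
    }

    public bool TryCancel(string taskId)
    {
        if (!_inFlight.TryGetValue(taskId, out var cts)) return false;
        try
        {
            cts.Cancel();
            return true;
        }
        catch (ObjectDisposedException)
        {
            // Lost a race with Remove: the task already finished.
            return false;
        }
    }

    // Intended to be called from the task handler's finally block.
    public void Remove(string taskId)
    {
        if (_inFlight.TryRemove(taskId, out var cts)) cts.Dispose();
    }
}
```

Remove in this sketch drops whatever is registered under the id, so a stale handler unwinding after a redelivery would also drop the newer registration. Whether the real registry guards against that is not stated in the commit message.
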
Also in this commit, incorporating earlier step-9 follow-ups for the
same RockBot 0.9.x interop round:
- Self-teaching errors. When a parser rejects for a missing required
field, the response now tells the LLM exactly how to fix the call:
"Pass inputs as a JSON object on the A2A DataPart — in RockBot's
invoke_agent tool, that means filling the 'data' parameter, NOT
adding fields to the 'message' text. Example data: {...}." Observed
behavior: LLMs that ignore skill descriptions do read error replies
and adjust subsequent calls. Applied to all three parsers.
- Docker image bumped to rockylhotka/rockbot-agent:0.9.15 — brings
(a) invoke_agent's structured 'data' parameter (0.9.11), (b)
softened tool description encouraging DataPart usage (0.9.14), and
(c) the cancel-publisher that this commit consumes (0.9.15).
CLAUDE.md Status section updated accordingly.
Framework-feedback step-9 follow-up section extended with the cancel-
handler-override pattern as a candidate for upstream WithTaskCancellation
ergonomics (non-blocking — ~50 LOC across consumers isn't unbearable).
Tests: 63 passed / 3 LLM-gated skipped. Build clean. Foragent starts
cleanly on fresh volumes; agent.task.cancel.Foragent subscription
verified active at boot.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
- Remove fetch-page-title and extract-structured-data from the advertised capability set. Both are reachable via browser-task: page-title as a trivial intent, structured extraction via a "return JSON: {…}" instruction carried in the planner's done(result=…) channel. Cost delta is ~2–3× tokens per call, acceptable given zero deterministic high-volume callers today. extract-structured-data was also out of spec on §7.1: it called the no-argument CreateSessionAsync overload and accepted any host; the generalist enforces allowlists by design.
- Advertised v0.2 surface lands at three skills: browser-task, learn-form-schema, execute-form-batch (spec §5.2 updated, §9.1 step 9 marked shipped).
- IBrowserSession.FetchPageTitleAsync / CapturePageSnapshotAsync / PageSnapshot / PageSnapshotSource had no remaining callers; deleted. The CapabilityInput.Parse shared URL/description shim had no remaining callers; deleted (BrowserTaskInput and Forms/*Input.cs handle their own shapes). Trim StubBrowserSessionFactory + FakeAgentBrowserSession to match. Version bumped 0.2.0-alpha.8 → 0.2.0-alpha.9.
- IAgentTaskHandler + DI-resolved capabilities shape, confirming that foragent#5 / rockbot#283 (per-skill handler registration) is quality-of-life, not a blocker.

Test plan
- dotnet build --configuration Release: clean, 0 warnings, 0 errors
- dotnet test --configuration Release: 48 passed / 3 LLM-gated skipped (46 Agent unit tests + 1 FormCapabilitiesIntegrationTests + 1 Foragent.Integration.Tests placeholder; skipped are the 3 BrowserTaskIntegrationTests that require FORAGENT_LLM_*)
- deploy/rockbot-seed/well-known-agents.json + ForagentCapabilities.Skills: both now list the three v0.2 skills
- browser-task + example.com smoke example in the comment block (operator run, not gated)
- browser-task emits valid JSON in done.result when the intent asks for it (operator run; deferred, would invalidate the decision if it fails)

Known limitations
- browser-task's done.result channel is not schema-enforced the way extract-structured-data's ResponseFormat = Json was. Callers asking for structured extraction should include the target JSON shape verbatim in the intent. If high-volume deterministic extraction callers ever appear, resurrect a specialist with the benefit of actual usage data.

🤖 Generated with Claude Code