Step 9: deprecate subsumed specialists (v0.2 complete) #14

Merged: rockfordlhotka merged 4 commits into main from step-9-deprecate-specialists on Apr 23, 2026

@rockfordlhotka (Member)

Summary

  • Remove fetch-page-title and extract-structured-data from the advertised capability set. Both are reachable via browser-task — page-title as a trivial intent, structured extraction via a "return JSON: {…}" instruction carried in the planner's done(result=…) channel. Cost delta is ~2–3× tokens per call, acceptable given zero deterministic high-volume callers today. extract-structured-data was also out of spec on §7.1 — it called the no-argument CreateSessionAsync overload and accepted any host; the generalist enforces allowlists by design.
  • Land on the minimum v0.2 surface: three skills (browser-task, learn-form-schema, execute-form-batch). Spec §5.2 updated, §9.1 step 9 marked shipped.
  • Prune orphaned Browser surface. IBrowserSession.FetchPageTitleAsync / CapturePageSnapshotAsync / PageSnapshot / PageSnapshotSource had no remaining callers; deleted. CapabilityInput.Parse shared URL/description shim had no remaining callers; deleted (BrowserTaskInput and Forms/*Input.cs handle their own shapes). Trim StubBrowserSessionFactory + FakeAgentBrowserSession to match. Version bumped 0.2.0-alpha.8 → 0.2.0-alpha.9.
  • Framework-feedback step-9 section captures the observation that capability-surface evolution was painless on the current IAgentTaskHandler + DI-resolved capabilities shape — confirming foragent#5 / rockbot#283 (per-skill handler registration) is quality-of-life, not a blocker.

Test plan

  • dotnet build --configuration Release — clean, 0 warnings, 0 errors
  • dotnet test --configuration Release — 48 passed / 3 LLM-gated skipped (46 Agent unit tests + 1 FormCapabilitiesIntegrationTests + 1 Foragent.Integration.Tests placeholder; skipped are the 3 BrowserTaskIntegrationTests that require FORAGENT_LLM_*)
  • agent-card skill list verified in deploy/rockbot-seed/well-known-agents.json + ForagentCapabilities.Skills — both now list the three v0.2 skills
  • docker-compose + curl smoke against the new browser-task + example.com smoke example in the comment block (operator run — not gated)
  • real-LLM check that browser-task emits valid JSON in done.result when the intent asks for it (operator run — deferred, would invalidate the decision if it fails)

Known limitations

🤖 Generated with Claude Code

rockfordlhotka and others added 4 commits April 22, 2026 23:57
Remove fetch-page-title and extract-structured-data from the advertised
skill set. Both are reachable via browser-task — page-title as a trivial
intent, structured extraction via a "return JSON: {...}" instruction
carried in the planner's done(result=...) channel. Cost delta is 2-3x
tokens per call, acceptable given zero deterministic high-volume
callers today. extract-structured-data was also out of spec on §7.1 —
it called the no-argument CreateSessionAsync overload and accepted any
host. The generalist enforces allowlists by design.

Advertised v0.2 surface lands at three skills: browser-task,
learn-form-schema, execute-form-batch.

- Delete FetchPageTitleCapability, ExtractStructuredDataCapability,
  and the shared CapabilityInput URL/description parser (no other
  consumers). browser-task has its own BrowserTaskInput; form
  capabilities have their own input classes.
- Delete the session-level one-shot helpers that only the removed
  specialists used: IBrowserSession.FetchPageTitleAsync,
  CapturePageSnapshotAsync, PageSnapshot, PageSnapshotSource.
- Delete the corresponding tests — 7 unit tests for the capabilities,
  the PlaywrightBrowserSessionTests + PageSnapshotTests integration
  suites, and the ExtractStructuredDataIntegrationTests real-LLM
  benchmark. BrowserTaskIntegrationTests remains the real-LLM surface.
- Trim StubBrowserSessionFactory + FakeAgentBrowserSession to match
  the pruned IBrowserSession.

Update metadata: deploy/rockbot-seed/*.json, docker-compose.yml
description + curl smoke example, .env.example comments, Program.cs
comment, docs/capabilities.md, spec §5.2 capability table and §9.1
step-9 description, CLAUDE.md Status + Browser + Capabilities
sections, framework-feedback step-9 section. Version bumped
0.2.0-alpha.8 → 0.2.0-alpha.9.

Tests: 48 passed (46 Agent unit + 1 Forms integration + 1 placeholder),
3 real-LLM BrowserTaskIntegrationTests skipped as expected. Build
clean on Release.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Running RockBot → Foragent with a real browser-task (MacBook-price search
across apple.com + bestbuy.com) surfaced three pre-existing issues that
blocked step 9's claimed end-to-end validation. Fixing them here on the
step-9 branch keeps the PR's test plan honest.

1. BrowserTaskPriming required IEmbeddingGenerator (DI resolution bug)
   The primary-constructor parameter was annotated nullable
   (IEmbeddingGenerator<string, Embedding<float>>?), but MSDI ignores C#
   nullable annotations — it only honors default parameter values.
   Reordered to put embeddingGenerator last with = null so MSDI treats
   it as optional. Spec §5.6 says missing embeddings should downgrade
   to BM25-only retrieval; that claim is now actually true. Two test
   callers updated to drop the explicit embeddingGenerator: null arg.

2. Skill names with dotted hosts failed silently
   RockBot 0.9's FileSkillStore.ValidateName rejects '.' — every real
   host (bsky.app, apple.com, example.com) threw ArgumentException on
   save. BskySeedSkillService swallowed the throw as a startup warning,
   TryWriteLearnedSkillAsync swallowed it on the error path, and form
   schemas just never persisted. Added SkillNaming.SanitizeHost that
   replaces '.' → '-' (bsky.app → bsky-app) and applied it at three
   call sites: BskySeedSkillService, BrowserTaskCapability.
   TryWriteLearnedSkillAsync, LearnFormSchemaCapability.DeriveSkillName.
   Allowlist matching and memory-search categories keep the original
   dotted host — only skill names need sanitization. Test assertions
   (BrowserTaskCapabilityTests, BskySeedSkillServiceTests,
   LearnFormSchemaCapabilityTests) updated to the sanitized names;
   skill-optimize.md directive examples updated so the dream loop
   produces valid names.

3. Fresh named volume masks Dockerfile chown
   The Foragent Dockerfile chowns /data to the non-root foragent user
   (uid 1655) at image-build time, but Docker mounts a fresh named
   volume root-owned, masking the build-time chown. Added a
   foragent-init busybox one-shot (mirroring rockbot-init) that runs
   chmod -R 777 /data/foragent on volume creation.
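
The host sanitization in item 2 can be sketched as follows. This is a minimal illustration that assumes the rule is exactly the '.' → '-' replacement described above; the class name SkillNamingSketch is hypothetical, and the real SkillNaming.SanitizeHost may do more.

```csharp
using System;

// Usage: only the skill name is sanitized; allowlist matching keeps the dotted host.
Console.WriteLine(SkillNamingSketch.SanitizeHost("bsky.app"));   // bsky-app
Console.WriteLine(SkillNamingSketch.SanitizeHost("apple.com"));  // apple-com

// Hypothetical stand-in for SkillNaming; not the code from this PR.
public static class SkillNamingSketch
{
    // Replaces every '.' with '-' so a validator that rejects dots accepts the name.
    public static string SanitizeHost(string host)
    {
        if (string.IsNullOrEmpty(host))
        {
            throw new ArgumentException("Host must be non-empty.", nameof(host));
        }

        return host.Replace('.', '-');
    }
}
```

Keeping the original dotted host for allowlist checks and memory-search categories means the sanitized form only ever appears in stored skill names.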

Docs updated: CLAUDE.md Status + Learning-substrate sections,
docs/capabilities.md, spec §5.6 skill-naming paragraph (calls out the
sanitization rule), framework-feedback step-9 follow-up section with
three framework observations (MSDI nullable footgun, validator's dot
rejection making real hosts fail, named-volume permissions pattern).
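
The MSDI nullable footgun can be reproduced in a few lines. The sketch below assumes the Microsoft.Extensions.DependencyInjection package and C# 12 primary constructors; IEmbedder, StrictPriming, and OptionalPriming are hypothetical stand-ins, not the types touched in this commit.

```csharp
#nullable enable
using System;
using Microsoft.Extensions.DependencyInjection;

var services = new ServiceCollection();
services.AddTransient<StrictPriming>();
services.AddTransient<OptionalPriming>();
// IEmbedder is deliberately left unregistered.
using var provider = services.BuildServiceProvider();

try
{
    // The '?' annotation alone does not make the parameter optional to the container.
    provider.GetRequiredService<StrictPriming>();
}
catch (InvalidOperationException ex)
{
    Console.WriteLine($"StrictPriming failed to resolve: {ex.Message}");
}

// A default parameter value is what the container honors.
var optional = provider.GetRequiredService<OptionalPriming>();
Console.WriteLine($"OptionalPriming resolved, embedder is null: {optional.Embedder is null}");

// Hypothetical dependency standing in for the embedding generator.
public interface IEmbedder { }

// Nullable-annotated but no default value: resolution throws when IEmbedder is missing.
public sealed class StrictPriming(IEmbedder? embedder)
{
    public IEmbedder? Embedder { get; } = embedder;
}

// Same dependency with '= null': resolution succeeds and the caller can downgrade.
public sealed class OptionalPriming(IEmbedder? embedder = null)
{
    public IEmbedder? Embedder { get; } = embedder;
}
```

Optional parameters must come last in a C# parameter list, which is why the fix reorders the constructor as well as adding the default.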

Tests: 48 passed / 3 LLM-gated skipped. End-to-end smoke: RockBot
dispatches browser-task to Foragent over the bus; Foragent plans 2
steps (navigate + snapshot), emits done with JSON result, reply lands
on user.response.RockBot.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

When RockBot's LLM called invoke_agent with free-form prose (message=
"...allowedHosts: ['*']..."), Foragent's three input parsers kept
rejecting with 'Missing allowedHosts' — they only consumed text parts
and expected a JSON object. RockBot 0.9.11+ supports structured input
via an A2A DataPart (AgentMessagePart{Kind="data", Data=<json>}), but
Foragent never advertised that it consumed data parts and the invoke_agent
tool description steered the LLM to omit 'data' unless the target
"is known to consume data." Result: the LLM retried the same rejected
prose call in a loop.

Fix spans three surfaces:

1. Parsers accept DataPart. BrowserTaskInput, LearnFormSchemaInput, and
   ExecuteFormBatchInput now look for a Kind="data" part first and use
   its Data string as the JSON source. Text-JSON fallback stays (curl
   callers), and for browser-task, a prose text part serves as the
   intent fallback when the data part doesn't supply one. Metadata
   overrides remain.

2. Skill descriptions explicitly direct callers to use the data
   parameter. Each SkillDefinition.Description now leads with "PASS
   INPUT AS AN A2A DATA PART (a structured JSON object), not as prose
   inside the text message. When calling via RockBot's invoke_agent,
   populate the 'data' parameter with this object." Matching entries
   in deploy/rockbot-seed/well-known-agents.json updated so the LLM
   sees the same guidance through list_known_agents.

3. Tests. Four new unit tests: one per input parser verifying a
   DataPart with JSON is consumed; one for browser-task's text-as-
   intent fallback when the data part omits intent. TestContext
   gained RequestWithData(...) to build the dual-part shape RockBot's
   invoke_agent produces.
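
The parser precedence in item 1 might look roughly like the sketch below: data part first, text-JSON as the fallback, and a prose text part as the intent when the data part omits one. MessagePart, BrowserTaskRequest, and BrowserTaskInputSketch are hypothetical stand-ins; the field names "intent" and "allowedHosts" follow the messages quoted above, and metadata overrides are left out.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Dual-part shape: prose in the text part, structured input in the data part.
var parts = new[]
{
    new MessagePart("text", Text: "Find the current MacBook price"),
    new MessagePart("data", Data: "{\"allowedHosts\":[\"apple.com\"]}"),
};
var request = BrowserTaskInputSketch.Parse(parts);
Console.WriteLine($"{request.Intent} | {string.Join(",", request.AllowedHosts)}");
// Find the current MacBook price | apple.com

// Hypothetical stand-in for the A2A message part; not the RockBot type.
public sealed record MessagePart(string Kind, string? Text = null, string? Data = null);

public sealed record BrowserTaskRequest(string Intent, IReadOnlyList<string> AllowedHosts);

public static class BrowserTaskInputSketch
{
    public static BrowserTaskRequest Parse(IReadOnlyList<MessagePart> parts)
    {
        string? data = parts.FirstOrDefault(p => p.Kind == "data")?.Data;
        string? text = parts.FirstOrDefault(p => p.Kind == "text")?.Text;

        // Prefer the data part; fall back to treating the text part as JSON (curl callers).
        string json = data ?? text ?? throw new ArgumentException("No data or text part supplied.");

        JsonDocument doc;
        try { doc = JsonDocument.Parse(json); }
        catch (JsonException ex) { throw new ArgumentException("Input is not valid JSON.", ex); }

        using (doc)
        {
            JsonElement root = doc.RootElement;
            if (root.ValueKind != JsonValueKind.Object)
                throw new ArgumentException("Input is not a JSON object.");

            string? intent = root.TryGetProperty("intent", out var i) && i.ValueKind == JsonValueKind.String
                ? i.GetString()
                : null;

            // When the JSON came from the data part, a prose text part supplies the intent.
            if (string.IsNullOrWhiteSpace(intent) && data is not null) intent = text;
            if (string.IsNullOrWhiteSpace(intent)) throw new ArgumentException("Missing intent");

            if (!root.TryGetProperty("allowedHosts", out var h) || h.ValueKind != JsonValueKind.Array)
                throw new ArgumentException("Missing allowedHosts");

            var hosts = h.EnumerateArray()
                .Where(e => e.ValueKind == JsonValueKind.String)
                .Select(e => e.GetString()!)
                .ToList();
            return new BrowserTaskRequest(intent, hosts);
        }
    }
}
```

A text-only request carrying a JSON object parses the same way, which keeps existing curl callers working.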

Image bumped to rockylhotka/rockbot-agent:0.9.14 — softens the
invoke_agent 'data' tool description upstream, complementing the
skill-description hints on the Foragent side. CLAUDE.md Status
paragraph updated.

Docs: CLAUDE.md Capabilities section gains a note on the DataPart
contract. framework-feedback step-9 follow-up section extended with
the three-surface lesson (sender tool description ↔ target skill
description ↔ target parser all need to agree on the canonical shape).

Tests: 52 passed / 3 LLM-gated skipped. Build clean. Curl smoke
(text-JSON path) returns valid JSON via browser-task unchanged.
Live Blazor end-to-end test is next, against the updated 0.9.14
rockbot image.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

RockBot 0.9.15 now publishes agent.task.cancel.{agentName} messages when
a wisp's local state fails after dispatching an A2A task (the duplicate-
dispatch scenario observed during step-9 validation). Foragent's
previous behavior inherited the framework's default
AgentTaskCancelHandler, which always replies TaskNotCancelable because
it assumes stateless agents. Foragent is stateful — a browser task is
potentially minutes long — so leaving the default would orphan browser
runs.

Implementation:
- InFlightTaskRegistry (singleton): ConcurrentDictionary<taskId,
  CancellationTokenSource> with Register/TryCancel/Remove. Register
  returns a linked CT that fires on either external cancel or the
  parent message CT. Redelivered task ids cancel the prior
  registration before replacing it, so stale work unwinds.
- ForagentTaskHandler wraps the capability's AgentTaskContext so the
  CT observed via context.MessageContext.CancellationToken is the
  linked one from the registry. Capabilities observe cancellation
  without any signature change.
- ForagentCancelHandler (IMessageHandler<AgentTaskCancelRequest>):
  on match calls TryCancel and publishes nothing (the running task's
  own terminal reply is the acknowledgment); on miss publishes
  AgentTaskError{Code=TaskNotFound}. Registered via
  agent.HandleMessage<AgentTaskCancelRequest, ForagentCancelHandler>()
  after AddA2A — last AddScoped wins, overriding the default.
- 11 new unit tests across registry, cancel handler, and task-handler
  integration (parent-cancel → linked CT fires, external cancel →
  linked CT fires, register/remove ties to finally, Remove drops
  registration even on thrown capability).
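
A registry with the shape described above can be sketched as follows. This is an illustration under stated assumptions, not the InFlightTaskRegistry from this commit: the method signatures are guesses, Remove takes the token so that a stale task cannot drop a newer registration, and disposal of a replaced source is omitted for brevity.

```csharp
#nullable enable
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

var registry = new InFlightTaskRegistrySketch();
using var parent = new CancellationTokenSource();

CancellationToken token = registry.Register("task-1", parent.Token);
Console.WriteLine(registry.TryCancel("task-1"));     // True
Console.WriteLine(token.IsCancellationRequested);    // True
registry.Remove("task-1", token);                    // normally called from a finally block
Console.WriteLine(registry.TryCancel("task-1"));     // False

// Hypothetical sketch; names and signatures are assumptions.
public sealed class InFlightTaskRegistrySketch
{
    private readonly ConcurrentDictionary<string, CancellationTokenSource> _tasks = new();

    // Returns a token that fires on external cancel or when the parent token fires.
    public CancellationToken Register(string taskId, CancellationToken parent)
    {
        var linked = CancellationTokenSource.CreateLinkedTokenSource(parent);
        while (!_tasks.TryAdd(taskId, linked))
        {
            // Redelivered task id: swap in the new source, then cancel the prior one.
            if (_tasks.TryGetValue(taskId, out var prior) && _tasks.TryUpdate(taskId, linked, prior))
            {
                prior.Cancel();
                break;
            }
        }
        return linked.Token;
    }

    // True when a registration was found and cancelled; false on a miss.
    public bool TryCancel(string taskId)
    {
        if (!_tasks.TryGetValue(taskId, out var cts)) return false;
        try { cts.Cancel(); return true; }
        catch (ObjectDisposedException) { return false; } // raced with Remove
    }

    // Drops the registration only if it still belongs to the caller's token.
    public void Remove(string taskId, CancellationToken token)
    {
        if (_tasks.TryGetValue(taskId, out var cts) && cts.Token == token &&
            _tasks.TryRemove(new KeyValuePair<string, CancellationTokenSource>(taskId, cts)))
        {
            cts.Dispose();
        }
    }
}
```

In this sketch a cancel handler would call TryCancel and reply with a not-found error only when it returns false, leaving the running task's own terminal reply to act as the acknowledgment.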

Also in this commit, incorporating earlier step-9 follow-ups for the
same RockBot 0.9.x interop round:

- Self-teaching errors. When a parser rejects for a missing required
  field, the response now tells the LLM exactly how to fix the call:
  "Pass inputs as a JSON object on the A2A DataPart — in RockBot's
  invoke_agent tool, that means filling the 'data' parameter, NOT
  adding fields to the 'message' text. Example data: {...}." Observed
  behavior: LLMs that ignore skill descriptions do read error replies
  and adjust subsequent calls. Applied to all three parsers.

- Docker image bumped to rockylhotka/rockbot-agent:0.9.15 — brings
  (a) invoke_agent's structured 'data' parameter (0.9.11), (b)
  softened tool description encouraging DataPart usage (0.9.14), and
  (c) the cancel-publisher that this commit consumes (0.9.15).
  CLAUDE.md Status section updated accordingly.

Framework-feedback step-9 follow-up section extended with the cancel-
handler-override pattern as a candidate for upstream WithTaskCancellation
ergonomics (non-blocking — ~50 LOC across consumers isn't unbearable).

Tests: 63 passed / 3 LLM-gated skipped. Build clean. Foragent starts
cleanly on fresh volumes; agent.task.cancel.Foragent subscription
verified active at boot.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@rockfordlhotka merged commit 5892713 into main on Apr 23, 2026
1 check passed
@rockfordlhotka deleted the step-9-deprecate-specialists branch on April 23, 2026 15:03