Merged
10 changes: 6 additions & 4 deletions CLAUDE.md

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion Directory.Build.props
@@ -6,7 +6,7 @@
<ImplicitUsings>enable</ImplicitUsings>
<TreatWarningsAsErrors>true</TreatWarningsAsErrors>
<EnforceCodeStyleInBuild>true</EnforceCodeStyleInBuild>
-<Version>0.1.0-alpha</Version>
+<Version>0.2.0-alpha.8</Version>
<Authors>Marimer LLC</Authors>
<Company>Marimer LLC</Company>
<Copyright>Copyright (c) Marimer LLC</Copyright>
17 changes: 9 additions & 8 deletions Directory.Packages.props
@@ -17,14 +17,15 @@
<PackageVersion Include="OpenAI" Version="2.*" />
<PackageVersion Include="Microsoft.Extensions.Hosting" Version="10.0.*" />
<PackageVersion Include="A2A" Version="1.0.0-preview" />
-<PackageVersion Include="RockBot.A2A" Version="0.8.*" />
-<PackageVersion Include="RockBot.A2A.Abstractions" Version="0.8.*" />
-<PackageVersion Include="RockBot.A2A.Gateway" Version="0.8.*" />
-<PackageVersion Include="RockBot.Host" Version="0.8.*" />
-<PackageVersion Include="RockBot.Host.Abstractions" Version="0.8.*" />
-<PackageVersion Include="RockBot.Llm" Version="0.8.*" />
-<PackageVersion Include="RockBot.Messaging.Abstractions" Version="0.8.*" />
-<PackageVersion Include="RockBot.Messaging.RabbitMQ" Version="0.8.*" />
+<PackageVersion Include="RockBot.A2A" Version="0.9.*" />
+<PackageVersion Include="RockBot.A2A.Abstractions" Version="0.9.*" />
+<PackageVersion Include="RockBot.A2A.Gateway" Version="0.9.*" />
+<PackageVersion Include="RockBot.Host" Version="0.9.*" />
+<PackageVersion Include="RockBot.Host.Abstractions" Version="0.9.*" />
+<PackageVersion Include="RockBot.Llm" Version="0.9.*" />
+<PackageVersion Include="RockBot.Llm.Abstractions" Version="0.9.*" />
+<PackageVersion Include="RockBot.Messaging.Abstractions" Version="0.9.*" />
+<PackageVersion Include="RockBot.Messaging.RabbitMQ" Version="0.9.*" />
<PackageVersion Include="xunit" Version="2.9.3" />
<PackageVersion Include="Xunit.SkippableFact" Version="1.5.23" />
<PackageVersion Include="xunit.runner.visualstudio" Version="2.8.2" />
13 changes: 8 additions & 5 deletions docker-compose.yml
@@ -5,8 +5,11 @@
# - foragent — this project; exposes HTTP A2A on port 5210
# - rockbot-init — seeds /data/agent with RockBot profile + well-known-agents.json
# pointing at foragent
-# - rockbot — rockylhotka/rockbot-agent:0.8.5, configured to know Foragent
-# as an A2A peer it can delegate tasks to
+# - rockbot — rockylhotka/rockbot-agent:0.9.11, configured to know Foragent
+# as an A2A peer it can delegate tasks to. 0.9.11 brings
+# the structured-data invoke_agent surface (PR #291) so
+# RockBot can consume Foragent's FormSchema JSON results
+# natively, not as text.
# - blazor — rockylhotka/rockbot-blazor:latest, web UI for chatting with
# the rockbot agent. Open http://localhost:8080 to test.
#
@@ -61,7 +64,7 @@ services:
RabbitMq__VirtualHost: /
Gateway__AgentName: Foragent
Gateway__InternalAgentName: Foragent
-Gateway__Description: "Browser agent — browser-task (generalist), fetch-page-title, extract-structured-data"
+Gateway__Description: "Browser agent — browser-task (generalist), learn-form-schema, execute-form-batch, fetch-page-title, extract-structured-data"
# RockBot will call Foragent with header X-Api-Key: rockbot-calls-foragent
ApiKeys__rockbot-calls-foragent__AgentId: RockBot
ApiKeys__rockbot-calls-foragent__DisplayName: RockBot
@@ -97,7 +100,7 @@ services:
- foragent-data:/data/foragent

rockbot-init:
-image: rockylhotka/rockbot-agent:0.8.5
+image: rockylhotka/rockbot-agent:0.9.11
user: root
entrypoint: ["/bin/sh", "-c"]
command:
@@ -133,7 +136,7 @@ services:
- ./deploy/rockbot-seed:/seed:ro

rockbot:
-image: rockylhotka/rockbot-agent:0.8.5
+image: rockylhotka/rockbot-agent:0.9.11
depends_on:
rockbot-init:
condition: service_completed_successfully
89 changes: 89 additions & 0 deletions docs/framework-feedback.md
@@ -484,3 +484,92 @@ docker-compose:
First live dream pass against a non-empty skill store will be observed
after enough `browser-task` runs accumulate — probably step 8 or when
the operator turns the harness on for a sustained session.

## Step 8 — `learn-form-schema` + `execute-form-batch`

### Framework observations

- **Multi-file skill API (RockBot 0.9) closes spec open-question #6
cleanly.** Step 8 needed typed JSON schemas alongside the existing
markdown-shaped skills; we'd sketched three options (fenced JSON in
the skill body, a parallel Foragent-local typed store, or an upstream
framework extension). The upstream extension had already landed in
`rockbot` main — `Skill.Manifest: IReadOnlyList<SkillResource>?` plus
`ISkillStore.SaveAsync(skill, resources)` and
`GetResourceAsync(skillName, filename)`. `SkillResourceType.JsonSchema`
is exactly the enum value this use case needed. Foragent consumes it
directly: `LearnFormSchemaCapability` writes the skill bundle and
`ExecuteFormBatchCapability` reads `schema.json` back, with no parallel
store. The "framework is the substrate" discipline from spec §8
paid off here: a Foragent-local store would have been thrown away
one step later, once the framework shipped this.

- **`SaveAsync(skill)` preserving the manifest is the important bit.**
Per commit 2db3775 fix #1, a plain
`ISkillStore.SaveAsync(skill)` call preserves the existing
`Manifest` when the incoming skill doesn't carry one. That means the
daily dream loop's `skill-optimize` subtype (which rewrites
markdown content) can't accidentally orphan resource files, and
Foragent's `learn-form-schema` can update a skill's prose primer
without re-writing the schema resource. Without this property, the
dream loop would silently delete Foragent's typed schemas over time.
Worth documenting prominently in RockBot's multi-file-skill guide —
future framework consumers will trip on "my resources disappeared"
otherwise.

- **`AgentTaskContext.PublishStatus` works unchanged for per-row
streaming.** Step 8's `execute-form-batch` publishes
`AgentTaskStatusUpdate { State = Working, Message = …per-row text… }`
between row submissions. The surface from 0.8.x is still right for
this, and matches how `RockBot.ResearchAgent` uses it for its
iterative research loop. Nothing to change; noting so the next
step-N capability that wants streaming knows the shape is stable.

- **Credential broker still doesn't know about storage-state or
per-tenant scoping.** `learn-form-schema` and `execute-form-batch`
both accept `credentialId` but only resolve it and discard the result,
for audit and fail-fast purposes; the actual authenticated-form flow
(storage-state reuse from a prior `browser-task` login) is still the
step-4 deferred item. This is not new; it is noted here because step 8's
capabilities would naturally use it if it existed. When storage-state
lands, both form capabilities gain `storageStateCredentialId` support
in one pass.

- **Foragent-local `FakeSkillStore` is still a 40-line hand-rolled
double.** Step 8 adds more surface area (`SaveAsync(skill, resources)`
+ `GetResourceAsync`), and the fake needs to match `FileSkillStore`'s
manifest-preservation behavior to be a faithful substitute. Repeating
the step-7 note: a `RockBot.Host.Testing` package with in-memory
implementations would let Foragent delete both `FakeSkillStore` in
`Foragent.Agent.Tests` and `InMemorySkillStore` in
`Foragent.Browser.Tests`, and would surface any gaps in the fakes
as soon as a framework change lands.
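
The manifest-preservation property described above can be sketched with a small in-memory store. This is an illustration only: `SketchSkill`, `SketchResource`, and `SketchSkillStore` are hypothetical stand-ins, synchronous and much smaller than the real `Skill`, `SkillResource`, and `ISkillStore` types, whose exact signatures are not reproduced here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var store = new SketchSkillStore();

// First save writes the prose primer plus a schema.json resource.
store.Save(
    new SketchSkill("forms/signup", "v1 primer"),
    new[] { new SketchResource("schema.json", "{}") });

// A later plain save (e.g. a prose rewrite) carries no manifest.
store.Save(new SketchSkill("forms/signup", "v2 primer"));

// The manifest still lists schema.json and the resource is still readable.
var saved = store.Get("forms/signup");
Console.WriteLine(string.Join(",", saved!.Manifest!));
Console.WriteLine(store.GetResource("forms/signup", "schema.json"));

// Hypothetical stand-ins for illustration; not the RockBot types.
public sealed record SketchResource(string Filename, string Content);
public sealed record SketchSkill(string Name, string Content, IReadOnlyList<string>? Manifest = null);

public sealed class SketchSkillStore
{
    private readonly Dictionary<string, SketchSkill> _skills = new();
    private readonly Dictionary<(string Skill, string File), string> _resources = new();

    // Save with resources: store each file and record the filenames as the manifest.
    public void Save(SketchSkill skill, IReadOnlyList<SketchResource> resources)
    {
        foreach (var resource in resources)
            _resources[(skill.Name, resource.Filename)] = resource.Content;

        _skills[skill.Name] = skill with { Manifest = resources.Select(r => r.Filename).ToList() };
    }

    // Plain save: keep the existing manifest when the incoming skill has none.
    public void Save(SketchSkill skill)
    {
        if (skill.Manifest is null && _skills.TryGetValue(skill.Name, out var existing))
            skill = skill with { Manifest = existing.Manifest };

        _skills[skill.Name] = skill;
    }

    public SketchSkill? Get(string name) => _skills.GetValueOrDefault(name);

    public string? GetResource(string skillName, string filename) =>
        _resources.GetValueOrDefault((skillName, filename));
}
```

A test double that skips the `Manifest is null` branch in the plain `Save` would pass tests that the real store fails, which is the fidelity concern raised in the last bullet.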

### Spec resolutions

- **Open-question #6 (structured artifacts in `ISkillStore`):
resolved upstream.** No Foragent-local typed store. `schema.json`
resources under `SkillResourceType.JsonSchema`, consumed via
`GetResourceAsync`. Step 8 ships the reference pattern.
- **Open-question #8 (batch retry/failure semantics): resolved as
abort-on-first by default, with caller opt-in to continue.** Rationale:
forms mutate state, so a row failure most likely signals a schema or
session problem, and continuing would generate more bad submissions
rather than recover. Per-row status stays in the final result either
way. Abort-by-default also matches how a human operator would handle a
failed row during a paper-form batch.
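
A minimal sketch of the abort-on-first versus continue semantics, under stated assumptions: `BatchRunner`, `RowResult`, `RowState`, and the `continueOnError` flag are hypothetical names, not the `execute-form-batch` wire contract, and the `submit` delegate stands in for a real form submission. The sketch also assumes that rows never attempted are simply left out of the result; the source does not say how those are reported.

```csharp
using System;
using System.Collections.Generic;

var rows = new[] { "alice", "", "carol" };

// Hypothetical submit step: an empty row stands in for a failed submission.
Func<string, bool> submit = row => row.Length > 0;

var aborted = BatchRunner.Run(rows, submit, continueOnError: false);
var continued = BatchRunner.Run(rows, submit, continueOnError: true);

// prints "abort: 2 rows attempted; continue: 3 rows attempted"
Console.WriteLine($"abort: {aborted.Count} rows attempted; continue: {continued.Count} rows attempted");

public enum RowState { Succeeded, Failed }

public sealed record RowResult(int Index, RowState State);

public static class BatchRunner
{
    // Submits rows in order. The failed row is always recorded; whether the
    // loop keeps going afterwards depends on the caller's opt-in.
    public static IReadOnlyList<RowResult> Run<T>(
        IReadOnlyList<T> rows,
        Func<T, bool> submit,
        bool continueOnError = false)
    {
        var results = new List<RowResult>();
        for (var i = 0; i < rows.Count; i++)
        {
            var ok = submit(rows[i]);
            results.Add(new RowResult(i, ok ? RowState.Succeeded : RowState.Failed));

            if (!ok && !continueOnError)
                break; // abort-on-first: stop before mutating anything else
        }
        return results;
    }
}
```

Defaulting `continueOnError` to `false` means a caller has to ask explicitly for the riskier behavior, which mirrors the abort-by-default decision above.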

### Verification

- Unit tests in `Foragent.Agent.Tests/Forms/` — 14 tests covering input
validation, schema round-trip through `SkillResource`, abort-on-first
vs continue semantics, `successIndicator` path, and required-field
validation. Run time ~3s.
- Integration test `Foragent.Browser.Tests/FormCapabilitiesIntegrationTests`
— spins up Kestrel with a real HTML form, drives
`learn-form-schema` + `execute-form-batch` end-to-end against real
Chromium, verifies two rows actually land in the server's POST
handler. Not LLM-gated — the enricher short-circuits on forms
without select/radio, so this runs in CI without
`FORAGENT_LLM_*`.
- Existing step-6 benchmark still 3/3 — framework bump didn't regress
anything else.
76 changes: 76 additions & 0 deletions src/Foragent.Browser/IBrowserSession.cs
@@ -64,6 +64,20 @@ public interface IBrowserPage : IAsyncDisposable
/// <summary>Clicks the element matched by <paramref name="selector"/>.</summary>
Task ClickAsync(string selector, CancellationToken cancellationToken = default);

/// <summary>
/// Selects the option with <paramref name="value"/> in the <c>&lt;select&gt;</c>
/// matched by <paramref name="selector"/>. Throws if the option is absent.
/// </summary>
Task SelectOptionAsync(string selector, string value, CancellationToken cancellationToken = default);

/// <summary>
/// Sets the checked state of a checkbox or radio input matched by
/// <paramref name="selector"/>. Unlike <see cref="ClickAsync"/>, this is
/// idempotent — calling with <c>true</c> when the box is already checked
/// is a no-op rather than a toggle.
/// </summary>
Task SetCheckedAsync(string selector, bool checked_, CancellationToken cancellationToken = default);

/// <summary>
/// Waits until the element matched by <paramref name="selector"/> is attached
/// and visible. Throws <see cref="TimeoutException"/> on timeout.
@@ -82,6 +96,18 @@ Task WaitForSelectorAsync(
/// messages or confirmation text.
/// </summary>
Task<string?> GetTextAsync(string selector, CancellationToken cancellationToken = default);

/// <summary>
/// Scans the first <c>&lt;form&gt;</c> matching <paramref name="formSelector"/>
/// (or the first form on the page when <paramref name="formSelector"/> is
/// <c>null</c>) and returns a structured description of its inputs, selects,
/// textareas, labels, validation attributes, and submit button. Produces no
/// LLM output — purely deterministic DOM reading — so callers can use it as
/// the skeleton for a typed <c>FormSchema</c>. Returns <c>null</c> when no
/// form is found. Radio groups are collapsed to a single field per name
/// with all options enumerated.
/// </summary>
Task<FormScan?> ScanFormAsync(string? formSelector = null, CancellationToken cancellationToken = default);
}

/// <summary>
@@ -134,6 +160,56 @@ Task WaitForRefAsync(
CancellationToken cancellationToken = default);
}

/// <summary>
/// A deterministic rendering of an HTML form. Produced by
/// <see cref="IBrowserPage.ScanFormAsync"/>; the <c>learn-form-schema</c>
/// capability lifts this into the wire-level <c>FormSchema</c> with optional
/// LLM enrichment (dropdown dependencies, validation hints).
/// </summary>
/// <param name="Url">The URL the scan was taken on (after redirects).</param>
/// <param name="FormSelector">A CSS selector that reaches the scanned form — either the one the caller passed in, or a generated one based on the form's id/name.</param>
/// <param name="SubmitSelector">Selector for the form's submit control, or <c>null</c> if none was detected.</param>
/// <param name="Fields">Fields detected in document order. Radio groups appear once per group name.</param>
public sealed record FormScan(
Uri Url,
string FormSelector,
string? SubmitSelector,
IReadOnlyList<FormScanField> Fields);

/// <summary>
/// One field detected by <see cref="IBrowserPage.ScanFormAsync"/>. Carries raw
/// HTML attributes — the capability layer decides how to map <see cref="Tag"/>
/// + <see cref="InputType"/> to its typed <c>FormFieldType</c>.
/// </summary>
/// <param name="Tag">The element tag — <c>input</c>, <c>select</c>, or <c>textarea</c>.</param>
/// <param name="InputType">The <c>type</c> attribute for <c>&lt;input&gt;</c> elements (<c>text</c>, <c>email</c>, …); <c>null</c> for non-input tags.</param>
/// <param name="Name">The <c>name</c> attribute, or <c>null</c>.</param>
/// <param name="Id">The <c>id</c> attribute, or <c>null</c>.</param>
/// <param name="Label">Visible label text resolved via <c>label[for=id]</c>, a wrapping <c>&lt;label&gt;</c>, <c>aria-label</c>, or the placeholder.</param>
/// <param name="Required">Whether the element carries the <c>required</c> attribute.</param>
/// <param name="Pattern">The HTML5 <c>pattern</c> attribute, or <c>null</c>.</param>
/// <param name="Min">The HTML5 <c>min</c> attribute, or <c>null</c>.</param>
/// <param name="Max">The HTML5 <c>max</c> attribute, or <c>null</c>.</param>
/// <param name="MaxLength">The HTML5 <c>maxlength</c> attribute, or <c>null</c> when unspecified or non-positive.</param>
/// <param name="Options">Enumerated options for <c>select</c> and radio groups; <c>null</c> for free-text fields.</param>
/// <param name="Selector">A CSS selector the capability can use to drive the field; <c>null</c> when neither name nor id is present.</param>
public sealed record FormScanField(
string Tag,
string? InputType,
string? Name,
string? Id,
string? Label,
bool Required,
string? Pattern,
string? Min,
string? Max,
int? MaxLength,
IReadOnlyList<FormScanOption>? Options,
string? Selector);

/// <summary>An option entry for a <c>&lt;select&gt;</c> or radio group.</summary>
public sealed record FormScanOption(string Value, string? Label);

/// <summary>
/// A compact rendering of a page suitable for LLM prompting.
/// </summary>