19 changes: 17 additions & 2 deletions CLAUDE.md
@@ -4,7 +4,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co

## Status

Foragent is at **milestone 7 shipped, step 8 next**. Three capabilities are advertised (`browser-task`, `fetch-page-title`, `extract-structured-data`); the A2A loop is wired end-to-end against RockBot via the `docker-compose.yml` harness pinned to `rockylhotka/rockbot-agent:0.8.5`. Step 6 shipped the generalist `browser-task` planner (LLM-in-the-loop over ref-annotated aria snapshots + `aria-ref=eN` locator resolution, built on `Microsoft.Playwright` 1.59 — bumped from 1.50 for the Ai aria-snapshot mode; see Appendix A #16). Tiered chat clients are wired via `AddRockBotTieredChatClients` with one model aliased across Low/Balanced/High per spec §3.7. Step 7 wired the learning substrate: `ISkillStore` + `ILongTermMemory` via `WithSkills()` + `WithLongTermMemory()`, `BrowserTaskPriming` injects retrieved skill + memory content into the planner prompt, successful tasks write a learned skill at `sites/{host}/learned/{slug}`, and `BskySeedSkillService` seeds `sites/bsky.app/login` on first start (idempotent — only writes when absent). Embeddings are optional and configured separately under `ForagentEmbeddings` so they can live on a different Azure Foundry subscription than the chat model; missing embeddings downgrade retrieval to BM25-only with a single startup warning. The step-6 unaided benchmark (3/3) still passes after the priming wiring. `post-to-site` has been removed from both the advertised skill list and the codebase (greenfield deletion — `browser-task` + the learned bsky skill cover the use case). The governing spec is `docs/foragent-specification.md` **v0.2**. Storage-state persistence, 2FA input-required flow, k8s-secrets broker, and per-tenant credential namespaces remain deferred — tracked in `docs/framework-feedback.md`. Framework-level observations from each milestone are captured in `docs/framework-feedback.md`.
Foragent is at **milestone 7.5 shipped, step 8 next**. Three capabilities are advertised (`browser-task`, `fetch-page-title`, `extract-structured-data`); the A2A loop is wired end-to-end against RockBot via the `docker-compose.yml` harness pinned to `rockylhotka/rockbot-agent:0.8.5`. Step 6 shipped the generalist `browser-task` planner (LLM-in-the-loop over ref-annotated aria snapshots + `aria-ref=eN` locator resolution, built on `Microsoft.Playwright` 1.59 — bumped from 1.50 for the Ai aria-snapshot mode; see Appendix A #16). Tiered chat clients are wired via `AddRockBotTieredChatClients` with one model aliased across Low/Balanced/High per spec §3.7. Step 7 wired the learning substrate: `ISkillStore` + `ILongTermMemory` via `WithSkills()` + `WithLongTermMemory()`, `BrowserTaskPriming` injects retrieved skill + memory content into the planner prompt, successful tasks write a learned skill at `sites/{host}/learned/{slug}`, and `BskySeedSkillService` seeds `sites/bsky.app/login` on first start (idempotent — only writes when absent). Embeddings are optional and configured separately under `ForagentEmbeddings` so they can live on a different Azure Foundry subscription than the chat model; missing embeddings downgrade retrieval to BM25-only with a single startup warning. The step-6 unaided benchmark (3/3) still passes after the priming wiring. `post-to-site` has been removed from both the advertised skill list and the codebase (greenfield deletion — `browser-task` + the learned bsky skill cover the use case). The governing spec is `docs/foragent-specification.md` **v0.2**. Storage-state persistence, 2FA input-required flow, k8s-secrets broker, and per-tenant credential namespaces remain deferred — tracked in `docs/framework-feedback.md`. Framework-level observations from each milestone are captured in `docs/framework-feedback.md`.

## Build / test

@@ -103,7 +103,22 @@ On successful completion (`state.IsDone`), `BrowserTaskCapability.TryWriteLearne

`BskySeedSkillService` (IHostedService) seeds `sites/bsky.app/login` on first start by calling `ISkillStore.GetAsync` and only writing if absent — docker volume recreation reseeds cleanly; operator edits to the skill through other channels are preserved.

Skill naming follows spec §5.6: `sites/{host}/{intent}` for human-authored primers, `sites/{host}/learned/{slug}` for agent-generated. `Skill.SeeAlso` cross-references related skills to surface clusters rather than single entries. **Note:** `Skill` (from `RockBot.Host 0.8.5`) does not carry tags, metadata, or importance — the `agent-learned` distinction is encoded in the name prefix only.
Skill naming follows spec §5.6: `sites/{host}/{intent}` for human-authored primers, `sites/{host}/learned/{slug}` for agent-generated. `Skill.SeeAlso` cross-references related skills to surface clusters rather than single entries. **Note:** `Skill` (from `RockBot.Host 0.8.5`) does not carry tags, metadata, or importance — the `agent-learned` distinction is encoded in the name prefix only. The dream loop (below) keeps the distinction from mattering at retrieval time: skills get improved, merged, and deduped across origins on a daily cadence.
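Because the origin of a skill lives only in its name, any code that needs the distinction has to parse the name. The sketch below shows one way that check could look. `SkillNameSketch` and `IsAgentLearned` are hypothetical names invented for this illustration and are not part of Foragent or `RockBot.Host`; only the two naming patterns come from the text above.

```csharp
using System;

// Hypothetical helper for illustration; not part of Foragent or RockBot.Host.
public static class SkillNameSketch
{
    // Returns true for names shaped like sites/{host}/learned/{slug},
    // false for primers shaped like sites/{host}/{intent}.
    public static bool IsAgentLearned(string skillName)
    {
        var parts = skillName.Split('/', StringSplitOptions.RemoveEmptyEntries);
        return parts.Length >= 4
            && parts[0] == "sites"
            && parts[2] == "learned";
    }
}
```

For example, `sites/bsky.app/login` is classified as a primer, while a learned name such as `sites/bsky.app/learned/post-status` (the slug here is made up) is classified as agent-generated.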

## Dream loop (step 7.5)

Foragent runs a daily RockBot dream pass to consolidate accumulated skills and memory. It is wired via `agent.AddScheduling()` + `agent.WithDreaming(opts)` inside `AddRockBotHost`. Five subtypes are enabled and eight are disabled:

- **Enabled:** main orchestrator (`dream.md`), skill-optimize (merge/dedup), skill-gap (detect missing coverage), sequence-skill (detect repeated tool patterns), memory-mining (promote durable observations to `ILongTermMemory`).
- **Disabled:** preference inference, episode extraction, tier-routing review, entity extraction, graph consolidation, identity reflection, DLQ review, Wisp failure analysis. All personality-agent territory.

`ProtectedSkillPrefixes = []` is empty on purpose. Operator primers like `sites/bsky.app/login` are *improved in place* by the dream, not frozen; the seed service only writes when the skill is absent from the store, so later dream-authored improvements survive restarts. Operators who need to reset a primer can delete the stored skill file and restart the host.

Directive files live at `src/Foragent.Agent/directives/*.md` and ship with the binary via `<Content Include="directives\*.md" CopyToOutputDirectory="PreserveNewest" />`. `DreamService` resolves each `DreamOptions.*DirectivePath` relative to `AgentProfileOptions.BasePath` (confirmed by IL inspection: a relative base path combines against `AppContext.BaseDirectory`, which is the binary output dir). Program.cs configures `AgentProfileOptions.BasePath = "directives"` and makes no `WithProfile()` call, because Foragent doesn't need the personality-profile doc set.
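The resolution rule described above can be sketched in a few lines. This is a reconstruction of the described behaviour, not the framework's actual `DreamService` code; `DirectivePathSketch` and `Resolve` are hypothetical names, and the `appBaseDirectory` parameter stands in for `AppContext.BaseDirectory` so the rule can be exercised in isolation.

```csharp
using System;
using System.IO;

// Hypothetical reconstruction of the resolution rule; not RockBot source.
public static class DirectivePathSketch
{
    // A rooted basePath is used as-is; a relative one is combined against
    // the application base directory first. The directive file name is
    // then appended to the result.
    public static string Resolve(string basePath, string directive, string appBaseDirectory)
    {
        var root = Path.IsPathRooted(basePath)
            ? basePath
            : Path.Combine(appBaseDirectory, basePath);
        return Path.Combine(root, directive);
    }
}
```

With `basePath = "directives"` and a base directory of `/app`, this yields `/app/directives/dream.md` on Linux.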

Dreams are **opt-in** via `ForagentDreams:Enabled`. `appsettings.json` defaults false so `dotnet run` smoke tests don't trigger scheduled LLM calls; `docker-compose.yml` sets `ForagentDreams__Enabled=true` because that's the "full operating mode" shape. `CronSchedule` defaults to `0 3 * * *` (03:00 UTC daily) — the framework default of every 12 hours is too frequent for a browser worker. `InitialDelay` is the framework default (5 minutes from start), which is fine in prod but worth noting if someone spins up the compose harness for a 10-minute smoke session.
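The two spellings in this paragraph (`ForagentDreams:Enabled` and `ForagentDreams__Enabled`) refer to the same setting: the .NET environment-variable configuration provider treats `__` as the `:` section separator. The sketch below mimics that mapping and the two defaults stated above using only the base class library. `DreamSettingsSketch` is a hypothetical name, and the real host reads these values through `builder.Configuration` rather than through code like this.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration of key mapping and defaults; not Foragent source.
public static class DreamSettingsSketch
{
    // Environment variables use "__" where configuration keys use ":".
    public static string ToConfigKey(string envVarName) =>
        envVarName.Replace("__", ":");

    // Applies the documented defaults: disabled, and 03:00 UTC daily.
    public static (bool Enabled, string Cron) Read(IReadOnlyDictionary<string, string> env)
    {
        var config = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (var pair in env)
        {
            config[ToConfigKey(pair.Key)] = pair.Value;
        }

        var enabled = config.TryGetValue("ForagentDreams:Enabled", out var raw)
            && bool.TryParse(raw, out var parsed)
            && parsed;
        var cron = config.TryGetValue("ForagentDreams:CronSchedule", out var value)
            ? value
            : "0 3 * * *";
        return (enabled, cron);
    }
}
```

An empty environment leaves dreams off with the `0 3 * * *` schedule, while setting `ForagentDreams__Enabled` to `true` turns them on without touching the schedule.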

**Don't add directive content to the RockBot agent's `deploy/rockbot-seed/` set.** Foragent's directives are task-shaped (browser outcomes, site knowledge); RockBot's are personality-shaped (identity, preferences). Mixing them defeats the reason Foragent authored its own.

## Credentials

6 changes: 6 additions & 0 deletions docker-compose.yml
@@ -81,6 +81,12 @@ services:
# mounted volume below so learned site knowledge survives restarts.
ForagentMemory__SkillsPath: /data/foragent/skills
ForagentMemory__MemoryPath: /data/foragent/memory
# Step 7.5: daily dream pass to consolidate accumulated skills +
# memory. Disabled by default in appsettings.json so `dotnet run` smoke
# tests don't burn tokens; opt-in here because the compose harness is
# the "full operating mode" shape. CronSchedule default is 03:00 UTC
# daily — override via ForagentDreams__CronSchedule if needed.
ForagentDreams__Enabled: "true"
# Optional Bluesky credential used by future credentialed browser-task
# runs. Flat id (no slashes) because env-var keys use __ to separate
# config segments. Leave unset to disable.
75 changes: 75 additions & 0 deletions docs/framework-feedback.md
@@ -409,3 +409,78 @@ scenarios still pass on first attempt: the priming wiring itself adds
no overhead when the stores return nothing, confirming the fail-soft
contract. A separate benchmark with a populated store is step-8-or-later
work (need a curated skill set worth priming against).

## Step 7.5 — dream loop

### Framework observations

- **Dream directives don't ship with the framework.** `DreamOptions`
defaults to bare filenames (`dream.md`, `skill-optimize.md`,
`sequence-skill.md`, etc.) that `DreamService` reads at runtime. The
`RockBot.Host`/`RockBot.Host.Abstractions` assemblies carry **zero
embedded resources** — no `.md` defaults, no stub directives. The
RockBot agent ships its directive set inside its docker image
(`/app/agent/*.md`), and `docker-compose.yml`'s `rockbot-init` step
copies them to `/data/agent/`. This is intentional (per operator
guidance: the framework can't know what any given consumer needs),
but it means every new framework consumer carries a ~300-line
directive-authoring cost as a prerequisite to turning on dreams.
Candidate framework offering (not an ask, since the intentionality
is real): optional companion packages like
`RockBot.Host.Directives.Personality` and
`RockBot.Host.Directives.Task` that ship starter directive sets,
selectable by `WithDreaming(opts => opts.UsePersonalityDefaults())`
or similar. Reduces onboarding cost without compromising the
no-hardcoded-content principle.

- **Directive paths resolve via `AgentProfileOptions.BasePath`.** IL
inspection of `DreamService`'s `ResolvePath` helper confirms: for
each directive (e.g. `opts.SkillOptimizeDirectivePath =
"skill-optimize.md"`), the final path is:
`Path.Combine(basePath, directive)` where `basePath` comes from
`IOptions<AgentProfileOptions>.Value.BasePath`. If `basePath` is
relative, it combines against `AppContext.BaseDirectory` (binary
output dir). Foragent configures `AgentProfileOptions.BasePath =
"directives"` and ships markdown files alongside the binary via
`CopyToOutputDirectory=PreserveNewest` — no `WithProfile()` call
needed. Worth documenting in RockBot's dream-loop guide: consumers
that don't load a personality profile still need to Configure the
options type because that's the single source of truth for directive
base paths.

- **`DreamService`'s constructor pulls 17 dependencies.** Everything
the dream subtypes might need (`IConversationLog`, `IDlqSampler`,
`IWispExecutionLog`, `IKnowledgeGraph`, `TierRoutingLogger`, …) is a
hard ctor parameter, so the framework registers stub / no-op
implementations for the ones a given agent doesn't use. Works, but
consumers who turn off a subtype shouldn't need its stores in DI at
all. Candidate framework refactor: make the subtype dependencies
optional (`IEnumerable<IDreamSubtype>` or similar) so
`DreamService.StartAsync` enumerates whatever's registered and skips
what isn't. Lower priority than the starter-directives suggestion above.

- **`ProtectedSkillPrefixes` is literal-only.** The list is
`List<string>` and (from the IL) matched via `StartsWith`, with no
wildcard expansion. Foragent ships it empty; operators can add
specific literal prefixes if they need to freeze a skill. Noted because
wildcard-style patterns (`sites/*/login`) would be a natural
extension and aren't supported today.
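To make the literal-only behaviour concrete, the sketch below shows prefix matching of the kind described in the last bullet. It is an illustration, not the framework's code: `ProtectedPrefixSketch` and `IsProtected` are hypothetical names, and the ordinal comparison is an assumption (the observation above only establishes that `StartsWith` is used).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical illustration of literal prefix matching; not RockBot source.
public static class ProtectedPrefixSketch
{
    // A skill is protected when its name starts with any listed prefix.
    // Prefixes are compared literally, so "*" has no special meaning.
    // StringComparison.Ordinal is an assumption made for this sketch.
    public static bool IsProtected(string skillName, IReadOnlyList<string> protectedPrefixes) =>
        protectedPrefixes.Any(prefix => skillName.StartsWith(prefix, StringComparison.Ordinal));
}
```

Under this rule the prefix `sites/bsky.app/` protects `sites/bsky.app/login`, the pattern `sites/*/login` protects nothing, and an empty list (Foragent's configuration) protects nothing.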

### Manual verification plan

Automated tests for the dream loop would require faking the scheduler
and running an end-to-end pass — out of scope. Verified manually via
docker-compose:

- Container starts with `ForagentDreams__Enabled=true` → startup log
shows `ForagentDreams enabled; daily dream pass on schedule '0 3 * *
*'`.
- Container starts with dreams disabled → log shows the
`ForagentDreams disabled` message instead, and `DreamService` is not
registered.
- Directive files present at `/app/directives/*.md` inside the
container (verified via `docker compose exec foragent ls
/app/directives/`).

First live dream pass against a non-empty skill store will be observed
after enough `browser-task` runs accumulate — probably step 8 or when
the operator turns the harness on for a sustained session.
7 changes: 7 additions & 0 deletions src/Foragent.Agent/Foragent.Agent.csproj
@@ -25,4 +25,11 @@
<ProjectReference Include="..\Foragent.Capabilities\Foragent.Capabilities.csproj" />
<ProjectReference Include="..\Foragent.Credentials\Foragent.Credentials.csproj" />
</ItemGroup>
<ItemGroup>
<!-- Dream directives (step 7.5). DreamService resolves directive paths
relative to AgentProfileOptions.BasePath; with a relative BasePath
they combine against AppContext.BaseDirectory (the binary output
dir), so shipping the markdown alongside the binary is enough. -->
<Content Include="directives\*.md" CopyToOutputDirectory="PreserveNewest" />
</ItemGroup>
</Project>
60 changes: 60 additions & 0 deletions src/Foragent.Agent/Program.cs
@@ -83,6 +83,21 @@
var skillsPath = memorySection["SkillsPath"] ?? "data/skills";
var memoryPath = memorySection["MemoryPath"] ?? "data/memory";

// Dream loop (step 7.5). Opt-in via ForagentDreams:Enabled — default off so
// local `dotnet run` smoke tests don't trigger scheduled LLM calls. The
// docker-compose harness sets this to true. CronSchedule defaults to 03:00
// UTC daily; framework default is every 12 hours, too frequent for a browser
// worker. Directive files ship alongside the binary under ./directives/ —
// DreamService resolves each directive path relative to
// AgentProfileOptions.BasePath (confirmed via IL inspection; relative paths
// combine against AppContext.BaseDirectory).
var dreamsSection = builder.Configuration.GetSection("ForagentDreams");
var dreamsEnabled = dreamsSection.GetValue<bool?>("Enabled") ?? false;
var directivesPath = dreamsSection["DirectivesPath"] ?? "directives";
var dreamsCron = dreamsSection["CronSchedule"] ?? "0 3 * * *";

builder.Services.Configure<AgentProfileOptions>(opts => opts.BasePath = directivesPath);

builder.Services.AddRockBotHost(agent =>
{
agent.WithIdentity(agentName);
@@ -106,6 +121,38 @@
agent.WithSkills(opts => opts.BasePath = skillsPath);
agent.WithLongTermMemory(opts => opts.BasePath = memoryPath);

if (dreamsEnabled)
{
agent.AddScheduling();
agent.WithDreaming(opts =>
{
opts.Enabled = true;
opts.CronSchedule = dreamsCron;

// Task-shaped dream subtypes (see directives/dream.md).
opts.SkillGapEnabled = true;
opts.SequenceSkillDetectionEnabled = true;
opts.MemoryMiningEnabled = true;

// Personality-shaped subtypes — not applicable to a browser
// worker. Disabling these skips both the LLM call and the
// directive-file lookup.
opts.PreferenceInferenceEnabled = false;
opts.EpisodeExtractionEnabled = false;
opts.TierRoutingReviewEnabled = false;
opts.EntityExtractionEnabled = false;
opts.GraphConsolidationEnabled = false;
opts.IdentityReflectionEnabled = false;
opts.DlqReviewEnabled = false;
opts.WispFailureAnalysisEnabled = false;

// Empty protected list — the goal is that the dream improves
// primer skills over time, not that primers are frozen
// (operator can still edit them through other channels).
opts.ProtectedSkillPrefixes = [];
});
}

agent.Services.AddForagentCapabilities();
agent.Services.AddHostedService<BskySeedSkillService>();
});
@@ -157,6 +204,19 @@
+ "Set ForagentEmbeddings:Endpoint/ModelId/ApiKey to enable semantic retrieval.");
}

if (dreamsEnabled)
{
app.Logger.LogInformation(
"ForagentDreams enabled; daily dream pass on schedule '{Cron}' will consolidate skills and memory.",
dreamsCron);
}
else
{
app.Logger.LogInformation(
"ForagentDreams disabled. Learned skills will accumulate without consolidation; "
+ "set ForagentDreams:Enabled=true to turn on the daily dream pass.");
}

app.Run();

public partial class Program;
57 changes: 57 additions & 0 deletions src/Foragent.Agent/directives/dream.md
@@ -0,0 +1,57 @@
# Foragent dream loop

You are the dream pass for **Foragent**, a task-level browser-automation
agent built on the RockBot framework. The framework fires this dream on
a daily schedule; your role is to improve the agent's accumulated site
knowledge without any user-facing interaction.

## What Foragent does

Foragent exposes one generalist capability (`browser-task`) and two
specialists. Every `browser-task` invocation runs an LLM-in-the-loop
planner over a small tool surface (`snapshot`, `click`, `type`,
`navigate`, `wait_for`, `done`, `fail`) against a real Chromium browser
in an isolated context. Each successful run writes a **learned skill**
at `sites/{host}/learned/{intent-slug}` describing the flow that
worked. Operators may also seed **primer skills** at `sites/{host}/{…}`
as hand-written site guides.

## What this dream pass is for

Turn an accumulating pile of single-shot learned skills into a smaller,
better, more retrievable body of site knowledge. Specific passes are
driven by their own directives:

- `skill-optimize.md` — merge duplicate / overlapping skills for the
same site into a single clearer entry.
- `skill-gap.md` — look at recent failures and propose what skill would
have helped, flagging the gap in long-term memory.
- `sequence-skill.md` — find repeated tool-call patterns across many
runs and propose a canonicalised named sequence.
- `memory-mining.md` — promote durable observations from the tool-call
log into `ILongTermMemory` so they prime future planning.

Other RockBot subtypes (identity reflection, preference inference,
episode extraction, entity / knowledge-graph consolidation, tier-routing
review, Wisp failure analysis, DLQ review) are disabled for Foragent —
they serve personality-driven agents, not a browser worker.

## Ground rules for every pass

- **Do not invent site behaviour.** Every claim in a skill or memory
entry must trace back to tool-call log evidence. "When Bluesky login
fails, retry" is fine only if the trace log shows that pattern.
- **Never include credential values, typed field contents, or tokens.**
The trace log captures field *lengths*, not *values*. If you see any
string that looks like a password / code / token in content you're
producing, stop and strip it.
- **Prefer concrete selectors and landing URLs** ("click the element
labelled `Next`" / "navigate to `/compose`") over vague guidance ("go
to the compose page"). Future planners retrieve these to save
snapshot round-trips.
- **Protected skills**, meaning any skill whose name starts with a prefix
in `DreamOptions.ProtectedSkillPrefixes`, must never be deleted and
should be *improved* in place rather than replaced: edit their Content
and Summary, keep the Name.
- **Drop data, don't grow it.** A consolidated skill should be *shorter*
than the sum of its sources, or the consolidation isn't earning its
keep.