From 9fac64e827f084393a46fe2846e282f516a2b392 Mon Sep 17 00:00:00 2001
From: Rockford Lhotka <rocky@lhotka.net>
Date: Wed, 22 Apr 2026 21:37:15 -0500
Subject: [PATCH] Step 7.5: daily dream loop with task-shaped directives
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Wire agent.AddScheduling() + agent.WithDreaming() in Program.cs. Enable
five subtypes relevant to a browser worker (main orchestrator,
skill-optimize, skill-gap, sequence-skill, memory-mining); disable the
eight personality-agent subtypes (preference, episode, tier-routing,
entity, graph-consolidation, identity, DLQ, Wisp).

RockBot ships no default directive content — intentionally, since the
framework can't know what any given agent needs. Foragent authors its
own five directives under src/Foragent.Agent/directives/, shipped via
CopyToOutputDirectory so they land at /app/directives/ in the container.
DreamService resolves each directive relative to
AgentProfileOptions.BasePath, which Program.cs configures to
"directives" (resolved against AppContext.BaseDirectory for relative
values — confirmed by IL inspection).
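
For reviewers, the resolution rule above can be sketched as follows.
This is a minimal illustration of the behaviour described in this
message, not RockBot's code: ResolveDirectivePath and its parameters
are hypothetical names, and the real logic lives inside DreamService.

```csharp
using System;
using System.IO;

// Hypothetical helper that mirrors the described rule. It is an
// assumption-laden sketch, not the framework's ResolvePath method.
static string ResolveDirectivePath(string basePath, string directiveFile, string appBaseDirectory)
{
    // A relative BasePath is anchored at the application base directory;
    // a rooted BasePath is used as-is.
    var root = Path.IsPathRooted(basePath)
        ? basePath
        : Path.Combine(appBaseDirectory, basePath);

    // Each directive file name is then combined with the resolved base.
    return Path.Combine(root, directiveFile);
}

// With BasePath = "directives", the file is looked up next to the binary.
Console.WriteLine(ResolveDirectivePath("directives", "skill-optimize.md", AppContext.BaseDirectory));
```

Passing the base directory in as a parameter keeps the sketch easy to
exercise without depending on where the process was started.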

Dreams are opt-in: ForagentDreams:Enabled defaults false so `dotnet
run` smoke tests don't burn tokens; docker-compose.yml sets it true for
the full harness. Cron defaults to daily 03:00 UTC — framework default
of every 12h is too frequent. ProtectedSkillPrefixes stays empty
deliberately so operator primers get improved rather than frozen.
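
Per the framework-feedback.md entry in this patch, protected prefixes
are matched literally via StartsWith with no wildcard expansion. A
rough sketch of what that implies for an empty list follows. IsProtected
is a hypothetical helper and the ordinal comparison is an assumption;
the framework's actual check may differ in detail.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical check: a skill is protected when its name starts with any
// configured prefix. StringComparison.Ordinal is assumed for illustration.
static bool IsProtected(string skillName, IReadOnlyCollection<string> protectedPrefixes) =>
    protectedPrefixes.Any(prefix => skillName.StartsWith(prefix, StringComparison.Ordinal));

var none = new List<string>();
var literal = new List<string> { "sites/bsky.app/login" };
var wildcard = new List<string> { "sites/*/login" };

Console.WriteLine(IsProtected("sites/bsky.app/login", none));     // False: empty list protects nothing
Console.WriteLine(IsProtected("sites/bsky.app/login", literal));  // True: literal prefix match
Console.WriteLine(IsProtected("sites/bsky.app/login", wildcard)); // False: '*' is not expanded
```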

framework-feedback.md step-7.5 entry captures the directive
intentionality, the AgentProfileOptions.BasePath resolution path, and
a candidate companion-package offering (Directives.Personality /
Directives.Task) that would reduce onboarding cost without compromising
the no-hardcoded-content principle.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 CLAUDE.md                                     | 19 ++++-
 docker-compose.yml                            |  6 ++
 docs/framework-feedback.md                    | 75 +++++++++++++++++++
 src/Foragent.Agent/Foragent.Agent.csproj      |  7 ++
 src/Foragent.Agent/Program.cs                 | 60 +++++++++++++++
 src/Foragent.Agent/directives/dream.md        | 57 ++++++++++++++
 .../directives/memory-mining.md               | 75 +++++++++++++++++++
 .../directives/sequence-skill.md              | 75 +++++++++++++++++++
 src/Foragent.Agent/directives/skill-gap.md    | 68 +++++++++++++++++
 .../directives/skill-optimize.md              | 68 +++++++++++++++++
 10 files changed, 508 insertions(+), 2 deletions(-)
 create mode 100644 src/Foragent.Agent/directives/dream.md
 create mode 100644 src/Foragent.Agent/directives/memory-mining.md
 create mode 100644 src/Foragent.Agent/directives/sequence-skill.md
 create mode 100644 src/Foragent.Agent/directives/skill-gap.md
 create mode 100644 src/Foragent.Agent/directives/skill-optimize.md

diff --git a/CLAUDE.md b/CLAUDE.md
index a5f2a9b..ffb3e72 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -4,7 +4,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
 
 ## Status
 
-Foragent is at **milestone 7 shipped, step 8 next**. Three capabilities are advertised (`browser-task`, `fetch-page-title`, `extract-structured-data`); the A2A loop is wired end-to-end against RockBot via the `docker-compose.yml` harness pinned to `rockylhotka/rockbot-agent:0.8.5`. Step 6 shipped the generalist `browser-task` planner (LLM-in-the-loop over ref-annotated aria snapshots + `aria-ref=eN` locator resolution, built on `Microsoft.Playwright` 1.59 — bumped from 1.50 for the Ai aria-snapshot mode; see Appendix A #16). Tiered chat clients are wired via `AddRockBotTieredChatClients` with one model aliased across Low/Balanced/High per spec §3.7. Step 7 wired the learning substrate: `ISkillStore` + `ILongTermMemory` via `WithSkills()` + `WithLongTermMemory()`, `BrowserTaskPriming` injects retrieved skill + memory content into the planner prompt, successful tasks write a learned skill at `sites/{host}/learned/{slug}`, and `BskySeedSkillService` seeds `sites/bsky.app/login` on first start (idempotent — only writes when absent). Embeddings are optional and configured separately under `ForagentEmbeddings` so they can live on a different Azure Foundry subscription than the chat model; missing embeddings downgrade retrieval to BM25-only with a single startup warning. The step-6 unaided benchmark (3/3) still passes after the priming wiring. `post-to-site` has been removed from both the advertised skill list and the codebase (greenfield deletion — `browser-task` + the learned bsky skill cover the use case). The governing spec is `docs/foragent-specification.md` **v0.2**. Storage-state persistence, 2FA input-required flow, k8s-secrets broker, and per-tenant credential namespaces remain deferred — tracked in `docs/framework-feedback.md`. Framework-level observations from each milestone are captured in `docs/framework-feedback.md`.
+Foragent is at **milestone 7.5 shipped, step 8 next**. Three capabilities are advertised (`browser-task`, `fetch-page-title`, `extract-structured-data`); the A2A loop is wired end-to-end against RockBot via the `docker-compose.yml` harness pinned to `rockylhotka/rockbot-agent:0.8.5`. Step 6 shipped the generalist `browser-task` planner (LLM-in-the-loop over ref-annotated aria snapshots + `aria-ref=eN` locator resolution, built on `Microsoft.Playwright` 1.59 — bumped from 1.50 for the Ai aria-snapshot mode; see Appendix A #16). Tiered chat clients are wired via `AddRockBotTieredChatClients` with one model aliased across Low/Balanced/High per spec §3.7. Step 7 wired the learning substrate: `ISkillStore` + `ILongTermMemory` via `WithSkills()` + `WithLongTermMemory()`, `BrowserTaskPriming` injects retrieved skill + memory content into the planner prompt, successful tasks write a learned skill at `sites/{host}/learned/{slug}`, and `BskySeedSkillService` seeds `sites/bsky.app/login` on first start (idempotent — only writes when absent). Embeddings are optional and configured separately under `ForagentEmbeddings` so they can live on a different Azure Foundry subscription than the chat model; missing embeddings downgrade retrieval to BM25-only with a single startup warning. The step-6 unaided benchmark (3/3) still passes after the priming wiring. `post-to-site` has been removed from both the advertised skill list and the codebase (greenfield deletion — `browser-task` + the learned bsky skill cover the use case). The governing spec is `docs/foragent-specification.md` **v0.2**. Storage-state persistence, 2FA input-required flow, k8s-secrets broker, and per-tenant credential namespaces remain deferred — tracked in `docs/framework-feedback.md`. Framework-level observations from each milestone are captured in `docs/framework-feedback.md`.
 
 ## Build / test
 
@@ -103,7 +103,22 @@ On successful completion (`state.IsDone`), `BrowserTaskCapability.TryWriteLearne
 
 `BskySeedSkillService` (IHostedService) seeds `sites/bsky.app/login` on first start by calling `ISkillStore.GetAsync` and only writing if absent — docker volume recreation reseeds cleanly; operator edits to the skill through other channels are preserved.
 
-Skill naming follows spec §5.6: `sites/{host}/{intent}` for human-authored primers, `sites/{host}/learned/{slug}` for agent-generated. `Skill.SeeAlso` cross-references related skills to surface clusters rather than single entries. **Note:** `Skill` (from `RockBot.Host 0.8.5`) does not carry tags, metadata, or importance — the `agent-learned` distinction is encoded in the name prefix only.
+Skill naming follows spec §5.6: `sites/{host}/{intent}` for human-authored primers, `sites/{host}/learned/{slug}` for agent-generated. `Skill.SeeAlso` cross-references related skills to surface clusters rather than single entries. **Note:** `Skill` (from `RockBot.Host 0.8.5`) does not carry tags, metadata, or importance — the `agent-learned` distinction is encoded in the name prefix only. The dream loop (below) keeps the distinction from mattering at retrieval time: skills get improved, merged, and deduped across origins on a daily cadence.
+
+## Dream loop (step 7.5)
+
+Foragent runs a daily RockBot dream pass to consolidate accumulated skills and memory. It is wired via `agent.AddScheduling()` + `agent.WithDreaming(opts)` inside `AddRockBotHost`, and only when `ForagentDreams:Enabled` is true. Five subtypes are enabled, eight are off:
+
+- **Enabled:** main orchestrator (`dream.md`), skill-optimize (merge/dedup), skill-gap (detect missing coverage), sequence-skill (detect repeated tool patterns), memory-mining (promote durable observations to `ILongTermMemory`).
+- **Disabled:** preference inference, episode extraction, tier-routing review, entity extraction, graph consolidation, identity reflection, DLQ review, Wisp failure analysis. All personality-agent territory.
+
+`ProtectedSkillPrefixes = []` is empty on purpose. Operator primers like `sites/bsky.app/login` are *improved in place* by the dream, not frozen; the seed service only writes when the skill is absent, so later dream-authored improvements survive restarts. Operators who need to reset a primer can delete the stored skill file and bounce the host.
+
+Directive files live at `src/Foragent.Agent/directives/*.md` and ship with the binary via `<Content Include="directives\*.md" CopyToOutputDirectory="PreserveNewest" />`. `DreamService` resolves each `DreamOptions.*DirectivePath` relative to `AgentProfileOptions.BasePath` (confirmed by IL inspection: a relative base path combines against `AppContext.BaseDirectory`, which is the binary output dir). Program.cs configures `AgentProfileOptions.BasePath` from `ForagentDreams:DirectivesPath` (default `"directives"`). There is no `WithProfile()` call because Foragent doesn't need the personality-profile doc set.
+
+Dreams are **opt-in** via `ForagentDreams:Enabled`. The setting defaults to false (Program.cs falls back to `false` when the key is unset) so `dotnet run` smoke tests don't trigger scheduled LLM calls; `docker-compose.yml` sets `ForagentDreams__Enabled=true` because that's the "full operating mode" shape. `CronSchedule` defaults to `0 3 * * *` (03:00 UTC daily); the framework default of every 12 hours is too frequent for a browser worker. `InitialDelay` is the framework default (5 minutes from start), which is fine in prod but worth noting if someone spins up the compose harness for a 10-minute smoke session.
+
+**Don't add directive content to the RockBot agent's `deploy/rockbot-seed/` set.** Foragent's directives are task-shaped (browser outcomes, site knowledge); RockBot's are personality-shaped (identity, preferences). Mixing them defeats the reason Foragent authored its own.
 
 ## Credentials
 
diff --git a/docker-compose.yml b/docker-compose.yml
index 2f0b7e1..bd170e4 100644
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -81,6 +81,12 @@ services:
       # mounted volume below so learned site knowledge survives restarts.
       ForagentMemory__SkillsPath: /data/foragent/skills
       ForagentMemory__MemoryPath: /data/foragent/memory
+      # Step 7.5: daily dream pass to consolidate accumulated skills +
+      # memory. Off unless ForagentDreams:Enabled is set, so `dotnet run`
+      # smoke tests don't burn tokens; opt-in here because the compose
+      # harness is the "full operating mode" shape. CronSchedule default is
+      # 03:00 UTC daily; override via ForagentDreams__CronSchedule if needed.
+      ForagentDreams__Enabled: "true"
       # Optional Bluesky credential used by future credentialed browser-task
       # runs. Flat id (no slashes) because env-var keys use __ to separate
       # config segments. Leave unset to disable.
diff --git a/docs/framework-feedback.md b/docs/framework-feedback.md
index dcc86a8..08915c8 100644
--- a/docs/framework-feedback.md
+++ b/docs/framework-feedback.md
@@ -409,3 +409,78 @@ scenarios still pass on first attempt — the priming wiring itself adds
 no overhead when the stores return nothing, confirming the fail-soft
 contract. A separate benchmark with a populated store is step-8-or-later
 work (need a curated skill set worth priming against).
+
+## Step 7.5 — dream loop
+
+### Framework observations
+
+- **Dream directives don't ship with the framework.** `DreamOptions`
+  defaults to bare filenames (`dream.md`, `skill-optimize.md`,
+  `sequence-skill.md`, etc.) that `DreamService` reads at runtime. The
+  `RockBot.Host`/`RockBot.Host.Abstractions` assemblies carry **zero
+  embedded resources** — no `.md` defaults, no stub directives. The
+  RockBot agent ships its directive set inside its docker image
+  (`/app/agent/*.md`), and `docker-compose.yml`'s `rockbot-init` step
+  copies them to `/data/agent/`. This is intentional (per operator
+  guidance: the framework can't know what any given consumer needs),
+  but it means every new framework consumer carries a ~300-line
+  directive-authoring cost as a prerequisite to turning on dreams.
+  Candidate framework offering (not an ask, since the intentionality
+  is real): optional companion packages like
+  `RockBot.Host.Directives.Personality` and
+  `RockBot.Host.Directives.Task` that ship starter directive sets,
+  selectable by `WithDreaming(opts => opts.UsePersonalityDefaults())`
+  or similar. Reduces onboarding cost without compromising the
+  no-hardcoded-content principle.
+
+- **Directive paths resolve via `AgentProfileOptions.BasePath`.** IL
+  inspection of `DreamService`'s `ResolvePath` helper confirms: for
+  each directive (e.g. `opts.SkillOptimizeDirectivePath =
+  "skill-optimize.md"`), the final path is:
+  `Path.Combine(basePath, directive)` where `basePath` comes from
+  `IOptions<AgentProfileOptions>.Value.BasePath`. If `basePath` is
+  relative, it combines against `AppContext.BaseDirectory` (binary
+  output dir). Foragent configures `AgentProfileOptions.BasePath =
+  "directives"` and ships markdown files alongside the binary via
+  `CopyToOutputDirectory=PreserveNewest` — no `WithProfile()` call
+  needed. Worth documenting in RockBot's dream-loop guide: consumers
+  that don't load a personality profile still need to call
+  `Configure<AgentProfileOptions>`, because that options type is the
+  single source of truth for directive base paths.
+
+- **`DreamService`'s constructor pulls 17 dependencies.** Everything
+  the dream subtypes might need (`IConversationLog`, `IDlqSampler`,
+  `IWispExecutionLog`, `IKnowledgeGraph`, `TierRoutingLogger`, …) is a
+  hard ctor parameter, so the framework registers stub / no-op
+  implementations for the ones a given agent doesn't use. Works, but
+  consumers who turn off a subtype shouldn't need its stores in DI at
+  all. Candidate framework refactor: make the subtype dependencies
+  optional (`IEnumerable<IDreamSubtype>` or similar) so
+  `DreamService.StartAsync` enumerates whatever's registered and skips
+  what isn't. Lower priority than the starter-directives idea above.
+
+- **`ProtectedSkillPrefixes` is literal-only.** The list is
+  `List<string>` and (from the IL) matched via `StartsWith` — no
+  wildcard expansion. Foragent ships it empty; operators can add
+  specific literals if they need to freeze a skill. Noting because
+  wildcard-style patterns (`sites/*/login`) would be a natural
+  extension and aren't there today.
+
+### Manual verification plan
+
+Automated tests for the dream loop would require faking the scheduler
+and running an end-to-end pass — out of scope. Verified manually via
+docker-compose:
+
+- Container starts with `ForagentDreams__Enabled=true` → startup log
+  shows `ForagentDreams enabled; daily dream pass on schedule '0 3 * *
+  *'`.
+- Container starts with dreams disabled → startup log shows the
+  `ForagentDreams disabled` message and `DreamService` is not registered.
+- Directive files present at `/app/directives/*.md` inside the
+  container (verified via `docker compose exec foragent ls
+  /app/directives/`).
+
+First live dream pass against a non-empty skill store will be observed
+after enough `browser-task` runs accumulate — probably step 8 or when
+the operator turns the harness on for a sustained session.
diff --git a/src/Foragent.Agent/Foragent.Agent.csproj b/src/Foragent.Agent/Foragent.Agent.csproj
index 42339da..0474bf6 100644
--- a/src/Foragent.Agent/Foragent.Agent.csproj
+++ b/src/Foragent.Agent/Foragent.Agent.csproj
@@ -25,4 +25,11 @@
     <ProjectReference Include="..\Foragent.Capabilities\Foragent.Capabilities.csproj" />
     <ProjectReference Include="..\Foragent.Credentials\Foragent.Credentials.csproj" />
   </ItemGroup>
+  <ItemGroup>
+    <!-- Dream directives (step 7.5). DreamService resolves directive paths
+         relative to AgentProfileOptions.BasePath; with a relative BasePath
+         they combine against AppContext.BaseDirectory (the binary output
+         dir), so shipping the markdown alongside the binary is enough. -->
+    <Content Include="directives\*.md" CopyToOutputDirectory="PreserveNewest" />
+  </ItemGroup>
 </Project>
diff --git a/src/Foragent.Agent/Program.cs b/src/Foragent.Agent/Program.cs
index db6412c..a0fceef 100644
--- a/src/Foragent.Agent/Program.cs
+++ b/src/Foragent.Agent/Program.cs
@@ -83,6 +83,21 @@
 var skillsPath = memorySection["SkillsPath"] ?? "data/skills";
 var memoryPath = memorySection["MemoryPath"] ?? "data/memory";
 
+// Dream loop (step 7.5). Opt-in via ForagentDreams:Enabled — default off so
+// local `dotnet run` smoke tests don't trigger scheduled LLM calls. The
+// docker-compose harness sets this to true. CronSchedule defaults to 03:00
+// UTC daily; framework default is every 12 hours, too frequent for a browser
+// worker. Directive files ship alongside the binary under ./directives/ —
+// DreamService resolves each directive path relative to
+// AgentProfileOptions.BasePath (confirmed via IL inspection; relative paths
+// combine against AppContext.BaseDirectory).
+var dreamsSection = builder.Configuration.GetSection("ForagentDreams");
+var dreamsEnabled = dreamsSection.GetValue<bool?>("Enabled") ?? false;
+var directivesPath = dreamsSection["DirectivesPath"] ?? "directives";
+var dreamsCron = dreamsSection["CronSchedule"] ?? "0 3 * * *";
+
+builder.Services.Configure<AgentProfileOptions>(opts => opts.BasePath = directivesPath);
+
 builder.Services.AddRockBotHost(agent =>
 {
     agent.WithIdentity(agentName);
@@ -106,6 +121,38 @@
     agent.WithSkills(opts => opts.BasePath = skillsPath);
     agent.WithLongTermMemory(opts => opts.BasePath = memoryPath);
 
+    if (dreamsEnabled)
+    {
+        agent.AddScheduling();
+        agent.WithDreaming(opts =>
+        {
+            opts.Enabled = true;
+            opts.CronSchedule = dreamsCron;
+
+            // Task-shaped dream subtypes (see directives/dream.md).
+            opts.SkillGapEnabled = true;
+            opts.SequenceSkillDetectionEnabled = true;
+            opts.MemoryMiningEnabled = true;
+
+            // Personality-shaped subtypes — not applicable to a browser
+            // worker. Disabling these skips both the LLM call and the
+            // directive-file lookup.
+            opts.PreferenceInferenceEnabled = false;
+            opts.EpisodeExtractionEnabled = false;
+            opts.TierRoutingReviewEnabled = false;
+            opts.EntityExtractionEnabled = false;
+            opts.GraphConsolidationEnabled = false;
+            opts.IdentityReflectionEnabled = false;
+            opts.DlqReviewEnabled = false;
+            opts.WispFailureAnalysisEnabled = false;
+
+            // Empty protected list — the goal is that the dream improves
+            // primer skills over time, not that primers are frozen
+            // (operator can still edit them through other channels).
+            opts.ProtectedSkillPrefixes = [];
+        });
+    }
+
     agent.Services.AddForagentCapabilities();
     agent.Services.AddHostedService<BskySeedSkillService>();
 });
@@ -157,6 +204,19 @@
         + "Set ForagentEmbeddings:Endpoint/ModelId/ApiKey to enable semantic retrieval.");
 }
 
+if (dreamsEnabled)
+{
+    app.Logger.LogInformation(
+        "ForagentDreams enabled; daily dream pass on schedule '{Cron}' will consolidate skills and memory.",
+        dreamsCron);
+}
+else
+{
+    app.Logger.LogInformation(
+        "ForagentDreams disabled. Learned skills will accumulate without consolidation; "
+        + "set ForagentDreams:Enabled=true to turn on the daily dream pass.");
+}
+
 app.Run();
 
 public partial class Program;
diff --git a/src/Foragent.Agent/directives/dream.md b/src/Foragent.Agent/directives/dream.md
new file mode 100644
index 0000000..1ac1b1d
--- /dev/null
+++ b/src/Foragent.Agent/directives/dream.md
@@ -0,0 +1,57 @@
+# Foragent dream loop
+
+You are the dream pass for **Foragent**, a task-level browser-automation
+agent built on the RockBot framework. The framework fires this dream on
+a daily schedule; your role is to improve the agent's accumulated site
+knowledge without any user-facing interaction.
+
+## What Foragent does
+
+Foragent exposes one generalist capability (`browser-task`) and two
+specialists. Every `browser-task` invocation runs an LLM-in-the-loop
+planner over a small tool surface (`snapshot`, `click`, `type`,
+`navigate`, `wait_for`, `done`, `fail`) against a real Chromium browser
+in an isolated context. Each successful run writes a **learned skill**
+at `sites/{host}/learned/{intent-slug}` describing the flow that
+worked. Operators may also seed **primer skills** at `sites/{host}/{…}`
+as hand-written site guides.
+
+## What this dream pass is for
+
+Turn an accumulating pile of single-shot learned skills into a smaller,
+better, more retrievable body of site knowledge. Specific passes are
+driven by their own directives:
+
+- `skill-optimize.md` — merge duplicate / overlapping skills for the
+  same site into a single clearer entry.
+- `skill-gap.md` — look at recent failures and propose what skill would
+  have helped, flagging the gap in long-term memory.
+- `sequence-skill.md` — find repeated tool-call patterns across many
+  runs and propose a canonicalised named sequence.
+- `memory-mining.md` — promote durable observations from the tool-call
+  log into `ILongTermMemory` so they prime future planning.
+
+Other RockBot subtypes (identity reflection, preference inference,
+episode extraction, entity / knowledge-graph consolidation, tier-routing
+review, Wisp failure analysis, DLQ review) are disabled for Foragent —
+they serve personality-driven agents, not a browser worker.
+
+## Ground rules for every pass
+
+- **Do not invent site behaviour.** Every claim in a skill or memory
+  entry must trace back to tool-call log evidence. "When Bluesky login
+  fails, retry" is fine only if the trace log shows that pattern.
+- **Never include credential values, typed field contents, or tokens.**
+  The trace log captures field *lengths*, not *values*. If you see any
+  string that looks like a password / code / token in content you're
+  producing, stop and strip it.
+- **Prefer concrete selectors and landing URLs** ("click the element
+  labelled `Next`" / "navigate to `/compose`") over vague guidance ("go
+  to the compose page"). Future planners retrieve these to save
+  snapshot round-trips.
+- **Protected skills** listed in `DreamOptions.ProtectedSkillPrefixes`
+  must never be deleted and should be *improved* in place rather than
+  replaced — edit their Content and Summary, keep the Name.
+- **Drop data, don't grow it.** A consolidated skill should be *shorter*
+  than the sum of its sources, or the consolidation isn't earning its
+  keep.
diff --git a/src/Foragent.Agent/directives/memory-mining.md b/src/Foragent.Agent/directives/memory-mining.md
new file mode 100644
index 0000000..31f220d
--- /dev/null
+++ b/src/Foragent.Agent/directives/memory-mining.md
@@ -0,0 +1,75 @@
+# Memory-mining pass
+
+Goal: promote durable observations from recent `browser-task` runs
+into `ILongTermMemory` so they prime future planning without growing
+the skill store.
+
+## What belongs in memory (not in skills)
+
+Skills are **procedural** — "how to do X on site Y." Memory is
+**declarative** — facts and observations that don't fit the
+how-to-do-X shape. Examples of good memory entries:
+
+- "bsky.app enforces an email-code challenge roughly 1-in-10 logins
+  from fresh contexts" — site behaviour, not a procedure.
+- "example.com served a 503 maintenance page on 2026-04-21; retries
+  after 2026-04-22 succeeded" — time-bounded incident.
+- "the bsky compose editor rejects `FillAsync` on its ProseMirror
+  root; only keystroke-based typing works" — tooling quirk worth
+  remembering across capabilities.
+- "Cloudflare challenge pages show a checkbox labelled 'Verify you are
+  human'; no successful automated bypass has been observed" — a
+  concrete negative finding.
+
+## What does NOT belong in memory
+
+- Credential values, tokens, typed field contents.
+- Specific user data (post text, usernames, message bodies).
+- One-off successful runs that are already captured by a learned
+  skill.
+- Generic observations ("sites take time to load") — too vague to
+  retrieve usefully.
+
+## Inputs
+
+Recent tool-call logs and browser-task results, plus the existing
+memory entries (so you don't duplicate).
+
+## What to look for
+
+1. **Site-level behaviours** that appear across multiple runs (captcha
+   prompts, rate-limit responses, maintenance windows, DOM changes).
+2. **Tooling quirks** — situations where a capability's tool call
+   behaved unexpectedly in a specific way that would save future runs
+   time to know about.
+3. **Negative findings** — things that were tried and *didn't* work,
+   saving a future planner from repeating the attempt.
+
+## Output format
+
+For each memory worth recording:
+
+```
+MEMORY {category} | [tags]
+{One-paragraph observation. Lead with the specific, observable fact.
+Keep under ~80 words. If it has a date boundary, include it
+explicitly.}
+```
+
+Category should be `sites/{host}` when the observation is site-specific,
+or a general category like `browser-tooling`, `captcha-patterns`,
+`rate-limit-patterns` otherwise.
+
+Tags should be a subset of: `site-behaviour`, `tooling-quirk`,
+`negative-finding`, `incident`, `rate-limit`, `captcha`, `dom-change`.
+Keep to 1-3 tags per entry.
+
+## What not to do
+
+- **Do not** emit memory entries for already-captured facts. Skim the
+  existing memory first.
+- **Do not** write essays. A memory entry should be a single tight
+  paragraph a future planner can retrieve and integrate quickly.
+- **Do not** include credential values, typed content, or user data.
+- **Do not** create memory entries for facts that would be better as a
+  skill — if it answers "how do I do X," it's a skill, not a memory.
diff --git a/src/Foragent.Agent/directives/sequence-skill.md b/src/Foragent.Agent/directives/sequence-skill.md
new file mode 100644
index 0000000..04dd3b4
--- /dev/null
+++ b/src/Foragent.Agent/directives/sequence-skill.md
@@ -0,0 +1,75 @@
+# Sequence-skill detection pass
+
+Goal: find repeated tool-call patterns across successful `browser-task`
+runs and propose a named, reusable skill so future planners can
+retrieve the pattern directly instead of rediscovering it.
+
+## Inputs
+
+The tool-call log for recent `browser-task` runs, grouped by primary
+host. Each entry includes:
+- The intent text.
+- Ordered tool-call names (arguments omitted for privacy — you see
+  `type(ref, …)` without the value).
+- Final outcome (`done` / `fail` / `incomplete`).
+- Step count and duration.
+
+## What to look for
+
+1. **Recurrent prefixes.** Multiple successful runs that start with the
+   same 3+ tool calls (often "navigate to login URL, snapshot, click
+   Sign-in"). That's a candidate login primer.
+
+2. **Recurrent mid-sequences.** A 3–6 step pattern that appears inside
+   runs with different overall intents — e.g. "click menu → click
+   Settings → navigate Settings URL" appears in three different
+   settings-related tasks. That's a candidate navigation primer.
+
+3. **Recurrent error-recovery.** A pattern where the planner hits a
+   specific state, recovers, and succeeds (e.g. "dismiss cookie banner
+   → retry click"). Worth a primer so future runs skip the recovery
+   phase.
+
+## Threshold
+
+Require **at least 3 distinct successful runs** exhibiting the pattern
+before proposing a skill. Two matches is coincidence; three is a
+pattern worth remembering.
+
+## Output format
+
+For each sequence-skill candidate, emit:
+
+```
+UPSERT sites/{host}/{slug} | {summary-15-words-or-less}
+# {Human-readable title}
+
+**When to use:** {one-sentence trigger — what a future planner's intent
+text or current URL should look like}
+
+**Steps:**
+1. {tool-call + reason, e.g. "navigate to https://{host}/login"}
+2. …
+
+**Known pitfalls:** {brief; only if the evidence shows a recovery
+pattern}
+
+**See also:** {list of existing related skills}
+---
+```
+
+Slug should be a short kebab-case name. Prefer verbs ("open-compose",
+"dismiss-cookie-banner") over nouns.
+
+## What not to do
+
+- **Do not** propose a sequence skill where the only common element is
+  "navigate to site root, then snapshot." That's not a pattern, that's
+  the default shape of every task.
+- **Do not** emit argument values — the log omits them for a reason.
+- **Do not** emit sequence skills longer than ~8 steps. Long sequences
+  are fragile against site changes and benefit the next planner less
+  than a well-written shorter primer.
+- **Do not** duplicate an existing skill. If `sites/{host}/login`
+  already exists and your candidate sequence matches it, either emit
+  nothing or emit an UPSERT that *improves* the existing one.
diff --git a/src/Foragent.Agent/directives/skill-gap.md b/src/Foragent.Agent/directives/skill-gap.md
new file mode 100644
index 0000000..5796aa1
--- /dev/null
+++ b/src/Foragent.Agent/directives/skill-gap.md
@@ -0,0 +1,68 @@
+# Skill-gap detection pass
+
+Goal: identify tasks that failed or struggled because Foragent was
+missing a relevant site skill, and record the gap so future operator
+priming or future dream passes can close it.
+
+## Inputs
+
+You will be given the recent `browser-task` traces where the outcome
+was either:
+
+- `failed` (planner called `fail()`), or
+- `incomplete` (budget exhausted before `done()` or `fail()`), or
+- `done` but with an unusually high step count relative to peers for
+  the same host.
+
+Alongside each trace, you will see the skills (if any) that were
+injected into the planner prompt as priming.
+
+## What to look for
+
+1. **Failures where no primer existed for the host.** If the task
+   targeted `sites/foo.example/*` and the skill store contains nothing
+   under `sites/foo.example/`, that's the clearest kind of gap.
+
+2. **Failures where the primer content did not cover the intent.** If
+   the task intent was "compose a post" but the only retrieved skill
+   was `sites/{host}/login`, the gap is a missing compose primer.
+
+3. **Recurring pain points within a single host.** Three failures on
+   bsky.app's 2FA email-code prompt in a week is worth a specific
+   entry, even if one successful run exists.
+
+4. **Tool thrash.** A trace with 30+ `snapshot` calls and no `click`
+   that succeeded usually means the planner didn't know which element
+   to target — a gap in selector-level guidance.
+
+## Output format
+
+For each gap identified, emit a memory entry:
+
+```
+MEMORY sites/{primary-host} | [tags]
+Missing primer / selector coverage for {intent summary}. Evidence: {N
+failed traces over {period}, most recent {date}}. Suggested content:
+{one-paragraph hint on what a future primer should cover}.
+```
+
+Tags should be a subset of: `gap`, `failure-cluster`,
+`missing-primer`, `selector-ambiguous`, `2fa-blocked`,
+`budget-exhausted`.
+
+Multiple gaps per host are fine — keep them separate entries so
+retrieval surfaces the specific flavour of gap that matches a future
+query.
+
+## What not to do
+
+- **Do not** write a gap entry for a single failed task. Require at
+  least two failures or one failure plus one struggle (high step count)
+  before flagging.
+- **Do not** invent selectors or URLs. The gap entry describes the
+  *shape* of the missing knowledge, not the content of it.
+- **Do not** include the failing intent verbatim if it contains
+  personal content, usernames, or data — describe the *pattern*.
+- **Do not** blame the planner for site changes. If the evidence shows
+  the site itself changed (new DOM, new domain), record that as a site
+  event, not a skill gap.
diff --git a/src/Foragent.Agent/directives/skill-optimize.md b/src/Foragent.Agent/directives/skill-optimize.md
new file mode 100644
index 0000000..079aaf7
--- /dev/null
+++ b/src/Foragent.Agent/directives/skill-optimize.md
@@ -0,0 +1,68 @@
+# Skill consolidation pass
+
+Goal: reduce the skill store to a smaller, clearer body of site
+knowledge by merging duplicates and rewriting entries for clarity.
+
+## What to look for
+
+You will be given the current list of skills and their Summaries,
+grouped by name prefix. Focus on:
+
+1. **Same-intent duplicates** — multiple skills under `sites/{host}/learned/`
+   that describe the same intent (e.g. `login-to-bsky-app`,
+   `sign-in-bluesky`, `authenticate-bsky`). Merge into one.
+
+2. **Primer + learned pairs** — an operator primer at `sites/{host}/{x}`
+   alongside one or more learned skills at `sites/{host}/learned/{…}`
+   describing the same flow. Improve the primer with whatever the
+   learned skills discovered (updated selectors, new failure modes,
+   faster paths). Delete the redundant learned entries.
+
+3. **Stale or superseded content** — a skill that claims "click the
+   button labelled X" when a later learned skill shows the label is now
+   Y. Prefer the newer evidence and say so.
+
+4. **Over-long skills** — anything past ~500 words that spends most of
+   its content on one-off anecdote rather than reusable procedure.
+   Rewrite for density.
+
+## When to merge
+
+Merge two skills when **all three** are true:
+- They describe the same landing URL or same sequence of intents.
+- Their successful flows overlap by more than half.
+- The combined skill would still fit comfortably inside ~400 words.
+
+Do not merge skills that happen to target the same site but different
+intents (e.g. `sites/bsky.app/login` vs `sites/bsky.app/compose-post`).
+Those stay separate — different retrieval contexts.
+
+## Output format
+
+For each change you want to make, emit one of:
+
+- `DELETE {skill-name}` — remove a redundant skill.
+- `UPSERT {skill-name} | {summary-15-words-or-less}`
+  followed by a markdown body on subsequent lines up to `---`.
+
+Example:
+
+```
+DELETE sites/bsky.app/learned/sign-in-with-app-password
+DELETE sites/bsky.app/learned/log-into-bluesky
+UPSERT sites/bsky.app/login | Log in to bsky.app with an app password; watch for 2FA challenges.
+Bluesky's public web app is at https://bsky.app…
+(full markdown body)
+---
+```
+
+## What not to do
+
+- **Do not** delete a skill whose name is in the protected-prefixes
+  list. Improve its content in place via UPSERT instead.
+- **Do not** merge across sites. `sites/foo.example/login` and
+  `sites/bar.example/login` are different knowledge.
+- **Do not** write speculative content. If the trace evidence does not
+  mention a specific selector, don't invent one — leave it vague.
+- **Do not** drop citations — if a learned skill references a URL or
+  observed selector, preserve that detail in the merged output.