feat(thesaurus): flush compiled thesaurus cache after graph markdown edits #852

Open
AlexMikhalev wants to merge 2204 commits into main from task/945-flush-thesaurus-cache

@AlexMikhalev
Contributor

Implements automatic thesaurus cache invalidation when knowledge graph markdown source files are edited.

Changes

  • Add terraphim_automata::hash module for KG directory hash computation (twox-hash)
  • Add terraphim_persistence::hash_store module for hash storage/retrieval in KV store
  • Add hash validation to TerraphimService::ensure_thesaurus_loaded() - rebuilds on mismatch
  • Add invalidate_thesaurus_cache() and flush_thesaurus_cache() to service
  • Clear #[cached] in-process memoization on persistent cache invalidation
  • Add 'cache flush [--role ROLE]' CLI subcommand to terraphim-agent
  • Add '/cache flush [--role ROLE]' REPL command
  • Add regression test test_thesaurus_cache_invalidation_on_kg_edit
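
A minimal sketch of the hash check listed above, under stated assumptions: the real module uses twox-hash and reads the KG directory from disk, whereas this version uses the standard library's `DefaultHasher` as a stand-in and takes file contents as in-memory pairs. The names `kg_hash` and `is_stale` are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Hash a set of (relative path, file contents) pairs.
/// Entries are sorted first so the result does not depend on the order
/// in which the directory happened to be listed.
pub fn kg_hash(files: &[(&str, &str)]) -> u64 {
    let mut sorted = files.to_vec();
    sorted.sort();
    let mut hasher = DefaultHasher::new(); // stand-in for twox-hash (assumption)
    for (path, body) in sorted {
        hasher.write(path.as_bytes());
        hasher.write_u8(0); // separator so ("ab", "c") differs from ("a", "bc")
        hasher.write(body.as_bytes());
        hasher.write_u8(0);
    }
    hasher.finish()
}

/// True when the cached thesaurus must be rebuilt: either no hash was
/// stored yet, or the stored hash differs from the freshly computed one.
pub fn is_stale(stored: Option<u64>, current: u64) -> bool {
    stored != Some(current)
}
```

Editing any markdown file changes the hash, so the next load sees a mismatch and rebuilds.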

Verification

  • 441 tests pass
  • 99.31% coverage on hash module
  • UBS scan: no critical issues

Refs #945

AlexMikhalev and others added 30 commits April 19, 2026 21:31
…ause - Refs terraphim/adf-fleet#6

Wire the existing CostTracker budget check into the spawn path.
`CostTracker::check()` was only consumed by the routing engine to apply
BudgetPressure scoring penalties; dispatch still went ahead even when
BudgetVerdict::Exhausted came back. Now the gate short-circuits at the
top of spawn_agent so an agent whose monthly cap is blown does not run
at all this cycle. Also emits a warn-level trace on NearExhaustion so
operators see the soft-limit crossing before the hard pause.

Placed after the disk-space guard and before the pre-check gate so we
never waste pre-check work on an agent that cannot spawn.
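
The gate described above might look roughly like this. It is a sketch, not the orchestrator's code: the `WithinBudget` variant, the 80% soft-limit threshold, and the function signatures are assumptions made for illustration.

```rust
use std::collections::HashSet;

#[derive(Debug, PartialEq)]
pub enum BudgetVerdict {
    Uncapped,
    WithinBudget, // assumed variant name
    NearExhaustion,
    Exhausted,
}

/// Classify spend against an optional monthly cap (both in cents).
pub fn check(budget_monthly_cents: Option<u64>, spent_cents: u64) -> BudgetVerdict {
    match budget_monthly_cents {
        None => BudgetVerdict::Uncapped,
        Some(cap) if spent_cents >= cap => BudgetVerdict::Exhausted,
        // Soft limit at 80% of the cap (threshold is an assumption).
        Some(cap) if spent_cents * 10 >= cap * 8 => BudgetVerdict::NearExhaustion,
        Some(_) => BudgetVerdict::WithinBudget,
    }
}

/// Short-circuit before any pre-check work when the budget is blown.
pub fn spawn_agent(
    active_agents: &mut HashSet<String>,
    name: &str,
    budget_monthly_cents: Option<u64>,
    spent_cents: u64,
) -> Result<(), String> {
    match check(budget_monthly_cents, spent_cents) {
        // Skipping is a no-op, not an error.
        BudgetVerdict::Exhausted => return Ok(()),
        BudgetVerdict::NearExhaustion => eprintln!("warn: {name} is near its monthly budget"),
        _ => {}
    }
    // The pre-check gate and the real spawn would follow here.
    active_agents.insert(name.to_string());
    Ok(())
}
```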

Tests:
- test_spawn_agent_skips_when_budget_exhausted: registers a $1 cap,
  records $2 spend, asserts spawn is a no-op (no entry in
  active_agents, Ok returned).
- test_spawn_agent_runs_when_budget_uncapped: confirms an agent with
  budget_monthly_cents=None spawns even after large recorded spend.

Replace the single top-level `adf/mention_cursor` persistence key with
per-project keys `adf/mention_cursor/<project_id>` so each project can
advance its repo-wide comment poll cursor independently. Legacy
single-project installations pass the synthetic `__global__` project id
and continue to work without config changes.

- `MentionCursor::load_or_now(project_id: &str)` and
  `MentionCursor::save(&self, project_id: &str)` take the project id
  explicitly; both use the new `cursor_key()` helper. Project id is
  included on log events for multi-project debuggability.
- Orchestrator swaps `mention_cursor: Option<MentionCursor>` for
  `mention_cursors: HashMap<String, MentionCursor>` and routes the
  poll/webhook paths through that map.
- Webhook-dispatched comment ids are now stamped onto every project
  cursor (or the legacy `__global__` cursor when `projects` is empty)
  so subsequent polls skip them regardless of which project the
  comment originated from. The webhook payload does not yet carry
  project info; stamping every cursor is a safe superset.
- `poll_mentions` continues to use the legacy single-project path under
  the `__global__` key; per-project fan-out lands in the next commit.
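
As an illustration of the key scheme and the "stamp every cursor" rule above, here is a small sketch. The contents of `MentionCursor` (a set of seen comment ids) and the `stamp_all` helper are assumptions; only the key format and the `__global__` id come from the commit message.

```rust
use std::collections::{HashMap, HashSet};

pub const LEGACY_PROJECT_ID: &str = "__global__";

/// Per-project persistence key.
pub fn cursor_key(project_id: &str) -> String {
    format!("adf/mention_cursor/{project_id}")
}

/// Hypothetical cursor shape: just the comment ids already handled.
#[derive(Default)]
pub struct MentionCursor {
    pub seen: HashSet<u64>,
}

/// Stamp a webhook-dispatched comment id onto every project cursor, or
/// onto the legacy cursor when no projects are configured.
pub fn stamp_all(cursors: &mut HashMap<String, MentionCursor>, comment_id: u64) {
    if cursors.is_empty() {
        cursors.entry(LEGACY_PROJECT_ID.to_string()).or_default();
    }
    for cursor in cursors.values_mut() {
        cursor.seen.insert(comment_id);
    }
}
```

Stamping every cursor costs a few redundant entries but means no poll can replay a comment the webhook already dispatched.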

Refs terraphim/adf-fleet#5

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Add `mention::migrate_legacy_mention_cursor(projects)` to copy the
legacy top-level `adf/mention_cursor` key to per-project keys
`adf/mention_cursor/<project_id>` on first startup after the schema
change, then delete the legacy key so the migration is idempotent.

- Migration targets every configured project id plus the synthetic
  `__global__` id so both multi-project and legacy single-project
  installations see their cursor preserved.
- Per-project targets that already have a cursor (operator-provided
  `stat()` succeeds) are left untouched -- the poller's advance wins
  over the pre-migration snapshot.
- Unparsable legacy cursors are deleted rather than propagated: the
  poller will then synthesise fresh per-project `now()` cursors on
  first use, preserving the replay-storm guard.
- Storage errors are logged but non-fatal; the reconciliation loop
  continues regardless so a transient sqlite hiccup cannot block
  orchestrator startup.
- Wired into `AgentOrchestrator::run()` between telemetry restore and
  the safety-agent spawn so it runs exactly once per process and
  before any poll tick could observe the old key.
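
The migration rules above can be sketched against an in-memory key-value map. This is an assumption-heavy illustration: the real code talks to a persistence backend, and the cursor is treated here as a plain integer string so that "unparsable" has a concrete meaning.

```rust
use std::collections::HashMap;

const LEGACY_KEY: &str = "adf/mention_cursor";

/// Copy the legacy cursor to per-project keys, then drop the legacy key.
/// Returns how many per-project keys were written.
pub fn migrate_legacy_cursor(kv: &mut HashMap<String, String>, projects: &[&str]) -> usize {
    // Removing the legacy key up front makes a second run a no-op.
    let Some(legacy) = kv.remove(LEGACY_KEY) else {
        return 0;
    };
    // Unparsable cursors are deleted, not propagated (format is an assumption).
    if legacy.parse::<u64>().is_err() {
        return 0;
    }
    let mut copied = 0;
    for id in projects.iter().copied().chain(["__global__"]) {
        let key = format!("{LEGACY_KEY}/{id}");
        // An existing per-project cursor wins over the old snapshot.
        if !kv.contains_key(&key) {
            kv.insert(key, legacy.clone());
            copied += 1;
        }
    }
    copied
}
```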

Refs terraphim/adf-fleet#5

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ws - Refs terraphim/adf-fleet#6

Add a provider-level spend tracker that complements the per-agent
CostTracker. Tracks accumulated USD spend per external LLM provider
(opencode-go, kimi-for-coding, ...) in tumbling UTC hour and day
buckets and returns the existing BudgetVerdict so the routing engine
can uniformly gate on either signal.

Design choices (audited against existing primitives):
- Re-use BudgetVerdict from cost_tracker (no parallel enum).
- Do NOT extend TokenBucketLimiter: it is async, uses std::Instant
  (not serialisable), and tracks a single minute window. A cost-based
  tumbling-bucket tracker with persistence would require rewriting
  half of it; a new focused module is clearer.
- Tumbling UTC buckets mirror CostTracker's calendar-month reset
  pattern, which operators already understand.
- std::sync::Mutex for the two per-provider window cells -- critical
  sections are tiny; async locks would add needless complexity.
- Atomic JSON snapshot write via `.tmp` + rename for crash safety.
- apply_snapshot discards state for providers that are no longer in
  the current config so stale entries do not linger after edits.

Public API mirrors CostTracker: `new`, `with_persistence`,
`record_cost`, `check`, `snapshot`, `persist`, plus `record_cost_at`
and `check_at` test hooks so boundary-crossing tests do not depend
on wall-clock drift. A small `provider_has_budget` helper is
exposed for the routing filter that lands next.

Tests (9): unknown-provider uncapped; hour-window exhaustion; hour
reset on the next hour; day cap trips across hours; day reset on the
next day; snapshot round-trip via tempfile; combine_verdicts picks
the worst signal; helper reports exhaustion; stale snapshot entry
for a removed provider is discarded on reload.
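
A rough sketch of tumbling UTC hour and day buckets with `record_cost_at` / `check_at` style test hooks. The struct layout, the two-variant verdict enum, and the use of Unix seconds as the clock are assumptions; the real tracker also persists its state and handles multiple providers.

```rust
#[derive(Debug, PartialEq)]
pub enum BudgetVerdict {
    WithinBudget, // trimmed enum for the sketch
    Exhausted,
}

struct Window {
    bucket: u64, // index of the hour or day this spend belongs to
    spent_usd: f64,
}

pub struct ProviderWindows {
    hour: Window,
    day: Window,
    hour_cap_usd: f64,
    day_cap_usd: f64,
}

/// Reset the window when the clock has moved into a new bucket.
fn roll(w: &mut Window, bucket: u64) {
    if w.bucket != bucket {
        w.bucket = bucket;
        w.spent_usd = 0.0;
    }
}

impl ProviderWindows {
    pub fn new(hour_cap_usd: f64, day_cap_usd: f64) -> Self {
        Self {
            hour: Window { bucket: 0, spent_usd: 0.0 },
            day: Window { bucket: 0, spent_usd: 0.0 },
            hour_cap_usd,
            day_cap_usd,
        }
    }

    fn roll_to(&mut self, now_secs: u64) {
        // Unix seconds divide evenly into UTC hours and days.
        roll(&mut self.hour, now_secs / 3_600);
        roll(&mut self.day, now_secs / 86_400);
    }

    pub fn record_cost_at(&mut self, now_secs: u64, usd: f64) {
        self.roll_to(now_secs);
        self.hour.spent_usd += usd;
        self.day.spent_usd += usd;
    }

    pub fn check_at(&mut self, now_secs: u64) -> BudgetVerdict {
        self.roll_to(now_secs);
        if self.hour.spent_usd >= self.hour_cap_usd || self.day.spent_usd >= self.day_cap_usd {
            BudgetVerdict::Exhausted
        } else {
            BudgetVerdict::WithinBudget
        }
    }
}
```

Passing the timestamp explicitly is what lets boundary-crossing tests run without depending on the wall clock.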
…t gitea repos

Rewrite poll_mentions to build a list of (project_id, gitea_cfg, mention_cfg)
targets from config.projects. Each project's gitea config is polled with its
own MentionConfig override (falling back to top-level mentions, then default).
When projects is empty, falls back to legacy __global__ target using the
top-level gitea block.

Active mention-agent count now filters by agent definition's project field
when running per-project; legacy mode counts all spawned_by_mention agents.

Refs terraphim/adf-fleet#5

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…r, migration

New integration test file mention_multi_repo_tests.rs with 21 tests covering:

Regex (parse_mention_tokens):
- unqualified / qualified capture
- mixed qualified + unqualified in one comment
- rejection of uppercase / over-long project prefix
- trailing punctuation and plain @-mentions

resolve_mention (project-aware):
- qualified exact match, not found, ambiguous
- unqualified legacy-mode matches any project
- unqualified hinted-project preference
- unqualified fallback to unbound agent
- unqualified ambiguous (hinted + unbound) returns None
- unqualified unknown name returns None

parse_mentions: stamps legacy and hinted project_id onto DetectedMention

MentionCursor: in-memory per-project isolation, monotonic advance_to

migrate_legacy_mention_cursor: no-op safety under memory-only storage

Refs terraphim/adf-fleet#5

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…+ routing - Refs terraphim/adf-fleet#6

Add [[providers]] config block and `provider_budget_state_file` pointing
at an optional JSON snapshot. When configured, the orchestrator builds a
`ProviderBudgetTracker` and threads it through the routing engine via
`RoutingDecisionEngine::with_provider_budget`. The engine now:

- Strips `Exhausted` providers from the candidate set before scoring
  (with a warning log and rationale citing the provider key).
- Multiplies scores by 0.6 for `NearExhaustion` providers so healthier
  alternatives win without hard-banning the near-limit one.
- Extends `budget_influenced` so observers see either agent-level or
  provider-level pressure biased the selection.

A helper `provider_key_for_model(&str)` maps model strings
(`opencode-go/model`, bare `sonnet`, ...) to the budget-bucket key.
Three routing unit tests cover exhausted-drop, near-exhaustion
deprioritisation, and uncapped-provider pass-through.
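
The candidate filtering described above could be sketched as follows. `Candidate`, `apply_provider_budget`, and the closure-based verdict lookup are hypothetical; only the drop-on-`Exhausted` rule and the 0.6 multiplier come from the commit message.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum BudgetVerdict {
    Uncapped,
    NearExhaustion,
    Exhausted,
}

pub const NEAR_EXHAUSTION_FACTOR: f64 = 0.6;

pub struct Candidate {
    pub provider: &'static str,
    pub score: f64,
}

/// Drop exhausted providers and down-weight near-exhausted ones
/// before the remaining candidates are ranked.
pub fn apply_provider_budget(
    candidates: Vec<Candidate>,
    verdict_for: impl Fn(&str) -> BudgetVerdict,
) -> Vec<Candidate> {
    candidates
        .into_iter()
        .filter_map(|mut c| match verdict_for(c.provider) {
            BudgetVerdict::Exhausted => None,
            BudgetVerdict::NearExhaustion => {
                c.score *= NEAR_EXHAUSTION_FACTOR;
                Some(c)
            }
            BudgetVerdict::Uncapped => Some(c),
        })
        .collect()
}
```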
…w - Refs terraphim/adf-fleet#6

The bare-name branch previously returned true for any unknown id, so
model = "minimax" (bare, no slash) silently bypassed the C3 banlist.
Switch to an explicit allow-list: only CLAUDE_CLI_BARE_MODELS,
ANTHROPIC_BARE_PROVIDERS, and ALLOWED_PROVIDER_PREFIXES pass as bare
names; everything else rejects.

Apply the same tightening in validate_model_provider so bare banned
ids are caught at config load time, not only at runtime.

Adds unit tests covering:
  is_allowed_provider("minimax") -> false
  is_allowed_provider("opencode") -> false
  is_allowed_provider("unknown") -> false
  validate_model_provider rejects bare "minimax"
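
A sketch of the explicit allow-list for bare names. The three constant names come from the commit message, but their contents below are placeholders chosen for illustration, and the handling of prefixed ids (`provider/model`) is an assumption.

```rust
// Placeholder contents: the real lists live in the orchestrator source.
const CLAUDE_CLI_BARE_MODELS: &[&str] = &["sonnet", "opus", "haiku"];
const ANTHROPIC_BARE_PROVIDERS: &[&str] = &["anthropic"];
const ALLOWED_PROVIDER_PREFIXES: &[&str] =
    &["claude-code", "opencode-go", "kimi-for-coding", "zai-coding-plan"];

/// Allow-list check: anything not explicitly listed is rejected.
pub fn is_allowed_provider(model: &str) -> bool {
    match model.split_once('/') {
        // Prefixed id: only the provider part is checked (assumption).
        Some((prefix, _)) => ALLOWED_PROVIDER_PREFIXES.contains(&prefix),
        // Bare id: must appear in one of the three lists.
        None => {
            CLAUDE_CLI_BARE_MODELS.contains(&model)
                || ANTHROPIC_BARE_PROVIDERS.contains(&model)
                || ALLOWED_PROVIDER_PREFIXES.contains(&model)
        }
    }
}
```

The design point is that an unknown bare name now falls through to `false` instead of `true`.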
…record_telemetry - Refs terraphim/adf-fleet#6

record_telemetry was the only place where real dispatch cost arrived
from CLI completion events, and it only fed cost_tracker. The new
ProviderBudgetTracker never received spend, so Layer 3 of the
subscription gate (Exhausted drop / NearExhaustion penalty in the
routing engine) was read-only at runtime.

Extend record_telemetry to also call provider_budget_tracker.record_cost
for each event with cost > 0, using provider_key_for_model to derive
the tracker key from the model string. Unknown or unconfigured
providers remain no-ops (tracker returns Uncapped and skips).

Adds:
  - record_telemetry_for_test and provider_budget_tracker accessors
    (doc-hidden) so integration tests can exercise the wiring without
    spinning up the full reconcile loop
  - record_telemetry_feeds_provider_budget_tracker: confirms a real
    CompletionEvent drives the hour/day counters and trips Exhausted
  - record_telemetry_ignores_zero_cost_and_unknown_model: guards
    against regressions that would poison unrelated buckets
…efs terraphim/adf-fleet#6

The snapshot file referenced by provider_budget_state_file was only
ever read at startup (via with_persistence); no production call site
ever invoked tracker.persist(), so the cross-restart promise was
structural only.

Call tracker.persist() at the end of every reconcile tick (step 16,
paired with step 15 telemetry persistence), and on graceful shutdown
after the main select! loop exits. Failures log a warning and do not
abort the tick, matching the fire-and-forget pattern used for the
telemetry store.

The persistence round-trip is covered by the existing
provider_budget_persistence_round_trip_via_orchestrator test, which
exercises a simulated restart end-to-end.
…im/adf-fleet#6

Previously each provider kept Mutex<WindowState> for hour and day
separately, so a concurrent recorder could interleave: update hour,
release, acquire day, update. An observer calling check() between the
two locks saw the hour bucket advanced but not the day bucket,
temporarily violating the day >= sum(hours-in-day) invariant.

Collapse both windows behind a single Mutex<ProviderWindows> so
record_cost_at and check_at observe a consistent snapshot across both
windows. update_window/check_window refactored to operate on borrowed
WindowState slots held by the caller's single lock.

No user-visible behaviour change for the single-threaded tests; this
only tightens the invariant under concurrent load.
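
To illustrate why a single lock helps, here is a sketch in which both windows sit behind one `Mutex`, so an observer can never see the hour bucket updated without the day bucket. Field names, the integer-cents unit, and the `consistent_under_load` helper are assumptions made for the example.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

#[derive(Default)]
pub struct ProviderWindows {
    pub hour_cents: u64,
    pub day_cents: u64,
}

#[derive(Default)]
pub struct Tracker {
    windows: Mutex<ProviderWindows>,
}

impl Tracker {
    /// Both buckets are updated under the same lock acquisition.
    pub fn record_cost(&self, cents: u64) {
        let mut w = self.windows.lock().unwrap();
        w.hour_cents += cents;
        w.day_cents += cents;
    }

    /// Readers take the same lock, so they see a consistent pair.
    pub fn snapshot(&self) -> (u64, u64) {
        let w = self.windows.lock().unwrap();
        (w.hour_cents, w.day_cents)
    }
}

/// Run concurrent writers while an observer checks that the two buckets
/// never diverge (within one hour they must always be equal).
pub fn consistent_under_load(writers: u64, per_writer: u64) -> bool {
    let tracker = Arc::new(Tracker::default());
    let handles: Vec<_> = (0..writers)
        .map(|_| {
            let t = Arc::clone(&tracker);
            thread::spawn(move || {
                for _ in 0..per_writer {
                    t.record_cost(1);
                }
            })
        })
        .collect();
    let mut ok = true;
    for _ in 0..1_000 {
        let (hour, day) = tracker.snapshot();
        ok &= hour == day;
    }
    for h in handles {
        h.join().unwrap();
    }
    let total = writers * per_writer;
    ok && tracker.snapshot() == (total, total)
}
```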
…spatch path - Refs terraphim/adf-fleet#5

P1: `resolve_mention` and `parse_mention_tokens` now called from both
dispatch sites, not just tests.

Poll path (`poll_mentions_for_project`):
- Before the AdfCommandParser loop, run `parse_mention_tokens` on each
  comment body and dispatch qualified `@adf:project/name` tokens directly
  via `resolve_mention(Some(proj), project_id, agent, agents)`.
- Replace `agents.iter().find(|a| a.name == agent_name)` with
  `resolve_mention(None, project_id, agent_name, agents)` so unqualified
  mentions in multi-project mode prefer the hinted-project agent.

Webhook path (`handle_webhook_dispatch`):
- Add `detected_project: Option<String>` to `WebhookDispatch::SpawnAgent`.
- In `handle_gitea_webhook`, parse mention tokens before the Aho-Corasick
  pass; collect qualified tokens into separate dispatches (qualified
  mentions are not substrings of `@adf:{name}` patterns).
- Replace `.find(|a| a.name == agent_name)` with
  `resolve_mention(detected_project.as_deref(), LEGACY_PROJECT_ID, ...)`.

P2 (both):
- Deduplicate `LEGACY_PROJECT_ID` in mention.rs: replace local `const`
  with `pub(crate) use crate::dispatcher::LEGACY_PROJECT_ID`.
- Emit `tracing::debug!` for projects that lack a `gitea` block so
  operators see which projects are skipped during mention polling.

Tests: +3 dispatch-wiring integration tests (24 total in
mention_multi_repo_tests; 513 total in crate, 0 failed).

Co-Authored-By: Terraphim AI <noreply@terraphim.ai>
…epo support' (#619) from task/adf-fleet-5-mention-multi-repo into main
…ead CostTracker path' (#620) from task/adf-fleet-6-provider-gate into main
…, spawner race

Five test fixes that surfaced after the auto-route landing.

- terraphim_agent shell_dispatch: try /usr/bin/false (macOS) before
  /bin/false (Linux) so the exit-code-capture test works on both.
- terraphim_mcp_server integration_test: pass role=Default explicitly
  in test_mcp_server_integration and test_search_pagination so the new
  auto-route text content does not throw off content-shape assertions.
- terraphim_mcp_server mcp_rolegraph_validation_test: count resource
  contents directly (filter c.as_resource().is_some()) instead of
  content.len()-1, robust to the auto-route prepend.
- terraphim_spawner: replace try_recv-after-sleep with timeout-bounded
  recv loop and write a sleep-then-pwd shell script so the broadcast
  channel always has a subscriber by the time the child writes.
- terraphim_service: drop the deprecated JMAP_MISSING_TOKEN_DOWNWEIGHT
  re-export and constant -- design kept it for fixture link-compat,
  no fixture used it, removing it removes the deprecation warning.

Refs #617

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tate

Two clippy warnings introduced by the auto-route work:
- terraphim_service auto_route: tied.iter().any(|n| *n == sel) is
  cleaner as tied.contains(&sel).
- terraphim_mcp_server lib: &*self.config_state is auto-deref'd to
  &self.config_state.

clippy clean for the affected crates.

Refs #617

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…fixtures - Refs terraphim/adf-fleet#9

- PEP 723 inline-metadata script using uv run (tomllib stdlib + tomli-w)
- --input (repeatable), --output-dir, --base-output, --dry-run flags
- project_id derived from filename stem (orchestrator.toml -> terraphim)
- Fixture files: orchestrator.toml (3 agents, 1 flow), odilo-orchestrator.toml (2 agents), banned-provider.toml

Co-Authored-By: Terraphim AI <noreply@anthropic.com>
…ection - Refs terraphim/adf-fleet#9

Already bundled in migration script:
- build_project_entry: extracts working_dir/gitea/quickwit/workflow/mentions into [[projects]]
- build_agent_entries: injects project = "<project_id>" into each [[agents]] entry
- build_flow_entries: injects project = "<project_id>" into each [[flows]] entry
- build_base_doc: assembles global settings + include = ["conf.d/*.toml"]
- Idempotent: tomli_w serialises deterministically; running twice is byte-identical

Co-Authored-By: Terraphim AI <noreply@anthropic.com>
…terraphim/adf-fleet#9

validate_models() checks model/fallback_model fields on all [[agents]] and
compound_review. Banned prefixes: opencode/ github-copilot/ google/ huggingface/
Exits non-zero with message: ERROR: Agent 'NAME' uses banned provider 'VALUE'

Co-Authored-By: Terraphim AI <noreply@anthropic.com>
…rejection - Refs terraphim/adf-fleet#9

- 6 tests covering round-trip structure, idempotence, banned-provider rejection,
  flow project injection, dry-run no-write, github-copilot/ ban
- Fixed non-deterministic dict key ordering in build_base_doc() using sorted()
- Tests invoke script via subprocess (black-box, no mocks)
- All 6 pass: uv run --with pytest pytest tests/test_migrate.py -v

Co-Authored-By: Terraphim AI <noreply@anthropic.com>
…s - Refs terraphim/adf-fleet#9

Reference output generated by:
  uv run migrate-to-confd.py \\
    --input tests/fixtures/orchestrator.toml \\
    --input tests/fixtures/odilo-orchestrator.toml \\
    --output-dir tests/expected/ \\
    --base-output tests/expected/orchestrator.toml

- tests/expected/orchestrator.toml: base config with include = ["conf.d/*.toml"]
- tests/expected/terraphim.toml: [[projects]], 3 [[agents]], 1 [[flows]] with project="terraphim"
- tests/expected/odilo.toml: [[projects]], 2 [[agents]] with project="odilo"

Co-Authored-By: Terraphim AI <noreply@anthropic.com>
…er - Refs terraphim/adf-fleet#7

Introduces ProviderErrorSignatures config schema (throttle/flake regex
lists) alongside a compiled runtime layer that classifies spawned-agent
stderr into Throttle, Flake, or Unknown verdicts. Throttle beats Flake
when both match so a 'rate-limit timeout' line is not treated as retryable.
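
The precedence rule can be sketched like this. Note the swap: the real config uses regex lists, while this example uses lowercase substring matching so it needs only the standard library. `Signatures` and `classify` are hypothetical names.

```rust
#[derive(Debug, PartialEq)]
pub enum Verdict {
    Throttle,
    Flake,
    Unknown,
}

/// Stand-in for the regex lists: lowercase substrings (assumption).
pub struct Signatures {
    pub throttle: Vec<&'static str>,
    pub flake: Vec<&'static str>,
}

/// Throttle patterns are checked first, so a line matching both lists
/// is never treated as a retryable flake.
pub fn classify(sig: &Signatures, stderr: &str) -> Verdict {
    let text = stderr.to_lowercase();
    if sig.throttle.iter().any(|p| text.contains(*p)) {
        Verdict::Throttle
    } else if sig.flake.iter().any(|p| text.contains(*p)) {
        Verdict::Flake
    } else {
        Verdict::Unknown
    }
}
```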

Adds ProviderBudgetTracker::force_exhaust so Throttle verdicts can push
hour+day windows past their caps, forcing the routing gate to drop the
provider until the next UTC window rolls over.

Patches the three in-tree ProviderBudgetConfig literals (routing.rs + tests)
to carry the new field explicitly rather than relying on Default.
… + drift-detection test - Refs terraphim/adf-fleet#9

Add minimax/ to BANNED_PREFIXES (was missing, causing divergence from Rust
BANNED_PROVIDER_PREFIXES). Add test_banned_list_matches_rust that parses
the Rust source and asserts list equality to prevent future drift.
Add test_minimax_bare_prefix_rejected as regression coverage.

Co-Authored-By: Terraphim AI <noreply@anthropic.com>
…sformation - Refs terraphim/adf-fleet#9

Update fixture orchestrator.toml to use the correct WorkflowConfig schema
(enabled, workflow_file, tracker sub-table). Previous schema had wrong
field names (gitea_base_url, gitea_token) that did not match the Rust
struct, causing adf --check to fail on the generated output.
Regenerate tests/expected/terraphim.toml from the corrected fixture.

Co-Authored-By: Terraphim AI <noreply@anthropic.com>
… banned-list drift tests - Refs terraphim/adf-fleet#9

Add three new tests:
- test_banned_list_matches_rust: parses Rust BANNED_PROVIDER_PREFIXES and
  asserts script BANNED_PREFIXES matches after normalising trailing '/'.
- test_minimax_bare_prefix_rejected: regression coverage for minimax/ ban.
- test_adf_check_accepts_generated_output: runs adf --check as subprocess
  on the temp output of a full migration; asserts exit 0.

Co-Authored-By: Terraphim AI <noreply@anthropic.com>
…n exit path - Refs terraphim/adf-fleet#7

Runs the per-provider classifier on stderr after the existing KG-based
ExitClass match so we cover two blind spots of the code-based check:

  * providers whose CLI exits 0 on quota hits ("returning partial
    output") still trip the breaker and force hour+day budget exhaustion;
  * providers whose CLI emits bespoke error text that ExitClassifier
    doesn't know about are caught by operator-tunable regex lists.

Throttle verdict records a provider failure and calls the new
ProviderBudgetTracker::force_exhaust so the routing gate drops the
provider until the next UTC window rolls.

Flake verdict only logs; dispatch already retries the next pool entry.

Unknown (with real stderr + failure-shaped exit) opens one `[ADF]`
Gitea issue via the OutputPoster's default tracker, deduped in-process
by error_signatures::unknown_dedupe_key so we don't spam fleet-meta
with duplicates for the same stderr shape. Unknown is also counted as
a soft failure so a pathological provider eventually opens the breaker.
…ust coverage - Refs terraphim/adf-fleet#7

Captures realistic stderr fixtures for every subscription-only provider
(claude-code, opencode-go, zai-coding-plan, kimi-for-coding) under
tests/fixtures/stderr/ -- 429 / usage-limit / timeout / EOF / insufficient
balance / quota-exceeded / unknown panic -- and exercises the classifier
end-to-end with the regex lists an operator would ship in
orchestrator.toml.

Covers:
  * per-provider fixture -> expected verdict matrix;
  * throttle beats flake when both patterns match;
  * missing provider in the map falls back to Unknown (fail-safe);
  * line-by-line capture path via classify_lines;
  * dedupe key collapses minor shape variance (case, trailing newline,
    extra detail) so retries don't spam fleet-meta.

Also adds three unit tests for ProviderBudgetTracker::force_exhaust so
the Throttle -> breaker + budget pairing is protected:
  * force_exhaust trips both windows even without recorded cost;
  * force_exhaust is a no-op on uncapped providers (intentional);
  * force_exhaust silently ignores unknown provider ids.

No mocks; every stderr line is captured text from real CLI runs.
AlexMikhalev and others added 29 commits April 25, 2026 19:38
…ection - Refs terraphim/adf-fleet#9

Already bundled in migration script:
- build_project_entry: extracts working_dir/gitea/quickwit/workflow/mentions into [[projects]]
- build_agent_entries: injects project = "<project_id>" into each [[agents]] entry
- build_flow_entries: injects project = "<project_id>" into each [[flows]] entry
- build_base_doc: assembles global settings + include = ["conf.d/*.toml"]
- Idempotent: tomli_w serialises deterministically; running twice is byte-identical

Co-Authored-By: Terraphim AI <noreply@anthropic.com>
…terraphim/adf-fleet#9

validate_models() checks model/fallback_model fields on all [[agents]] and
compound_review. Banned prefixes: opencode/ github-copilot/ google/ huggingface/
Exits non-zero with message: ERROR: Agent 'NAME' uses banned provider 'VALUE'

Co-Authored-By: Terraphim AI <noreply@anthropic.com>
#844)

* feat(drift-detector): agent work [auto-commit]

* feat(security-sentinel): agent work [auto-commit]

* feat(spec-validator): agent work [auto-commit]

* feat(codebase-eval): add manifest types and TOML loader Refs #680

New crate terraphim_codebase_eval with typed manifest models for the
before/after AI-agent codebase evaluation flow:

- HaystackDescriptor, RoleDefinition, QuerySpec, MetricRecord, Thresholds
- EvaluationManifest with validate() for role_id consistency
- load_manifest() auto-detects format by extension (.toml)
- thiserror ManifestError variants: InvalidPath, ParseError, Validation
- 11 unit tests + 1 doc test covering round-trip, validation, edge cases
- Fixture at fixtures/manifest-minimal.toml
- Zero clippy warnings, passes cargo fmt
… Refs #796

- Add opencode-connector and codex-connector feature flags to Cargo.toml

- Implement OpenCodeConnector parsing ~/.local/state/opencode/prompt-history.jsonl

- Implement CodexConnector parsing ~/.codex/sessions/*.jsonl with session_meta and response_item entries

- Register both connectors in ConnectorRegistry behind feature flags

- Add comprehensive unit tests for both connectors with sample JSONL data

- Mirror terraphim-session-analyzer connector patterns for zero drift
- Return empty thesaurus for roles without KG instead of hard error
- Support GITHUB_TOKEN env var for authenticated update checks
- Improve 403 error message to explain rate limiting
- Downgrade startup update check error from stderr to debug log
Refs #921
Refs #914 #915 #916 #917 #918 #919
- Change exit code from 1 to 2 for listen --server rejection
- Error message and stderr output unchanged
…lt Refs #923

- Initialise current_role from service.get_selected_role() in run()
- REPL now shows actual configured role on startup
…fs #892

Wire classify_error() into main error path for offline and server
commands. Split ensure_thesaurus_loaded to error on kg:null (exit 3)
vs degrade gracefully on missing files. Add --fail-on-empty flag to
search subcommand (exit 4). Add integration tests and no_kg fixture.
…efs #936

Update spec checkboxes for Task 1.4 (REPL integration) and Task 1.5
(token budget management) from unchecked to checked. All subtasks
and acceptance criteria verified against codebase with passing tests.

Phase 1-5 disciplined development: research, design, implementation,
verification (22 tests pass), validation (stakeholder approved).

Implements StatusReporter primitive for ADF Phase 1: posts a commit
status to POST /api/v1/repos/{owner}/{repo}/statuses/{sha} with
configurable retry on 5xx/429 (default backoff 1s, 2s, 4s) and no
retry on 4xx. Description is truncated to 140 chars on a Unicode
boundary; target_url=None omits the field entirely.

Tests use a real loopback HTTP server bound via axum on an ephemeral
127.0.0.1 port — no function mocks. Backoff is overridable via
GiteaTracker::with_status_backoff so retry tests run in milliseconds.
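
The retry and truncation rules above can be sketched as three small helpers. The function names are hypothetical and no HTTP call is made; the sketch only encodes which statuses retry, the default backoff schedule, and truncation on a character boundary.

```rust
use std::time::Duration;

/// Retry on 429 and any 5xx; other 4xx responses are final.
pub fn should_retry(status: u16) -> bool {
    status == 429 || (500..=599).contains(&status)
}

/// Default backoff between attempts: 1s, 2s, 4s.
pub fn default_backoff() -> [Duration; 3] {
    [
        Duration::from_secs(1),
        Duration::from_secs(2),
        Duration::from_secs(4),
    ]
}

/// Keep at most `max_chars` characters, cutting only between characters
/// so a multi-byte UTF-8 sequence is never split.
pub fn truncate_description(s: &str, max_chars: usize) -> &str {
    match s.char_indices().nth(max_chars) {
        Some((byte_idx, _)) => &s[..byte_idx],
        None => s,
    }
}
```

Slicing by byte offset alone could panic in the middle of a multi-byte character, which is why the cut point is found with `char_indices`.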

Refs #928

After the pr-reviewer is successfully spawned in handle_review_pr,
post a Gitea commit status with state=pending and context=adf/pr-reviewer
against the PR head SHA so the PR's checks gate in the Gitea UI shows
the agent is running. The agent itself transitions the status to
success/failure/error when its task script finishes.

Skip-paths (no pr-reviewer configured, banned provider, budget
exhausted) remain status-silent — posting pending on a check that
never resolves would block merges indefinitely.

Helper post_pr_reviewer_pending_status is best-effort: when no
workflow tracker is configured (e.g. in unit tests) or the API call
fails, we log and return without surfacing the error.

Refs #928

After the structural-pr-review claude invocation completes, parse the
Confidence Score from the agent output and render file, then POST a
Gitea commit status with adf/pr-reviewer context against the PR head
SHA so the PR check gate reflects the verdict:

  4-5/5 -> success
  3/5   -> success (with 'concerns flagged' description)
  0-2/5 -> failure
  parse fail -> error

The status post is best-effort -- a curl failure logs but does not
fail the agent run, since the gtr comment already carries the full
review.
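
The score-to-status table above, sketched in code. The exact text format of the score in the agent output is not given here, so `parse_confidence` assumes a line like `Confidence Score: 4/5`. Apart from 'concerns flagged', the description strings are placeholders.

```rust
/// Extract N from "Confidence Score ... N/5" (format is an assumption).
pub fn parse_confidence(output: &str) -> Option<u8> {
    let rest = &output[output.find("Confidence Score")?..];
    let slash = rest.find("/5")?;
    // Last non-whitespace character before "/5" should be the digit.
    let digit = rest[..slash].chars().rev().find(|c| !c.is_whitespace())?;
    digit.to_digit(10).map(|d| d as u8).filter(|d| *d <= 5)
}

/// Map a parsed score to (commit status state, description).
pub fn status_for_score(score: Option<u8>) -> (&'static str, &'static str) {
    match score {
        Some(4..=5) => ("success", "review passed"),
        Some(3) => ("success", "concerns flagged"),
        Some(0..=2) => ("failure", "review failed"),
        // Missing or out-of-range score.
        _ => ("error", "could not parse confidence score"),
    }
}
```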

Refs #928
…uild-runner

Implements Phase 3 of the ADF-replaces-Gitea-Actions plan:

- New `WebhookDispatch::Push` variant in webhook.rs carrying project,
  ref_name, before/after SHA, pusher login and the deduplicated union
  of files added/removed/modified across all commits in the payload.
- `GiteaPushPayload`/`GiteaPusher`/`GiteaPushCommit` deserialisation
  structs and a `handle_push_event` axum handler. Detection prefers
  the `X-Gitea-Event: push` header and falls back to JSON shape
  sniffing (`ref` + `before` + `after` + `commits`).
- `aggregate_files_changed` helper preserves first-seen insertion
  order while deduplicating across commits.
- Mirror `DispatchTask::Push` variant in the dispatcher with priority
  350 so deterministic build verdicts race ahead of LLM PR review.
- `handle_push` orchestrator method mirrors `handle_review_pr` shape:
  resolves the project's `build-runner` agent (warn-and-skip when
  absent so repos without one don't break the drain loop), runs the
  subscription allow-list and monthly budget gates, logs an
  observability routing decision row (model = n/a, cost = 0 since
  build-runner is bash, not LLM), then spawns it with `ADF_PUSH_*`
  env injection (SHA, REF, PROJECT, BEFORE_SHA, PUSHER, FILES).
- Wire the new arm into `handle_webhook_dispatch` and the dispatcher
  drain loop alongside ReviewPr/AutoMerge/PostMergeTestGate.
- `CommandKind::Push` plus a normalise arm in control_plane events.
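
A sketch of the order-preserving deduplication that `aggregate_files_changed` is described as doing. The commit struct is reduced to the three path lists, and the within-commit order (added, then removed, then modified) is an assumption.

```rust
use std::collections::HashSet;

/// Reduced stand-in for the push payload's commit entries.
#[derive(Default)]
pub struct GiteaPushCommit {
    pub added: Vec<String>,
    pub removed: Vec<String>,
    pub modified: Vec<String>,
}

/// Union of all touched paths across commits, keeping the order in
/// which each path was first seen.
pub fn aggregate_files_changed(commits: &[GiteaPushCommit]) -> Vec<String> {
    let mut seen = HashSet::new();
    let mut out = Vec::new();
    for commit in commits {
        let paths = commit
            .added
            .iter()
            .chain(&commit.removed)
            .chain(&commit.modified);
        for path in paths {
            // `insert` returns false for a path already recorded.
            if seen.insert(path.as_str()) {
                out.push(path.clone());
            }
        }
    }
    out
}
```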

Tests (cargo test -p terraphim_orchestrator --test webhook_tests):
- push_event_parses_correct_shape
- push_event_files_changed_aggregated_from_commits
- push_event_zero_commits_yields_empty_files_changed
- push_event_rejected_without_hmac
- push_event_malformed_payload_returns_200

Fixtures: push_main.json (2 commits, overlapping path sets) and
push_tag_zero_commits.json (tag push edge case).

Refs #929

Adds the bash-only ADF agent template that handle_push spawns on a
Gitea push webhook event. NO LLM, NO model, NO skill_chain -- pure
shell wrapping `rch exec`.

Compute path: rch dispatches to bigbox + SeaweedFS S3 cache, the
same pipeline as .github/workflows/ci-pr.yml (82.83 percent cache
hit rate observed on GitHub Actions run 24929848023).

Cargo step list lifted verbatim from ci-pr.yml:
  cargo fmt --all -- --check
  cargo clippy --workspace --all-targets -- -D warnings
  cargo test --workspace --no-fail-fast

Posts adf/build commit status (pending -> success/failure) directly
to the Gitea Commit Status API. Uses curl rather than the (Phase 1)
set_commit_status Rust wrapper so this agent template stands alone
on the Phase 3 branch.

The toml LANDS in this PR but is NOT yet wired into agents_on_pr_open
-- that wiring is Phase 2's responsibility per the locked rollout
sequencing in .docs/plan-adf-agents-replace-gitea-actions.md section 6.

Refs #929

New binary at crates/terraphim_orchestrator/src/bin/adf-ctl.rs providing
four subcommands via SSH+curl to the webhook endpoint:

- trigger <agent> [--context] [--wait] [--timeout] - fire any agent/persona
- status [--since] - recent spawns and exits from journalctl
- cancel <agent> - locate worktrees and PIDs for manual kill
- agents - list all configured agent names from orchestrator TOML

Key design decisions:
- issue_number=0 in synthetic webhook payload bypasses should_skip_dispatch
- JSON piped via --data-binary @- stdin (avoids shell quoting)
- HMAC-SHA256 computed locally; secret resolved from flag > env > SSH TOML read
- SSH transport via BatchMode=yes; no tunnel management needed
- status/cancel print explicit [best-effort] banner

8 unit tests pass; zero clippy warnings.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Introduces a top-level [pr_dispatch] config block listing the agents
that should be dispatched on a Gitea pull_request.opened event. Each
entry pairs an agent name with the Gitea commit-status context the
orchestrator must POST `pending` for after a successful spawn.

When the [pr_dispatch] block is absent, OrchestratorConfig::agents_on_pr_open()
falls back to a legacy default with a single pr-reviewer entry — every
existing deployment continues to work unchanged after Phase 2 lands.

Updates all OrchestratorConfig literal init sites (lib + integration
tests) to set pr_dispatch: None.
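
The fallback behaviour might be sketched as below. The field names `agent` and `context`, and modelling `pr_dispatch` as an optional list, are assumptions about a config shape that is not spelled out here.

```rust
#[derive(Clone, Debug, PartialEq)]
pub struct PrDispatchEntry {
    pub agent: String,   // agent name to dispatch (hypothetical field)
    pub context: String, // commit-status context to mark pending (hypothetical field)
}

pub struct OrchestratorConfig {
    pub pr_dispatch: Option<Vec<PrDispatchEntry>>,
}

impl OrchestratorConfig {
    /// Configured entries, or the legacy single pr-reviewer default
    /// when the [pr_dispatch] block is absent.
    pub fn agents_on_pr_open(&self) -> Vec<PrDispatchEntry> {
        match &self.pr_dispatch {
            Some(entries) => entries.clone(),
            None => vec![PrDispatchEntry {
                agent: "pr-reviewer".to_string(),
                context: "adf/pr-reviewer".to_string(),
            }],
        }
    }
}
```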

Refs #944
Refs adf-phase-2
…F Phase 2)

Refactors handle_review_pr to iterate the agents_on_pr_open list
introduced in the previous commit. Each entry is dispatched through one
of two helpers based on agent name:

  - dispatch_pr_reviewer_for_pr: extracted unchanged from the legacy
    handle_review_pr body. Drives the routing engine, applies the static
    + routed allow-list gates, the per-agent monthly budget gate, then
    spawns with the existing ADF_PR_* env injection.

  - dispatch_build_runner_for_pr: new helper mirroring handle_push.
    Skips the routing engine (build-runner is bash, no LLM, no model)
    and logs a synthetic 'model = n/a' row for dashboard parity. Injects
    ADF_PUSH_* env using refs/pull/<n>/head as the synthetic ref so the
    same task script handles both push events and PR opens.

Each helper returns Ok(true) on successful spawn or Ok(false) when
gated out. The caller posts a 'pending' commit status only when the
helper returns true — a pending from a skipped agent would block the
PR forever.

post_pr_reviewer_pending_status is generalised into post_pending_status
taking context + description as parameters, so the same helper covers
every fan-out entry. The Phase 1 call site passes 'adf/pr-reviewer'
and 'pr-reviewer dispatched' to preserve the existing behaviour.
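
A sketch of the fan-out contract: each helper reports `Ok(true)` when it spawned and `Ok(false)` when gated out, and only spawned agents get a pending status. The `fan_out` function and its closure parameter are hypothetical; the real helpers take far more context.

```rust
/// Dispatch each (agent, status context) entry and return the contexts
/// that should receive a `pending` commit status.
pub fn fan_out<F>(entries: &[(&str, &str)], mut dispatch: F) -> Result<Vec<String>, String>
where
    F: FnMut(&str) -> Result<bool, String>,
{
    let mut pending_contexts = Vec::new();
    for (agent, context) in entries {
        // Ok(false) means the agent was gated out: no pending status,
        // otherwise the check would never resolve and block the PR.
        if dispatch(agent)? {
            pending_contexts.push((*context).to_string());
        }
    }
    Ok(pending_contexts)
}
```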

Tests:
  - handle_review_pr_spawns_both_build_runner_and_pr_reviewer
  - handle_review_pr_injects_per_agent_env_correctly (verifies
    ADF_PUSH_REF=refs/pull/641/head and ADF_PR_NUMBER=641 reach
    the correct child processes)
  - handle_review_pr_skips_missing_agents (no panic, no hung pending)
  - handle_review_pr_pending_status_posted_per_agent (axum loopback,
    asserts two distinct context POSTs both with state=pending)
  - handle_review_pr_skipped_agent_does_not_post_pending (banned model
    on build-runner; asserts adf/build pending NOT posted)

Refs #944
Refs adf-phase-2
…edits

Implement lazy, on-demand cache invalidation for compiled thesauri by storing
a content hash of the source KG markdown files alongside the cached thesaurus.

When a thesaurus is loaded, compare the stored hash against a freshly computed
hash of the source files; on mismatch, invalidate the cache and rebuild.
Also expose a manual cache flush CLI subcommand.

New modules:
- terraphim_automata::hash - KG directory hash computation (twox-hash)
- terraphim_persistence::hash_store - Hash storage/retrieval in KV store

Changes:
- Add hash check to TerraphimService::ensure_thesaurus_loaded()
- Add invalidate_thesaurus_cache() and flush_thesaurus_cache()
- Clear #[cached] memoization on persistent cache invalidation
- Add 'cache flush [--role ROLE]' to terraphim-agent CLI and REPL
- Add regression test for stale-cache scenario

Refs #945
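
The lazy rebuild-on-mismatch flow and the manual flush can be sketched with a tiny cache type. Everything here is illustrative: `ThesaurusCache`, the `Vec<String>` stand-in for a compiled thesaurus, and the `rebuilds` counter are assumptions, not the service's actual types.

```rust
/// Hypothetical cache: a compiled thesaurus plus the source hash it
/// was built from.
pub struct ThesaurusCache {
    entry: Option<(u64, Vec<String>)>,
    pub rebuilds: usize,
}

impl ThesaurusCache {
    pub fn new() -> Self {
        Self { entry: None, rebuilds: 0 }
    }

    /// Return the cached thesaurus, rebuilding it first when the cache
    /// is empty or was built from different source contents.
    pub fn ensure_loaded<F>(&mut self, source_hash: u64, build: F) -> &[String]
    where
        F: FnOnce() -> Vec<String>,
    {
        let fresh = matches!(&self.entry, Some((hash, _)) if *hash == source_hash);
        if !fresh {
            self.entry = Some((source_hash, build()));
            self.rebuilds += 1;
        }
        &self.entry.as_ref().expect("entry was just populated").1
    }

    /// Manual flush: the next load rebuilds even if the hash matches.
    pub fn flush(&mut self) {
        self.entry = None;
    }
}
```

Because the check runs only at load time, an edit costs nothing until the thesaurus is next requested.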

2 participants