Skip to content

feat(approval): persist post-execution audit row alongside approval (#2135)#2367

Merged
senamakel merged 8 commits into
tinyhumansai:mainfrom
CodeGhost21:fix/2135-audit-action-execution
May 23, 2026
Merged

feat(approval): persist post-execution audit row alongside approval (#2135)#2367
senamakel merged 8 commits into
tinyhumansai:mainfrom
CodeGhost21:fix/2135-audit-action-execution

Conversation

@CodeGhost21
Copy link
Copy Markdown
Contributor

@CodeGhost21 CodeGhost21 commented May 20, 2026

Summary

  • Add executed_at, execution_outcome, and execution_error columns to pending_approvals so every approved tool call carries both a before-row (approval decision) and an after-row (terminal execution status).
  • New ApprovalGate::intercept_audited returns the persisted request_id so the four agent paths that gate external-effect tools can stamp the terminal status when the tool finishes.
  • New store::record_execution enforces monotonic audit (decided_at IS NOT NULL guard) and caps error strings at 512 chars to keep upstream crash dumps out of the audit log.

Problem

pending_approvals rows transition from (no decided_at)(decided_at, decision) and stop there. There is no terminal entry recording whether the approved tool actually ran, what it returned, or whether it failed. A compliance reviewer can see approved but not executed — they have to cross-reference the agent transcript, which is the wrong place for an audit trail. Issue #2135 calls this gap out explicitly under "Tests cover audit write before AND after an action".

Solution

Approval domain

  • New ExecutionOutcome enum (Success / Failure / Aborted).
  • pending_approvals table gets three new columns. Idempotent PRAGMA table_info migration upgrades v1 DBs in place — pre-Make approvals and audit durable across non-CLI channels #2135 rows back-fill as NULL so older audit history stays readable.
  • store::record_execution(request_id, outcome, error) gated by decided_at IS NOT NULL so the audit trail stays monotonic (no executed before approved rows possible). Error strings capped at 512 chars + ellipsis.
  • ApprovalGate::intercept_audited(...) returns (GateOutcome, Option<String>). Legacy intercept(...) is a thin wrapper that drops the id so call sites that don't care about after-execution audit (CLI, tests) keep their current shape.
  • ApprovalGate::record_execution(...) wraps the store call with best-effort logging — audit-write failure is logged but never bubbles back to the agent (the tool already ran).

Wiring (4 call sites)

  • agent/harness/tool_loop.rs — capture request_id from the gate, stamp Success/Failure with the truncated tool result text after the tool finishes.
  • agent/harness/subagent_runner/ops.rs — same shape for sub-agent-dispatched tool calls.
  • channels/proactive.rs — stamp Success on ch.send Ok, Failure with the error message on Err.
  • agent/triage/escalation.rs — stamp Success/Failure on the triage-dispatched sub-agent outcome.

Submission Checklist

If a section does not apply to this change, mark the item as N/A with a one-line reason.

  • Tests added or updated (happy path + at least one failure / edge case) — 8 new tests; cargo test --lib approval → 71 passed, 0 failed.
  • Diff coverage ≥ 80% — every changed line in store.rs / gate.rs is exercised by a new or updated test. Four call-site changes are 5–10 lines each, all on hot paths already covered by existing harness tests.
  • Coverage matrix updated — N/A: behaviour-only change to existing approval surface; no new feature row added/removed/renamed.
  • All affected feature IDs from the matrix are listed in ## Related — N/A: no matrix row touched.
  • No new external network dependencies introduced — pure SQLite + in-process gate code.
  • Manual smoke checklist updated if this touches release-cut surfaces — N/A: audit-trail addition only; no user-visible UI flow changes when approvals work normally.
  • Linked issue closed via Closes #NNN in the ## Related section — see below.

Impact

  • Runtime/platform: desktop core + Tauri shell. Approval domain is in-process.
  • User-visible: none under happy-path operation. The audit trail in ~/.openhuman/approval/approval.db now carries executed_at + execution_outcome + execution_error per approved row; UI surfaces that consume these rows will see the new fields.
  • Performance / security: one extra SQLite UPDATE per gated tool call (best-effort, async-aware tracing on failure). Error strings are capped at 512 chars so a crash dump can't fill the audit DB.
  • Migration / compatibility: schema migration is idempotent and happens on every with_connection call. v1 DBs (older builds) get the new columns added in place, with existing rows back-filled as NULL.

Tests

cargo test --lib approval71 passed, 0 failed (8 new, 63 pre-existing).

New test Covers
record_execution_writes_terminal_audit_row_after_decide before-decide attempts must be a no-op; after decide they patch the row
record_execution_persists_outcome_and_redacted_error Failure with error text round-trips through SQLite
record_execution_caps_long_error_messages 2 000-char input is capped to ~512 chars + ellipsis
record_execution_unknown_id_is_safe_noop stray record_execution for an id that was never persisted reports false without writing
migrate_columns_is_idempotent_on_v1_databases synthesise a pre-#2135 schema, prove the migration adds the new columns without losing existing rows AND that a second open is a no-op
execution_outcome_round_trips / _serializes_as_snake_case on-wire shape for the new enum
intercept_audited_returns_request_id_only_when_allowed_and_persisted Allow path returns the id so callers can record_execution
intercept_audited_returns_none_id_for_denied_and_allowlisted Deny path and session-allowlist shortcut both return None (nothing to record afterward)

Related


AI Authored PR Metadata (required for Codex/Linear PRs)

Linear Issue

Commit & Branch

  • Branch: fix/2135-audit-action-execution (branched from origin/main after fresh fetch)
  • Commit SHA: see PR head

Validation Run

  • pnpm --filter openhuman-app format:check — N/A: no frontend changes.
  • pnpm typecheck — N/A: no TypeScript changes.
  • Focused tests: cargo test --lib approval → 71 passed, 0 failed.
  • Rust fmt/check (if changed): cargo fmt --manifest-path Cargo.toml applied; cargo check clean.
  • Tauri fmt/check (if changed): N/A: no Tauri shell changes.

Validation Blocked

  • command: N/A
  • error: N/A
  • impact: N/A

Behavior Changes

  • Intended behavior change: every gated tool call now leaves both a before-row (approval decision) and an after-row (terminal status) on the same pending_approvals row.
  • User-visible effect: audit-history surfaces will show executed_at + outcome per approved row; agent / UI under normal operation are unchanged.

Parity Contract

  • Legacy behavior preserved: intercept(...) keeps its existing GateOutcome shape (it now wraps intercept_audited and drops the id). CLI and existing tests don't need changes.
  • Guard/fallback/dispatch parity checks: record_execution is gated by decided_at IS NOT NULL so the audit trail stays monotonic; idempotent column migration is tested against a synthetic v1 schema.

Duplicate / Superseded PR Handling

Summary by CodeRabbit

  • New Features

    • Approval flow now captures audited request IDs and records terminal execution outcomes (success/failure/aborted) for approved external actions
    • Audit trail extended to tool calls, escalations, and proactive external sends
  • Chores

    • Database migration adds execution/audit columns and live-upgrade support to persist terminal outcomes and preserve immutability
  • Tests

    • Added tests for audited approvals, execution recording, truncation/redaction, and migration behavior

Review Change Stack

…inyhumansai#2135)

Approval gate currently writes a single durable row per intercepted
tool call: it captures *the action was approved at X* but never the
matching *the action ran at Y with outcome Z*. The acceptance
criterion in tinyhumansai#2135 requires tests for audit writes both before AND
after an action — this commit closes that gap.

Approval domain
- New `ExecutionOutcome` enum (`Success` / `Failure` / `Aborted`).
- `pending_approvals` SQLite table gets three new columns:
  `executed_at`, `execution_outcome`, `execution_error`. Idempotent
  PRAGMA-table_info migration upgrades v1 DBs in place — pre-tinyhumansai#2135
  rows back-fill as NULL so older audit history stays readable.
- New `store::record_execution(request_id, outcome, error)`. Gated
  by `decided_at IS NOT NULL` so the audit trail stays monotonic
  (no "executed before approved" rows possible). Error strings are
  capped at 512 chars so upstream crash dumps can't fill the audit
  log; the agent still sees the full error in its tool-result
  envelope.
- New `ApprovalGate::intercept_audited(...)` returns
  `(GateOutcome, Option<String>)`. The legacy `intercept(...)` is a
  thin wrapper that drops the id, so call sites that don't care
  about after-execution audit (CLI, tests) keep their current
  shape.
- New `ApprovalGate::record_execution(...)` wraps the store call
  with best-effort logging — audit-write failure is logged but never
  bubbles back to the agent (the tool already ran).

Wiring (4 call sites)
- `agent/harness/tool_loop.rs`: capture request_id from the gate,
  stamp Success/Failure with the truncated tool result text after
  the tool finishes.
- `agent/harness/subagent_runner/ops.rs`: same shape for
  sub-agent-dispatched tool calls.
- `channels/proactive.rs`: stamp Success on `ch.send` Ok, Failure
  with the error message on Err.
- `agent/triage/escalation.rs`: stamp Success/Failure on the
  triage-dispatched sub-agent outcome.

Tests (8 new, 71 passing in `cargo test --lib approval`)
- `record_execution_writes_terminal_audit_row_after_decide` —
  before-decide attempts must be a no-op; after decide they patch
  the row.
- `record_execution_persists_outcome_and_redacted_error` — Failure
  with error text round-trips through SQLite.
- `record_execution_caps_long_error_messages` — 2 000-char input
  is capped to ~512 chars + ellipsis.
- `record_execution_unknown_id_is_safe_noop` — stray
  record_execution for an id that was never persisted reports
  `false` without writing.
- `migrate_columns_is_idempotent_on_v1_databases` — synthesise a
  pre-tinyhumansai#2135 schema, prove the migration adds the new columns
  without losing existing rows AND that a second open is a no-op.
- `execution_outcome_round_trips` / `_serializes_as_snake_case` —
  on-wire shape for the new enum.
- `intercept_audited_returns_request_id_only_when_allowed_and_persisted`
  — Allow path returns the id so callers can record_execution.
- `intercept_audited_returns_none_id_for_denied_and_allowlisted` —
  Deny path and session-allowlist shortcut both return None
  (nothing to record afterward).

Out of scope (companion / follow-up PRs)
- `composio.list_recent_decisions` audit-history read path — owned
  by tinyhumansai#2335 (open).
- Lazy expiry of stale pending rows after restart — owned by
  tinyhumansai#2357 (open).
- Policy-callback hook for unattended channel/webhook denial —
  follow-up.
@CodeGhost21 CodeGhost21 requested a review from a team May 20, 2026 19:41
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 20, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds audited approval intercepts that return request IDs, DB migration and columns for execution audit, an ExecutionOutcome enum, and wiring to record terminal success/failure across tool harness, escalation, and proactive external-channel sends.

Changes

Approval audit recording

Layer / File(s) Summary
ExecutionOutcome type and audited gate APIs
src/openhuman/approval/types.rs, src/openhuman/approval/gate.rs, src/openhuman/approval/mod.rs
ExecutionOutcome (Success/Failure/Aborted) is added. ApprovalGate::intercept_audited() returns (GateOutcome, Option<request_id>) and record_execution() writes terminal outcome; gate tests verify request-id threading and allowlist shortcut behavior.
Approval store schema migration and persistence
src/openhuman/approval/store.rs
pending_approvals gains executed_at, execution_outcome, execution_error. migrate_columns() idempotently upgrades older DBs on open. New record_execution() persists outcome/error only after decided_at and when not already stamped; error text is sanitized and capped/truncated. Tests cover migration, truncation, idempotency, unknown-id safety, and redaction.
Tool harness & escalation integration
src/openhuman/agent/harness/tool_loop.rs, src/openhuman/agent/harness/subagent_runner/ops.rs, src/openhuman/agent/triage/escalation.rs
Tool-loop, subagent runner, and escalation use intercept_audited, retain optional approval_request_id and gate handle, then call record_execution after execution to stamp Success/Failure (exec success is derived from structured result, not text prefixes). Existing allow/deny flows remain.
Proactive message channel audited integration
src/openhuman/channels/proactive.rs
External-channel proactive sends call intercept_audited and record Success/Failure back to the approval audit row after send completes; web-channel delivery is unchanged.
i18n minor update
app/src/lib/i18n/chunks/de-5.ts
Trailing mascot translation entries adjusted.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • tinyhumansai/openhuman#2149: Prior changes to external-effect ApprovalGate flow; this PR extends that flow with audited intercepts and execution recording.
  • tinyhumansai/openhuman#2335: Adds related approval-audit history read APIs over decided audit rows that this change stamps.
  • tinyhumansai/openhuman#2357: Changes pending_approvals lifecycle and interacts with the same row-state transitions updated here.

Suggested reviewers

  • senamakel

Poem

🐰 I found a tiny request id in my hat,
The gate gave a nod and then that was that.
I ran the tool, then penned the tale:
success or fail — stamped without fail.
Hopping home, I left a tidy audit trail.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title accurately reflects the main change: adding post-execution audit row persistence alongside approval decisions, as evidenced by all core files modifying approval storage and audit recording.
Linked Issues check ✅ Passed The PR successfully implements all coding requirements from #2135: durable storage of approval decisions/audit entries, audited tool execution paths across four call sites (tool_loop, subagent_runner, escalation, proactive channels), ExecutionOutcome enum for terminal status tracking, and error redaction with truncation.
Out of Scope Changes check ✅ Passed The PR includes one minor out-of-scope change: a German i18n file (de-5.ts) was modified to fix duplicate keys unrelated to approval auditing, though this appears to be a merge conflict resolution.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot added agent Built-in agents, prompts, orchestration, and agent runtime in src/openhuman/agent/. working A PR that is being worked on by the team. labels May 20, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/openhuman/approval/gate.rs (1)

200-255: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Handle approval-vs-timeout races before forcing Deny.

decide() commits the approval in SQLite before it sends on the waiter. If that commit lands right as the TTL expires, this branch can still time out, return GateOutcome::Deny, and drop the request_id even though the row is already approved. That leaves the durable audit saying “approved” while the tool never runs and no execution outcome can be recorded. Please make the timeout path re-check the persisted decision (or otherwise serialize timeout vs. decide atomically) before denying.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/approval/gate.rs` around lines 200 - 255, The timeout branch
currently unconditionally denies even if store::decide committed an approval
concurrently; modify the Err(_elapsed) handling in the gate wait logic to
re-check the persisted decision before returning Deny: after
self.evict_waiter(&request_id) (and before logging/returning), call the store
read method that fetches the stored ApprovalDecision for request_id (the same
persistence API that decide() writes), and if the persisted decision is Approve
return (GateOutcome::Allow, Some(request_id)) instead of denying; only if the
persisted decision is Deny or missing proceed with the existing deny path and
calls to store::decide. Ensure you reference self.ttl, rx, store::decide,
evict_waiter, and GateOutcome when locating the fix.
src/openhuman/agent/harness/subagent_runner/ops.rs (1)

1524-1550: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use the structured execution result for audit status, not the rendered text.

!raw.starts_with("Error") is not a reliable success signal. A successful tool payload can legitimately begin with "Error" (for example copied logs or search results), which will persist a false Failure in the new audit row. Carry a success bool out of the timeout/execute match instead of re-parsing the display string.

Suggested fix
-                    let raw = match tokio::time::timeout(timeout, tool.execute(args)).await {
-                        Ok(Ok(result)) => {
-                            let raw = result.output();
-                            if result.is_error {
-                                format!("Error: {raw}")
-                            } else {
-                                raw
-                            }
-                        }
-                        Ok(Err(err)) => format!("Error executing {}: {err}", call.name),
-                        Err(_) => format!("Error: tool '{}' timed out", call.name),
-                    };
+                    let (raw, success) = match tokio::time::timeout(timeout, tool.execute(args)).await {
+                        Ok(Ok(result)) => {
+                            let raw = result.output();
+                            if result.is_error {
+                                (format!("Error: {raw}"), false)
+                            } else {
+                                (raw, true)
+                            }
+                        }
+                        Ok(Err(err)) => (format!("Error executing {}: {err}", call.name), false),
+                        Err(_) => (format!("Error: tool '{}' timed out", call.name), false),
+                    };
                     // Stamp the terminal status onto the
                     // pending_approvals audit row — best-effort,
                     // failures don't propagate to the agent (`#2135`).
                     if let (Some(gate), Some(req_id)) = (
                         approval_gate_for_audit.as_ref(),
                         approval_request_id.as_ref(),
                     ) {
-                        let success = !raw.starts_with("Error");
                         let exec_outcome = if success {
                             crate::openhuman::approval::ExecutionOutcome::Success
                         } else {
                             crate::openhuman::approval::ExecutionOutcome::Failure
                         };
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/agent/harness/subagent_runner/ops.rs` around lines 1524 - 1550,
The audit success is being derived from the rendered text with
!raw.starts_with("Error"), which is brittle; instead capture a boolean success
from the tokio::time::timeout(...) match that distinguishes Ok(Ok(result)) vs
Ok(Err(_)) vs Err(_), and use that success value when computing exec_outcome and
err_text before calling gate.record_execution; update the match around
tool.execute(args) (the block that defines raw) to also set a success flag
(e.g., success: bool) and use that flag in the approval_gate_for_audit /
approval_request_id handling rather than parsing raw.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/openhuman/approval/store.rs`:
- Around line 259-266: The truncation currently takes 512 chars then appends "…"
so stored strings can become 513 chars; update the closure that builds
trimmed_error (the error.map(|e| { ... }) block) to ensure the final string
never exceeds 512 characters by either taking 511 characters and then appending
"…" when truncated or by hard-truncating to 512 without appending; change the
chars.take(...) logic accordingly so trimmed_error respects the 512-character
cap.
- Around line 267-275: The UPDATE in the pending_approvals write currently
allows overwriting executed_at/execution_outcome/execution_error on retries;
modify the SQL in the execute call that sets executed_at, execution_outcome,
execution_error (the same execute invocation that uses params![now,
outcome.as_str(), trimmed_error, request_id]) to add "AND executed_at IS NULL"
to the WHERE clause (in addition to the existing "decided_at IS NOT NULL") so
the first terminal report wins and subsequent calls are ignored.

---

Outside diff comments:
In `@src/openhuman/agent/harness/subagent_runner/ops.rs`:
- Around line 1524-1550: The audit success is being derived from the rendered
text with !raw.starts_with("Error"), which is brittle; instead capture a boolean
success from the tokio::time::timeout(...) match that distinguishes
Ok(Ok(result)) vs Ok(Err(_)) vs Err(_), and use that success value when
computing exec_outcome and err_text before calling gate.record_execution; update
the match around tool.execute(args) (the block that defines raw) to also set a
success flag (e.g., success: bool) and use that flag in the
approval_gate_for_audit / approval_request_id handling rather than parsing raw.

In `@src/openhuman/approval/gate.rs`:
- Around line 200-255: The timeout branch currently unconditionally denies even
if store::decide committed an approval concurrently; modify the Err(_elapsed)
handling in the gate wait logic to re-check the persisted decision before
returning Deny: after self.evict_waiter(&request_id) (and before
logging/returning), call the store read method that fetches the stored
ApprovalDecision for request_id (the same persistence API that decide() writes),
and if the persisted decision is Approve return (GateOutcome::Allow,
Some(request_id)) instead of denying; only if the persisted decision is Deny or
missing proceed with the existing deny path and calls to store::decide. Ensure
you reference self.ttl, rx, store::decide, evict_waiter, and GateOutcome when
locating the fix.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 3047abd0-a16b-41a4-9e69-d03fa27fb0ea

📥 Commits

Reviewing files that changed from the base of the PR and between fa8d75f and 58c3264.

📒 Files selected for processing (8)
  • src/openhuman/agent/harness/subagent_runner/ops.rs
  • src/openhuman/agent/harness/tool_loop.rs
  • src/openhuman/agent/triage/escalation.rs
  • src/openhuman/approval/gate.rs
  • src/openhuman/approval/mod.rs
  • src/openhuman/approval/store.rs
  • src/openhuman/approval/types.rs
  • src/openhuman/channels/proactive.rs

Comment thread src/openhuman/approval/store.rs Outdated
Comment thread src/openhuman/approval/store.rs
…ror at 512 incl. ellipsis + make terminal write immutable

Two CodeRabbit findings on src/openhuman/approval/store.rs:

1. **Minor — cap inclusive of ellipsis.** The previous truncation
   took 512 chars then appended "…", so stored values became 513
   chars when truncation kicked in. Fixed by taking 511 chars when
   the input would overflow and appending the marker; short inputs
   pass through unchanged. New test asserts the truncated value is
   exactly 512 chars including the ellipsis.

2. **Major — terminal write must be immutable.** The UPDATE used
   `WHERE request_id = ?4 AND decided_at IS NOT NULL`, which let
   any later `record_execution` call silently overwrite
   executed_at / execution_outcome / execution_error. Added
   `AND executed_at IS NULL` so the first terminal report wins; a
   late retry / cleanup path now reports `false` and leaves the
   original audit row intact. New test
   `record_execution_is_idempotent_after_first_terminal_report_wins`
   pins the contract.

`cargo test --lib approval::store` → 13 passed, 0 failed
(13 = 11 pre-existing + 1 new idempotency test; the cap test was
already there but its assertion now uses the exact 512-char count).
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/openhuman/approval/store.rs`:
- Around line 263-269: The current code only truncates the execution_error
string via trimmed_error and still persists raw backend/tool text; update the
persistence step to run execution_error through the same audit-redaction routine
used for request payloads (use the project's audit redaction function that
sanitizes tokens, emails, and PII) instead of simple truncation, or
alternatively replace the stored value with a sanitized summary/error code;
ensure you update handling around trimmed_error/execution_error in store.rs so
the redacted result (not the raw or just-truncated string) is what gets written
to the durable audit table.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 5efd6e5c-d848-43f2-91bd-c2e8fd108125

📥 Commits

Reviewing files that changed from the base of the PR and between 58c3264 and c439549.

📒 Files selected for processing (1)
  • src/openhuman/approval/store.rs

Comment thread src/openhuman/approval/store.rs Outdated
@coderabbitai coderabbitai Bot added feature Net-new user-facing capability or product behavior. rust-core Core Rust runtime in src/: CLI, core_server, shared infrastructure. labels May 22, 2026
…humansai#2367 review)

Run the upstream error blurb through memory::safety::sanitize_text
*before* truncating to 512 chars so the durable audit row cannot leak
bearer tokens, API keys, private-key blocks, OAuth params, or
government-ID PII the failing tool may have echoed back. Truncating
first would split a secret mid-string and dodge the regex matchers.

Adds record_execution_redacts_secrets_in_error_message asserting a
bearer token and provider key are scrubbed from the stored row and
that the [REDACTED] marker remains visible to audit readers.
coderabbitai[bot]
coderabbitai Bot previously approved these changes May 22, 2026
@senamakel senamakel self-assigned this May 23, 2026
senamakel added 2 commits May 22, 2026 17:47
# Conflicts:
#	src/openhuman/approval/store.rs
…ess (tinyhumansai#2367 review)

- gate.rs: when the TTL elapses concurrently with a committed approval,
  `store::decide(Deny)` silently no-ops (its `WHERE decided_at IS NULL`
  guard preserves the existing row), but the gate previously returned
  Deny anyway — leaving the durable audit row saying Approved while the
  tool never ran. The timeout path now re-reads the persisted decision
  via the new `store::get_decision` and honors an existing Approve.

- subagent_runner/ops.rs: replace `!raw.starts_with(\"Error\")` audit
  signal with a structured `exec_success` bool captured from the
  `tokio::time::timeout` match. Legitimate success payloads can start
  with the literal word \"Error\" (search hits, copied logs) and would
  otherwise persist a false Failure on the audit row.

- store.rs: add `get_decision(config, request_id)` read helper used by
  the gate race resolution, plus a unit test covering missing/undecided/
  decided round-trip.

Addresses CodeRabbit review on PR tinyhumansai#2367:
- src/openhuman/approval/gate.rs lines 200-255 (timeout vs decide race)
- src/openhuman/agent/harness/subagent_runner/ops.rs lines 1524-1550
  (brittle string-prefix success detection)
The merge from main brought back PR tinyhumansai#2367's older de-5.ts that still
had duplicate keys for settings.developerMenu.mcpServer.* and
settings.mcpServer.* — the canonical fix on main lives at lines
211-235. TS1117 was failing CI on this branch (Type Check TypeScript).
Drop the duplicates so this file matches main.
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/openhuman/approval/store.rs (1)

319-328: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Only stamp execution on approved rows.

decided_at IS NOT NULL also matches Deny/auto-expired rows, so a stray caller can create an impossible audit trail with decision = deny and executed_at populated. Please restrict this update to approval decisions only at the store boundary.

Suggested fix
+        let approve_once = ApprovalDecision::ApproveOnce.as_str();
+        let approve_always = ApprovalDecision::ApproveAlwaysForTool.as_str();
         let updated = conn
             .execute(
                 "UPDATE pending_approvals
                  SET executed_at = ?1,
                      execution_outcome = ?2,
                      execution_error = ?3
                  WHERE request_id = ?4
-                   AND decided_at IS NOT NULL
+                   AND decided_at IS NOT NULL
+                   AND decision IN (?5, ?6)
                    AND executed_at IS NULL",
-                params![now, outcome.as_str(), trimmed_error, request_id],
+                params![
+                    now,
+                    outcome.as_str(),
+                    trimmed_error,
+                    request_id,
+                    approve_once,
+                    approve_always,
+                ],
             )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/approval/store.rs` around lines 319 - 328, The UPDATE currently
matches any decided row (including Deny/expired); modify the SQL in the
conn.execute call that updates pending_approvals so it only updates rows whose
decision is the approval value (e.g., add "AND decision = 'approve'" to the
WHERE clause), keeping the rest of the parameters (now, outcome.as_str(),
trimmed_error, request_id) and semantics the same so executed_at can only be
stamped for approved requests.
src/openhuman/agent/harness/subagent_runner/ops.rs (1)

1583-1619: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Thread exec_success through to the downstream status path.

This fixes the audit write, but the later call_success / text-mode status logic still falls back to !result_text.starts_with("Error"). A successful payload that begins with "Error" will now persist ExecutionOutcome::Success while the progress event and <tool_result status="..."> still report failure.

Suggested fix
-            let result_text = if !allowed_names.contains(&call.name) {
+            let (result_text, call_success) = if !allowed_names.contains(&call.name) {
                 tracing::warn!(
                     task_id = %task_id,
                     agent_id = %agent_id,
                     tool = %call.name,
                     "[subagent_runner] tool not in allowlist for this sub-agent"
                 );
                 let mut available: Vec<&str> = allowed_names.iter().map(|s| s.as_str()).collect();
                 if let Some(resolver) = lazy_resolver.as_ref() {
                     available.extend(resolver.known_slugs());
                 }
                 available.sort_unstable();
                 available.dedup();
-                format!(
+                (
+                    format!(
                         "Error: tool '{}' is not available to the {} sub-agent. Available tools: {}",
                         call.name,
                         agent_id,
                         available.join(", ")
-                )
+                    ),
+                    false,
+                )
             } else if let Some(tool) = extra_tools
                 .iter()
                 .find(|t| t.name() == call.name)
                 .or_else(|| parent_tools.iter().find(|t| t.name() == call.name))
             {
@@
                 if let Some(reason) = gate_denial {
-                    format!("Error: {reason}")
+                    (format!("Error: {reason}"), false)
                 } else {
                     let (raw, exec_success) =
                         match tokio::time::timeout(timeout, tool.execute(args)).await {
@@
                     if let (Some(gate), Some(req_id)) = (
                         approval_gate_for_audit.as_ref(),
                         approval_request_id.as_ref(),
                     ) {
-                        let success = exec_success;
-                        let exec_outcome = if success {
+                        let exec_outcome = if exec_success {
                             crate::openhuman::approval::ExecutionOutcome::Success
                         } else {
                             crate::openhuman::approval::ExecutionOutcome::Failure
                         };
-                        let err_text = if success { None } else { Some(raw.as_str()) };
+                        let err_text = if exec_success { None } else { Some(raw.as_str()) };
                         gate.record_execution(req_id, exec_outcome, err_text);
                     }
-                    raw
+                    (raw, exec_success)
                 }
             } else {
-                format!("Unknown tool: {}", call.name)
+                (format!("Unknown tool: {}", call.name), false)
             };
@@
-            let call_success = !result_text.starts_with("Error");
             let call_output_chars = result_text.chars().count();
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/agent/harness/subagent_runner/ops.rs` around lines 1583 - 1619,
The code currently computes exec_success from the structured execute result but
later still derives call_success / text-mode status by checking
result_text.starts_with("Error"), causing mismatches when a successful payload
begins with "Error"; update the downstream status and progress-event logic (the
places that compute call_success and status and emit the <tool_result
status="..."> or progress events) to use the previously computed exec_success
boolean instead of parsing raw/result_text, and pass exec_success through to any
variables or function calls that currently accept call_success or status so the
audit record, progress event, and tool_result all use the same ExecutionOutcome.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@src/openhuman/agent/harness/subagent_runner/ops.rs`:
- Around line 1583-1619: The code currently computes exec_success from the
structured execute result but later still derives call_success / text-mode
status by checking result_text.starts_with("Error"), causing mismatches when a
successful payload begins with "Error"; update the downstream status and
progress-event logic (the places that compute call_success and status and emit
the <tool_result status="..."> or progress events) to use the previously
computed exec_success boolean instead of parsing raw/result_text, and pass
exec_success through to any variables or function calls that currently accept
call_success or status so the audit record, progress event, and tool_result all
use the same ExecutionOutcome.

In `@src/openhuman/approval/store.rs`:
- Around line 319-328: The UPDATE currently matches any decided row (including
Deny/expired); modify the SQL in the conn.execute call that updates
pending_approvals so it only updates rows whose decision is the approval value
(e.g., add "AND decision = 'approve'" to the WHERE clause), keeping the rest of
the parameters (now, outcome.as_str(), trimmed_error, request_id) and semantics
the same so executed_at can only be stamped for approved requests.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e56c4e17-8529-4f93-96b6-4ba62d4193ec

📥 Commits

Reviewing files that changed from the base of the PR and between 5524460 and 8299906.

📒 Files selected for processing (5)
  • app/src/lib/i18n/chunks/de-5.ts
  • src/openhuman/agent/harness/subagent_runner/ops.rs
  • src/openhuman/agent/harness/tool_loop.rs
  • src/openhuman/approval/gate.rs
  • src/openhuman/approval/store.rs
✅ Files skipped from review due to trivial changes (1)
  • app/src/lib/i18n/chunks/de-5.ts

coderabbitai[bot]
coderabbitai Bot previously approved these changes May 23, 2026
The 500ms TTL was racing the 50×10ms poll loop on slow CI runners —
list_pending's lazy-expire would deny the row before decide() could
fire, surfacing as "pending row never appeared" in
approve_once_returns_allow on Linux CI. 2s gives the polling tests
headroom while keeping timeout_returns_deny fast.
@senamakel senamakel merged commit 2f5db77 into tinyhumansai:main May 23, 2026
26 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agent Built-in agents, prompts, orchestration, and agent runtime in src/openhuman/agent/. feature Net-new user-facing capability or product behavior. rust-core Core Rust runtime in src/: CLI, core_server, shared infrastructure. working A PR that is being worked on by the team.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Make approvals and audit durable across non-CLI channels

2 participants