Replies: 9 comments 8 replies
@xsa520 — good question, and the answer is yes, partially. The toolkit already models decisions as structured artifacts. What we don't do yet is treat the decision as a sealed, independently verifiable artifact in the way you describe. Would be interested to see how your guardian repo approaches the sealed evidence part. Is it cryptographic (signed decisions) or structural (decision logs with integrity chains)?
We opened a discussion here to explore this further. The issue summarizes two governance evidence models, including:
• execution-receipt-centric governance
Interested to hear perspectives from other governance implementations.
Great framing! The Intent → Policy → Decision → Evidence → Execution pipeline is a solid model, and there are a few things worth considering from an open source community perspective. The guardian experiment looks interesting — looking forward to seeing how it evolves!
Regarding who defines policy and under what authority — I think the key insight is that AI governance tools need to be opinionated about mechanisms but flexible about authority structures. Several models work in practice, depending on the deployment context. For your Guardian project, I'd suggest focusing on a modular approach: make it easy to plug in different authority models as the community matures.
@imran-siddique, your point about decisions being embedded in the audit trail rather than treated as first-class artifacts got me thinking, and @xsa520's sealed-decision direction made the gap more concrete. I've been exploring whether one useful form of pre-execution decision evidence could be an adversarial review artifact: a challenger raises concerns, a defender responds, and a judge renders a verdict with a score delta. Ran a small case study on that question here: https://github.com/AgentPolis/agent-constitution/blob/main/docs/case-studies/pre-execution-review-vs-post-audit.md
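For illustration, such an adversarial review artifact might be modeled roughly like this. All class, field, and method names below are my own sketch, not the case study's actual format:

```python
from dataclasses import dataclass, field, asdict
import json

@dataclass
class ReviewArtifact:
    """Hypothetical pre-execution review: challenge, response, verdict."""
    action: str                           # the proposed agent action under review
    challenges: list = field(default_factory=list)
    responses: list = field(default_factory=list)
    verdict: str = "pending"              # "approve" | "reject"
    score_delta: float = 0.0              # judge-assigned risk adjustment

    def challenge(self, concern: str) -> None:
        self.challenges.append(concern)

    def respond(self, answer: str) -> None:
        self.responses.append(answer)

    def judge(self, verdict: str, score_delta: float) -> None:
        self.verdict = verdict
        self.score_delta = score_delta

    def seal(self) -> str:
        # Serialize deterministically so the artifact can be hashed/replayed.
        return json.dumps(asdict(self), sort_keys=True)

review = ReviewArtifact(action="delete_user_records")
review.challenge("No retention-policy check before deletion")
review.respond("Retention policy evaluated; records past 90-day window")
review.judge("approve", score_delta=-0.2)
sealed = review.seal()
```

The point of the deterministic serialization at the end is that the whole exchange, not just the final verdict, becomes a single replayable evidence object.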
Really interesting thread. Building on xsa520's decision-artifact framing — we ran into the same problem and ended up taking a similar direction.

We built what we call a governance chain: a hash-linked sequence of audit events that captures the full lifecycle of a governance decision, from initial intent through policy evaluation, review, approval, and execution evidence. The key difference from a traditional audit log is that the chain itself is the artifact — it's portable, independently verifiable, and self-contained.

Each event's hash includes the previous hash, which makes the sequence tamper-evident. You can't reorder events, insert approvals, or remove violations without breaking the chain. In that sense, the artifact doesn't reference an audit log — it is the verifiable decision record.

Separation of powers is also embedded directly in the artifact. The chain captures who performed each step — proposer, policy engine, reviewer, approver — ensuring that elevated actions can't be self-approved. This is structurally enforced rather than relying solely on policy.

We also support offline verification. Given just the artifact, a verifier can re-walk the chain and confirm integrity and role separation without access to the original system. Finally, we include execution evidence as part of the chain, so the artifact reflects not only whether something was approved, but what actually happened at execution time.

One area I'm still exploring is cross-system artifact exchange. If one system produces a governance artifact and another needs to trust it, what should that interface look like? Is a shared schema necessary, or is publishing the hashing and verification model sufficient while allowing flexibility in event types? Curious how others are thinking about interoperability at the artifact level as more governance frameworks emerge.
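As a concrete illustration, here is a minimal sketch of how such a hash-linked chain can be built. The event fields (actor, step, detail) are invented for illustration, not the project's actual schema:

```python
import hashlib
import json

GENESIS = "0" * 64  # sentinel "previous hash" for the first event

def event_hash(payload: dict, prev_hash: str) -> str:
    # Each event's hash covers its own payload plus the previous event's
    # hash, so reordering, inserting, or removing events breaks every
    # subsequent link.
    body = json.dumps(payload, sort_keys=True) + prev_hash
    return hashlib.sha256(body.encode()).hexdigest()

def append_event(chain: list, actor: str, step: str, detail: str) -> None:
    prev = chain[-1]["hash"] if chain else GENESIS
    payload = {"actor": actor, "step": step, "detail": detail}
    chain.append({**payload, "prev": prev, "hash": event_hash(payload, prev)})

# Full lifecycle of one decision, captured as a portable artifact.
chain: list = []
append_event(chain, "proposer", "intent", "rotate production API keys")
append_event(chain, "policy-engine", "evaluation", "rotation policy satisfied")
append_event(chain, "approver", "approval", "approved by on-call lead")
append_event(chain, "executor", "execution", "keys rotated successfully")
```

Because each entry records both its actor and its link to the previous hash, the exported list is simultaneously the audit trail, the role-separation record, and the integrity proof.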
This is a very clean articulation of the governance chain as a portable artifact — especially the idea that the chain itself becomes independently verifiable.

One question this raises when thinking about cross-system exchange: if two systems produce structurally valid artifacts (each internally consistent, hash-linked, and verifiable), but their event chains differ in composition or evaluation semantics — what defines that they represent the same decision? In other words, is artifact equivalence determined by identical chain structure alone, or by some deeper semantic invariant?

Without a clear invariant for "decision identity", interoperability at the artifact level seems to risk reducing to "both are valid" without a way to determine comparability. Curious how you're thinking about this boundary.
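One hypothetical way to pin down such an invariant (my own sketch, not a proposal from either project): hash only a canonicalized semantic core of the decision and ignore chain structure entirely. Two structurally different artifacts that agree on this core would then be comparable as "the same decision". Field names here are illustrative:

```python
import hashlib
import json

def decision_identity(intent: str, policy_ref: str, verdict: str) -> str:
    # Canonicalize only the semantic core: what was asked, under which
    # policy, and what was decided. Event composition, timestamps, and
    # chain layout are deliberately excluded.
    core = json.dumps(
        {"intent": intent, "policy": policy_ref, "verdict": verdict},
        sort_keys=True,
    )
    return hashlib.sha256(core.encode()).hexdigest()

# Two systems with different event chains but the same semantic core:
a = decision_identity("deploy v2.1", "policy/deploy-prod", "approved")
b = decision_identity("deploy v2.1", "policy/deploy-prod", "approved")
```

The open question is exactly which fields belong in that core — too few and distinct decisions collide, too many and equivalent decisions from different systems never match.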
This is a great framing — policy enforcement vs decision evidence is exactly the right way to think about it. They're not alternatives, they're complementary layers. I've been building both together in an open-source protocol called Nobulex.

The enforcement layer gives you prevention. The evidence layer gives you auditability. Together they give you something neither can provide alone: third-party verifiable compliance. A regulator or counterparty can replay the log against the declared constraints and get a deterministic pass/fail without trusting the operator.

The cross-agent handshake takes this further — before two agents transact, they verify each other's decision evidence. No valid proof, no transaction.

Interactive demo (no install): nobulex.com/playground

Would be interested to hear how AGT thinks about the enforcement-before-execution vs log-and-audit-after tradeoff.
This distinction between policy enforcement and decision evidence is where the interesting design space is. AGT sits on the enforcement side — Cedar policies evaluated at runtime, sub-millisecond latency, deterministic allow/deny. That solves "did the agent follow the rules right now?"

But there is a second question that enforcement alone cannot answer: "can you prove to a third party — who was not present at runtime — that the agent followed the rules across its entire operational history?" That requires decision evidence that is tamper-evident and independently verifiable.

This is what Nobulex does. Every enforcement decision goes into a SHA-256 hash chain signed with Ed25519. The chain is the proof. Any verifier can check it offline.

The combination is stronger than either alone: AGT enforces policy at runtime → Nobulex captures the enforcement decisions into a verifiable chain → third parties (auditors, insurers, counterparty agents) can verify compliance without trusting the deployer's self-reported logs.

31 packages, 4,247 tests, MIT licensed. The cross-agent verification handshake is the piece that connects enforcement to evidence.
I've been exploring governance layers for AI agents.
Many current approaches focus on:
• policy enforcement
• runtime sandboxing
• audit logs
However, I'm curious whether governance should also treat
the decision itself as a first-class artifact.
For example:
Intent → Policy → Decision → Evidence → Execution
Where the decision and evidence are sealed and replayable.
Curious whether the toolkit models governance decisions
in this way or focuses mainly on enforcement.
Experiment repo:
https://github.com/xsa520/guardian
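One way to picture the sealed-and-replayable part of that pipeline (a hypothetical sketch, not guardian's implementation; all field names are illustrative):

```python
import hashlib
import json

def seal(record: dict) -> dict:
    # Seal the decision record by hashing its canonical serialization.
    body = json.dumps(record, sort_keys=True).encode()
    return {**record, "seal": hashlib.sha256(body).hexdigest()}

def replay_check(artifact: dict) -> bool:
    # Anyone holding the artifact can recompute the seal and confirm
    # the decision has not been altered since it was made.
    body = {k: v for k, v in artifact.items() if k != "seal"}
    digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    return digest == artifact["seal"]

decision = seal({
    "intent": "send weekly report email",
    "policy": "comms/external-email-v3",
    "decision": "allow",
    "evidence": ["recipient on allowlist", "no attachments flagged"],
})
```

Here the decision is a first-class object that carries its intent, policy reference, and evidence with it, rather than being reconstructed later from enforcement logs.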