Agent Evidence is the concrete execution-evidence entry point for the Digital Biosphere Architecture. It makes one narrow claim: execution evidence and operation accountability form a first-class verification boundary for governable AI systems.

This repository packages operations into portable artifacts that another party can validate later, including offline. It is a method entry, not the architecture hub, not the audit control plane, and not a generic governance platform.
The current canonical package is the Execution Evidence and Operation Accountability Profile v0.1.

Package freeze:

- GitHub Release: v0.2.0
- DOI: 10.5281/zenodo.19334062
- The frozen package version inside that release remains v0.1
Core entry points:

- Spec: `spec/execution-evidence-operation-accountability-profile-v0.1.md`
- Schema: `schema/execution-evidence-operation-accountability-profile-v0.1.schema.json`
- Validator CLI: `agent-evidence validate-profile <file>`
- Examples: `examples/README.md`
- Demo: `demo/README.md`
- Reviewer-facing high-risk entry: `docs/high-risk-scenario-entry.md`
- Status and acceptance: `docs/STATUS.md`, `docs/ACCEPTANCE-CHECKLIST.md`
- Submission handoff: `submission/package-manifest.md`, `submission/final-handoff.md`
```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"

agent-evidence validate-profile examples/minimal-valid-evidence.json
agent-evidence validate-profile examples/invalid-missing-required.json
agent-evidence validate-profile examples/invalid-unclosed-reference.json
agent-evidence validate-profile examples/invalid-policy-link-broken.json
agent-evidence validate-profile examples/valid-high-risk-payment-review-evidence.json
agent-evidence validate-profile examples/invalid-high-risk-unclosed-reference.json
agent-evidence validate-profile examples/invalid-high-risk-policy-link-broken.json

python3 demo/run_operation_accountability_demo.py
```
Expected result:

- the valid example returns JSON with `"ok": true`
- each invalid example returns JSON with `"ok": false` and one primary error code
- the demo writes artifacts under `demo/artifacts/` and ends with one `PASS` summary line

Known environment note:

- the repository `.venv` may show one `langchain_core` warning under Python 3.14 during broader test runs; it does not affect the minimal profile, validator, or demo path
Use the following files as the canonical project and paper ledger:

- Project status and milestone ledger: `docs/STATUS.md`
- Flagship paper worklog: `paper/flagship/WORKLOG.md`
- Manuscript baselines: `submission/manuscript-baselines.md`
- Claims-to-evidence map: `paper/flagship/13_claims_to_evidence_map.md`
- Validation results table: `paper/flagship/18_validation_results_table.md`
Do not mix manuscript surfaces.

- B1-minimal-frozen: Execution Evidence and Operation Accountability Profile v0.1; claim = minimal verification boundary
- B4-high-risk-current-main: reviewer-facing high-risk scenario entry; best fit for future high-risk / compliance-interface manuscripts
- B2-extended-middle: parked unless fully rewritten
- B3-aep-live-chain: historical AEP runtime-evidence surface
This repository already establishes:
- a minimal profile for execution evidence and operation accountability
- a profile-aware validator with explicit error codes
- a single-path demo
- reviewer-facing scenario slices
- a concrete manuscript-evidence mapping surface
This repo is:
- the concrete execution-evidence entry
- a minimal verification-boundary package
- a validator / specimen / demo surface
This repo is not:
- the architecture hub
- the audit control plane
- the walkthrough demo
- the execution-integrity kernel
- a generic agent governance platform
- a manifesto repository
- Architecture hub -> `digital-biosphere-architecture`
- Demo walkthrough -> `verifiable-agent-demo`
- Audit control plane -> `aro-audit`
- EDC Java spike entry -> `docs/edc-java-spike/README.md`
- Historical map -> `docs/lineage.md`
- external-context evidence
- third-party checker
- manuscript assembly across introduction, discussion, and conclusion
Historical Execution Evidence Object, older Agent Evidence Profile wording,
legacy FDO mapping language, and conference-specimen notes
remain in this repository, but they are no longer the primary entry surface.
Use docs/lineage.md for the historical map and retained
paths.
The historical specimen track still keeps its original DOI: https://doi.org/10.5281/zenodo.19055948
The EDC Java spike is frozen as a citable augmentation-layer validation artifact. The main-repo entry point is docs/edc-java-spike/README.md, and the freeze summary is SPIKE_FREEZE_SUMMARY.md.
```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev,langchain,sql]"

python integrations/langchain/export_evidence.py
agent-evidence verify-bundle --bundle-dir integrations/langchain/langchain-evidence-bundle
```

This runs the documented LangChain exporter and verifies the emitted bundle offline.
For a smaller callback/export recipe aimed at external readers, see
docs/cookbooks/langchain_minimal_evidence.md.
Tracing and logs help operators inspect a run. Agent Evidence packages runtime events into portable artifacts that another party can verify later, including offline.
Evidence path:
runtime events -> evidence bundle -> signed manifest -> detached anchor (when present) -> offline verify
This repository implements the bundle, manifest, signatures, and offline verification steps. External anchoring is out of scope for AEP v0.1 and is not enabled by default.
The toolkit now supports two storage modes:
- append-only local JSONL files
- SQLAlchemy-backed SQLite/PostgreSQL databases
The current model treats each record as a semantic event envelope:
- `event.event_type` is framework-neutral, such as `chain.start` or `tool.end`
- `event.context.source_event_type` preserves the raw framework event name
- `hashes.previous_event_hash` links to the prior event
- `hashes.chain_hash` provides a cumulative chain tip for integrity checks
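The hash fields can be sketched with plain `hashlib`. This is an illustrative reconstruction, not the library's `agent_evidence/crypto` implementation; the canonical encoding and exact chain formula there may differ.

```python
import hashlib
import json


def canonical_digest(obj):
    """SHA-256 over a sorted-keys, compact JSON encoding (assumed canonical form)."""
    payload = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def link_event(event, previous_event_hash, previous_chain_hash):
    """Attach event_hash, previous_event_hash, and a cumulative chain_hash."""
    event_hash = canonical_digest(event)
    chain_hash = hashlib.sha256(
        ((previous_chain_hash or "") + event_hash).encode("utf-8")
    ).hexdigest()
    return {
        "event": event,
        "hashes": {
            "event_hash": event_hash,
            "previous_event_hash": previous_event_hash,
            "chain_hash": chain_hash,
        },
    }


# Link two events into a chain: the second references the first.
first = link_event({"event_type": "chain.start"}, None, None)
second = link_event(
    {"event_type": "tool.end"},
    first["hashes"]["event_hash"],
    first["hashes"]["chain_hash"],
)
```

Tampering with any earlier event changes its `event_hash` and every later `chain_hash`, which is what makes the chain tip a usable integrity summary.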
The evidence serialization layer implements:
- default redaction of sensitive fields
- maximum recursion depth
- circular reference protection
- object size limits
These protections prevent evidence bundles from leaking secrets or causing serialization-based denial-of-service conditions.
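A minimal sketch of those protections, using illustrative names (`safe_serialize`, `SENSITIVE_KEYS`) that are not the library's actual API:

```python
SENSITIVE_KEYS = {"api_key", "password", "token"}  # illustrative redaction list


def safe_serialize(value, depth=0, max_depth=5, seen=None):
    """Redact sensitive keys, cap recursion depth, and break cycles.

    Shared (non-cyclic) dict references are also flagged, which keeps the
    sketch simple at the cost of some over-approximation.
    """
    if seen is None:
        seen = set()
    if depth > max_depth:
        return "<max-depth>"
    if isinstance(value, dict):
        if id(value) in seen:
            return "<circular>"
        seen.add(id(value))
        return {
            k: "<redacted>" if k in SENSITIVE_KEYS
            else safe_serialize(v, depth + 1, max_depth, seen)
            for k, v in value.items()
        }
    if isinstance(value, list):
        return [safe_serialize(v, depth + 1, max_depth, seen) for v in value]
    return value


payload = {"api_key": "secret", "nested": {"note": "ok"}}
payload["nested"]["loop"] = payload  # introduce a cycle on purpose
safe = safe_serialize(payload)
```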
The project is organized so evidence capture stays modular:
- `agent_evidence`: core models and recorder logic
- `agent_evidence/crypto`: canonical hashing and chain helpers
- `agent_evidence/storage`: append-only local storage backends
- `agent_evidence/integrations`: adapters for external agent frameworks
- `agent_evidence/cli`: command-line entrypoints
- `agent_evidence/schema`: JSON schema for persisted envelopes
- `examples`: executable usage examples
- `tests`: baseline regression coverage
```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev,langchain,sql]"
agent-evidence schema
```

The current MVP path is an integrity-verifiable evidence bundle with offline verification. It is implemented as an Agent Evidence Profile that keeps one LangChain-first integration path and leaves room for later OpenInference / OpenTelemetry compatibility mappings.
AEP is an integrity-verifiable evidence profile for autonomous agent runs, with offline verification and runtime provenance capture. AEP v0.1 is an integrity profile, not a non-repudiation system.
Generate the first bundle:

```bash
python integrations/langchain/export_evidence.py
agent-evidence verify-bundle --bundle-dir integrations/langchain/langchain-evidence-bundle
```

Run the gate against one valid and one tampered fixture:

```bash
python scripts/run_profile_gate.py
```

The next read-only path is a Conway-neutral Automaton sidecar/exporter. It reads `state.db`, git history, and persisted on-chain references, then emits an AEP bundle plus `fdo-stub.json` and `erc8004-validation-stub.json`.
```bash
agent-evidence export automaton \
  --state-db /path/to/state.db \
  --repo /path/to/state/repo \
  --runtime-root /path/to/automaton-checkout \
  --out ./automaton-aep-bundle
```

`agent-evidence export automaton` has been validated against a live isolated-home Automaton run and remains marked experimental while the live data contract is still settling.
When --runtime-root is provided, the exporter attempts to resolve
source_runtime_version, source_runtime_commit, and source_runtime_dirty
from the Automaton checkout without changing the export path.
The controlled specimen release at v0.1-live-chain is a historical lineage surface, not the current primary entry.
The historical specimen archive for that track remains on Zenodo with DOI: https://doi.org/10.5281/zenodo.19055948
It freezes:
- AEP schema
- verify CLI
- LangChain exporter
- Automaton exporter
- live runbook
- public live/tampered fixtures
- AEP boundary statement
See docs/lineage.md for how this historical surface relates to the current Agent Evidence / AEP v0.1 package path.
The formal specimen release note is RELEASE_NOTE.md.
```bash
agent-evidence record \
  --store ./data/evidence.jsonl \
  --actor planner \
  --event-type tool.call \
  --input '{"task":"summarize"}' \
  --output '{"status":"ok"}' \
  --context '{"source":"cli","component":"tool"}'

agent-evidence list --store ./data/evidence.jsonl
agent-evidence show --store ./data/evidence.jsonl --index 0
agent-evidence verify --store ./data/evidence.jsonl
```

SQL stores use a SQLAlchemy URL instead of a file path:
```bash
agent-evidence record \
  --store sqlite+pysqlite:///./data/evidence.db \
  --actor planner \
  --event-type tool.call \
  --context '{"source":"cli","component":"tool"}'

agent-evidence query \
  --store sqlite+pysqlite:///./data/evidence.db \
  --event-type tool.call \
  --source cli

agent-evidence query \
  --store sqlite+pysqlite:///./data/evidence.db \
  --span-id tool-1 \
  --parent-span-id root \
  --offset 0 \
  --limit 50

agent-evidence query \
  --store sqlite+pysqlite:///./data/evidence.db \
  --previous-event-hash <event-hash> \
  --event-hash-from <lower-bound-hash> \
  --event-hash-to <upper-bound-hash>
```
```bash
agent-evidence export \
  --store ./data/evidence.jsonl \
  --format json \
  --output ./exports/evidence.bundle.json

agent-evidence export \
  --store ./data/evidence.jsonl \
  --format csv \
  --output ./exports/evidence.csv \
  --manifest-output ./exports/evidence.csv.manifest.json \
  --private-key ./keys/manifest-private.pem \
  --key-id evidence-demo

agent-evidence export \
  --store ./data/evidence.jsonl \
  --format xml \
  --output ./exports/evidence.xml \
  --manifest-output ./exports/evidence.xml.manifest.json \
  --private-key ./keys/manifest-private.pem \
  --key-id evidence-demo

agent-evidence export \
  --store ./data/evidence.jsonl \
  --format json \
  --archive-format tar.gz \
  --output ./exports/evidence-package.tgz \
  --private-key ./keys/manifest-private.pem \
  --key-id evidence-demo

agent-evidence export \
  --store ./data/evidence.jsonl \
  --format json \
  --output ./exports/evidence.multisig.json \
  --required-signatures 2 \
  --required-signature-role approver=1 \
  --required-signature-role attestor=1 \
  --signer-config ./keys/operations-q2.signer.json \
  --signer-config ./keys/compliance-q1.signer.json
```
```bash
agent-evidence verify-export \
  --bundle ./exports/evidence.bundle.json \
  --public-key ./keys/manifest-public.pem

agent-evidence verify-export \
  --bundle ./exports/evidence.multisig.json \
  --keyring ./keys/manifest-keyring.json

agent-evidence verify-export \
  --bundle ./exports/evidence.multisig.json \
  --keyring ./keys/manifest-keyring.json \
  --required-signature-role approver=1

agent-evidence verify-export \
  --xml ./exports/evidence.xml \
  --manifest ./exports/evidence.xml.manifest.json \
  --public-key ./keys/manifest-public.pem

agent-evidence verify-export \
  --archive ./exports/evidence-package.tgz \
  --public-key ./keys/manifest-public.pem
```

```bash
make install
make test
make lint
make hooks
```

The repository includes a `.pre-commit-config.yaml` with baseline whitespace, JSON, and Ruff checks.
For PostgreSQL support, install the extra driver dependencies:

```bash
pip install -e ".[dev,postgres]"
```

Each persisted record follows this shape:
```json
{
  "schema_version": "2.0.0",
  "event": {
    "event_id": "...",
    "timestamp": "2026-03-16T00:00:00+00:00",
    "event_type": "tool.end",
    "actor": "search-tool",
    "inputs": {},
    "outputs": {},
    "context": {
      "source": "langchain",
      "component": "tool",
      "source_event_type": "on_tool_end",
      "span_id": "...",
      "parent_span_id": null,
      "ancestor_span_ids": [],
      "name": "search-tool",
      "tags": ["langchain", "tool"],
      "attributes": {}
    },
    "metadata": {}
  },
  "hashes": {
    "event_hash": "...",
    "previous_event_hash": "...",
    "chain_hash": "..."
  }
}
```

`event_type` is the stable semantic layer. `source_event_type` keeps the original callback or trace event for lossless debugging.
Agent Evidence supports two integration paths for current LangChain runtimes:

- callback handlers for live capture during execution
- stream event adapters for `Runnable.astream_events(..., version="v2")`

Example callback usage:
```python
from agent_evidence import EvidenceRecorder, LocalEvidenceStore
from agent_evidence.integrations import EvidenceCallbackHandler
from langchain_core.runnables import RunnableLambda

store = LocalEvidenceStore("data/evidence.jsonl")
recorder = EvidenceRecorder(store)
handler = EvidenceCallbackHandler(recorder)

chain = RunnableLambda(lambda text: text.upper()).with_config({"run_name": "uppercase"})
result = chain.invoke(
    "hello",
    config={"callbacks": [handler], "metadata": {"session_id": "demo"}},
)
```

Example stream event capture:
```python
import asyncio

from agent_evidence import EvidenceRecorder, LocalEvidenceStore
from agent_evidence.integrations import record_langchain_event
from langchain_core.runnables import RunnableLambda


async def main() -> None:
    store = LocalEvidenceStore("data/evidence.jsonl")
    recorder = EvidenceRecorder(store)
    chain = RunnableLambda(lambda text: text[::-1]).with_config({"run_name": "reverse"})
    async for event in chain.astream_events("hello", version="v2"):
        record_langchain_event(recorder, event)


asyncio.run(main())
```

Both integration paths normalize LangChain callback names such as `on_chain_start` and `on_tool_end` into semantic event types such as `chain.start` and `tool.end`.
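That normalization can be pictured as a small lookup table. The mapping below is illustrative, not the adapter's actual table in `agent_evidence.integrations`:

```python
# Illustrative mapping from LangChain callback names to semantic event types.
CALLBACK_TO_SEMANTIC = {
    "on_chain_start": "chain.start",
    "on_chain_end": "chain.end",
    "on_tool_start": "tool.start",
    "on_tool_end": "tool.end",
}


def normalize_event_type(source_event_type: str) -> str:
    """Map a framework callback name to a semantic type, keeping unknowns raw."""
    return CALLBACK_TO_SEMANTIC.get(source_event_type, source_event_type)
```

Keeping the raw name in `context.source_event_type` while storing the normalized name in `event.event_type` preserves both a framework-neutral query surface and lossless debugging.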
OpenAI Agents SDK already exposes tracing extension points through custom trace processors, so Agent Evidence can mirror trace and span lifecycle events into the same semantic evidence model without patching the runtime.
Install the optional dependency:

```bash
pip install -e ".[openai-agents]"
```

Example trace processor usage:
```python
from agents import trace
from agents.tracing import custom_span

from agent_evidence import EvidenceRecorder, LocalEvidenceStore, export_json_bundle
from agent_evidence.integrations import install_openai_agents_processor

store = LocalEvidenceStore("data/openai-agents.evidence.jsonl")
recorder = EvidenceRecorder(store)
install_openai_agents_processor(recorder)

with trace(
    "support-workflow",
    group_id="session-001",
    metadata={"session_id": "session-001"},
):
    with custom_span("collect_context", {"channel": "chat"}):
        pass

export_json_bundle(
    store.query(source="openai_agents"),
    "exports/openai-agents.bundle.json",
)
```

By default `install_openai_agents_processor()` adds Agent Evidence alongside the SDK's active processors. Pass `replace=True` if you want the SDK to emit only into Agent Evidence for that process.
See examples/openai_agents/basic_export.py
for a complete local example.
Use the CLI to validate the chain after capture:
```bash
agent-evidence verify --store ./data/evidence.jsonl
```

This recomputes each `event_hash`, checks `previous_event_hash`, and validates the cumulative `chain_hash`.
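The verification loop can be sketched as follows, assuming sorted-keys compact JSON as the canonical encoding and SHA-256 over the previous chain hash plus the event hash as the chain formula. This is illustrative only; the real verifier lives in `agent_evidence` and may differ in detail.

```python
import hashlib
import json


def verify_chain(records):
    """Recompute each event_hash and chain_hash, checking every link in order."""
    previous_event_hash = None
    previous_chain_hash = ""
    for record in records:
        # Recompute the per-event digest from the event payload.
        event_hash = hashlib.sha256(
            json.dumps(record["event"], sort_keys=True, separators=(",", ":")).encode()
        ).hexdigest()
        # Recompute the cumulative chain tip.
        chain_hash = hashlib.sha256(
            (previous_chain_hash + event_hash).encode()
        ).hexdigest()
        hashes = record["hashes"]
        if (hashes["event_hash"] != event_hash
                or hashes["previous_event_hash"] != previous_event_hash
                or hashes["chain_hash"] != chain_hash):
            return False
        previous_event_hash = event_hash
        previous_chain_hash = chain_hash
    return True
```

Because each step depends on the previous chain tip, editing any stored event breaks every subsequent `chain_hash`, so tampering cannot be confined to a single record.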
SqlEvidenceStore persists the semantic event envelope into a relational table
while keeping indexed columns for efficient filtering:
`event_type`, `actor`, `timestamp`, `source`, `component`, `span_id`, `parent_span_id`, `previous_event_hash`, `event_hash`, `chain_hash`
The query interface supports:
- semantic filters such as `event_type`, `actor`, `source`, and `component`
- chain traversal via `previous_event_hash`
- span-scoped inspection with `span_id` and `parent_span_id`
- time windows via `since` and `until`
- lexicographic hash windows via `event_hash_from`/`to` and `chain_hash_from`/`to`
- pagination via `offset` and `limit`
Hash window filters operate on fixed-width lowercase SHA-256 hex digests, so lexicographic ranges map cleanly to digest ordering for indexed lookups.
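This property is easy to check directly: for fixed-width lowercase hex strings, Python's string ordering agrees with the numeric ordering of the digests, so a text-column `BETWEEN` behaves like a numeric range scan.

```python
import hashlib

# Fixed-width lowercase hex digests sort lexicographically in the same
# order as their numeric values.
digests = sorted(hashlib.sha256(str(i).encode()).hexdigest() for i in range(100))
assert digests == sorted(digests, key=lambda d: int(d, 16))

# An inclusive window over the text form selects exactly the digests
# whose numeric values fall in the same range.
lower, upper = digests[10], digests[20]
window = [d for d in digests if lower <= d <= upper]
```

This is why the hash-window filters can be backed by ordinary indexed string comparisons rather than any digest-aware operator.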
The store accepts standard SQLAlchemy URLs, for example:
```
sqlite+pysqlite:///./data/evidence.db
postgresql+psycopg://user:password@localhost:5432/agent_evidence
```
You can migrate existing JSONL evidence into SQLite or PostgreSQL:
```bash
agent-evidence migrate \
  --source ./data/evidence.jsonl \
  --target sqlite+pysqlite:///./data/evidence.db
```

The `query` command works across both local and SQL stores, although SQL stores are preferable once record volume grows beyond simple local inspection.
Agent Evidence supports three export shapes:
- JSON bundles containing `records`, `manifest`, and one or more detached signatures
- CSV artifacts plus a JSON sidecar manifest
- XML artifacts plus a JSON sidecar manifest
Exports can also be packaged as a single .zip or .tar.gz archive via
--archive-format. Packaged exports include:
- the exported artifact
- the sidecar manifest
- a small `package-manifest.json` used to locate those files during verification
Each export format includes a manifest with:

- `artifact_digest` for the exported bytes
- ordered event-hash and chain-hash list digests
- first/last event hashes and latest chain hash
- export filters used to produce the artifact
Each signature can also carry:
- `key_id` and `key_version` for key rotation
- `signer` and `role` for audit attribution
- `signed_at` and arbitrary JSON metadata
Manifests can also carry threshold policies:
- `signature_policy.minimum_valid_signatures` for N-of-M
- `signature_policy.minimum_valid_signatures_by_role` for role thresholds such as `{"approver": 1, "attestor": 1}`
If neither is present, verification defaults to requiring every signature in the artifact to validate. If only role thresholds are present, the effective total threshold defaults to the sum of those role requirements.
If a bundle carries signatures, verification is fail-closed: you must provide `--public-key` or `--keyring`, otherwise verification returns `ok=false`.
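The threshold logic can be sketched as a helper over already-verified signatures. `policy_satisfied` is an illustrative name, not part of the CLI or library API, and it models only the documented default that role thresholds sum into the effective total:

```python
from collections import Counter


def policy_satisfied(valid_signatures, minimum_total=None, minimum_by_role=None):
    """Check verified signatures against N-of-M and per-role thresholds.

    When only role thresholds are given, the effective total threshold
    defaults to the sum of the role requirements.
    """
    minimum_by_role = minimum_by_role or {}
    if minimum_total is None and minimum_by_role:
        minimum_total = sum(minimum_by_role.values())
    by_role = Counter(sig.get("role") for sig in valid_signatures)
    if minimum_total is not None and len(valid_signatures) < minimum_total:
        return False
    return all(by_role[role] >= count for role, count in minimum_by_role.items())


# Two valid signatures, one per required role.
sigs = [{"role": "approver"}, {"role": "attestor"}]
ok = policy_satisfied(sigs, minimum_by_role={"approver": 1, "attestor": 1})
```

The other documented default, requiring every signature in the artifact to validate when no policy is present, sits outside this helper because it concerns the verification step rather than the threshold count.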
Manifest signing uses Ed25519 PEM keys. To enable signing outside the dev environment:
```bash
pip install -e ".[signing]"
```

Example key generation with OpenSSL:

```bash
openssl genpkey -algorithm Ed25519 -out ./keys/manifest-private.pem
openssl pkey -in ./keys/manifest-private.pem -pubout -out ./keys/manifest-public.pem
```

Signer config files let you attach multiple signatures during export. Example `operations-q2.signer.json`:
```json
{
  "private_key": "./operations-q2-private.pem",
  "key_id": "operations",
  "key_version": "2026-q2",
  "signer": "Operations Bot",
  "role": "approver",
  "metadata": {
    "environment": "prod"
  }
}
```

To embed signature policy in the exported manifest, pass:

- `--required-signatures N` for a global N-of-M rule
- `--required-signature-role <role>=<count>` one or more times for role rules
verify-export will honor the manifest policy by default, or you can override
the global threshold and role thresholds at verification time with the same
flags.
Keyrings let verify-export resolve rotated keys by key_id and
key_version. Example manifest-keyring.json:
```json
{
  "keys": [
    {
      "key_id": "operations",
      "key_version": "2026-q1",
      "public_key": "./operations-q1-public.pem"
    },
    {
      "key_id": "operations",
      "key_version": "2026-q2",
      "public_key": "./operations-q2-public.pem"
    }
  ]
}
```

When you export CSV, Agent Evidence writes the CSV artifact and a manifest sidecar such as `evidence.csv.manifest.json`. Spreadsheet-facing CSV exports sanitize cells that begin with formula prefixes such as `=`, `+`, `-`, or `@` to reduce formula injection risk during human review. `verify-export` validates the manifest summary, the exported artifact digest, and every signature against a provided public key or keyring.
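One common sanitization approach is to prefix formula-like cells with a single quote so spreadsheets treat them as literal text. The sketch below is illustrative and may not match the exporter's exact escaping:

```python
FORMULA_PREFIXES = ("=", "+", "-", "@")


def sanitize_cell(value: str) -> str:
    """Prefix formula-like cells so spreadsheets treat them as literal text."""
    if value.startswith(FORMULA_PREFIXES):
        return "'" + value
    return value


# Sanitize a row before writing it to a spreadsheet-facing CSV.
row = [sanitize_cell(cell) for cell in ["=SUM(A1:A3)", "plain text", "-42"]]
```

Note that any cell starting with a listed prefix is escaped, including negative numbers, which trades some cosmetic noise for a simpler fail-safe rule.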
Archive verification also enforces unpacking limits for member count, per-file
size, and total unpacked size so untrusted .zip and .tar.gz bundles fail
closed before full extraction.
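A sketch of such pre-extraction limit checks for a `.tar.gz` archive, with illustrative limit values and a hypothetical `check_archive_limits` helper that is not the library's API:

```python
import tarfile

MAX_MEMBERS = 1000                  # illustrative limits
MAX_MEMBER_SIZE = 10 * 1024 * 1024  # 10 MiB per file
MAX_TOTAL_SIZE = 50 * 1024 * 1024   # 50 MiB unpacked overall


def check_archive_limits(path: str) -> None:
    """Fail closed on oversized archives before extracting anything."""
    with tarfile.open(path, "r:gz") as archive:
        members = archive.getmembers()
        if len(members) > MAX_MEMBERS:
            raise ValueError("too many archive members")
        total = 0
        for member in members:
            if member.size > MAX_MEMBER_SIZE:
                raise ValueError(f"member too large: {member.name}")
            total += member.size
            if total > MAX_TOTAL_SIZE:
                raise ValueError("total unpacked size exceeds limit")
```

Checking declared sizes from the archive index costs almost nothing, so a hostile bundle is rejected before any member is decompressed to disk.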
For a repeatable real-database validation path, use the bundled Docker-backed integration script:
```bash
make install-postgres
make test-postgres
```

This starts a temporary PostgreSQL container, exports `AGENT_EVIDENCE_POSTGRES_URL`, and runs `tests/test_postgres_integration.py` against the live database.