bedrock-attest

Sigstore for AI agents. Sign your agent's behavior today. Verify it hasn't changed tomorrow.

Why

Model providers update their models without notice. Your agent works today, but after a silent RLHF adjustment it might refuse more requests, call tools differently, or produce subtly worse outputs. Without a recorded baseline you are unlikely to notice; bedrock-attest records one and flags the change.

EU AI Act audit requirements (Q3 2026) demand "documented baselines" for High-Risk AI systems. A bedrock.fingerprint.json is exactly that artifact.

Quickstart

pip install bedrock-attest
bedrock init          # generate signing key + scaffold bedrock.toml + prompts.json
bedrock attest        # run test suite, save bedrock.fingerprint.json
bedrock verify        # re-run + compare → PASS / WARN / BREACH

Python API

import json

from bedrock_attest import attest
from bedrock_attest.config import BedrockConfig
from bedrock_attest.verify import verify

MODEL = "claude-opus-4-7"
# use the same prompts for attest and verify so the signals are comparable
test_inputs = ["prompt 1", "prompt 2"]

# attest once when you set up your agent
config = BedrockConfig.from_toml("bedrock.toml")
fp = attest(config=config, test_inputs=test_inputs, model=MODEL)

# save the fingerprint as the baseline
with open("bedrock.fingerprint.json", "w") as f:
    json.dump(fp.to_dict(), f, indent=2)

# verify whenever you suspect drift or before each release
report = verify("bedrock.fingerprint.json", config, MODEL, test_inputs=test_inputs)
if report.breached:
    raise RuntimeError(report.summary())

How it works

bedrock attest runs your test suite (20–50 prompts) against the model and collects these signals:

Signal	What it measures
`refusal_rate`	Fraction of outputs matching refusal patterns
`latency`	P50 / P95 / mean response time
`vocab_entropy`	Shannon entropy of output vocabulary
`tool_distribution`	Histogram of tool call frequencies
`tool_schema_hash`	SHA-256 of canonicalized tool schemas
`embedding_profile`	Mean cosine similarity to centroid (`[deep]`)
`anchor_drift`	Cosine distance from anchor text (`[deep]`)
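As a rough illustration of how some of these signals can be computed, the sketch below implements refusal_rate, vocab_entropy, tool_distribution, tool_schema_hash, and a latency summary using only the standard library. It is not bedrock-attest's actual implementation: the refusal patterns, the whitespace tokenization, the nearest-rank percentiles, and every function name here are assumptions made for the example.

```python
import hashlib
import json
import math
import re
from collections import Counter

# Hypothetical refusal patterns; the real pattern list is not shown in this README.
REFUSAL_PATTERNS = [
    re.compile(r"\bI can(?:'|no)t (?:help|assist)\b", re.IGNORECASE),
    re.compile(r"\bI(?: am|'m) (?:unable|not able) to\b", re.IGNORECASE),
]


def refusal_rate(outputs):
    """Fraction of outputs that match at least one refusal pattern."""
    if not outputs:
        return 0.0
    refused = sum(1 for text in outputs if any(p.search(text) for p in REFUSAL_PATTERNS))
    return refused / len(outputs)


def vocab_entropy(outputs):
    """Shannon entropy in bits of the token distribution (whitespace tokens, assumed)."""
    counts = Counter(tok for text in outputs for tok in text.lower().split())
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def tool_distribution(tool_calls):
    """Relative frequency of each tool name across all recorded calls."""
    counts = Counter(tool_calls)
    total = sum(counts.values())
    return {name: n / total for name, n in sorted(counts.items())} if total else {}


def tool_schema_hash(schemas):
    """SHA-256 over a canonical JSON encoding (sorted keys, compact separators)."""
    canonical = json.dumps(schemas, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def latency_summary(seconds):
    """P50, P95 (nearest-rank) and mean of a non-empty list of response times."""
    if not seconds:
        raise ValueError("need at least one latency sample")
    ordered = sorted(seconds)

    def percentile(p):
        rank = max(1, math.ceil(p / 100 * len(ordered)))
        return ordered[rank - 1]

    return {"p50": percentile(50), "p95": percentile(95), "mean": sum(ordered) / len(ordered)}
```

Canonicalizing before hashing means two schema dicts that differ only in key order produce the same tool_schema_hash, so the hash changes only when the schema content does.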

The result is a signed bedrock.fingerprint.json. bedrock verify re-runs the suite and compares the new signals to the fingerprint one by one, using configurable tolerance thresholds.

✓ tool_schema_hash        Δ 0.0000 (tol ±0.05)
✓ refusal_rate            Δ +0.0200 (tol ±0.10)
⚠ latency                 Δ +0.0800 (tol ±0.05)
✗ vocab_entropy           Δ +1.4200 (tol ±0.50)

Overall: BREACH
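
The per-signal comparison can be sketched as follows. This README does not say how a WARN is told apart from a BREACH, so the rule used here (warn when the delta exceeds the tolerance, breach when it exceeds twice the tolerance) is an assumption chosen only because it is consistent with the sample output above. The names SignalResult, classify, compare, and breach_factor are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SignalResult:
    name: str
    delta: float
    tolerance: float
    status: str  # "pass", "warn", or "breach"


def classify(delta, tolerance, breach_factor=2.0):
    """Assumed rule: pass within tolerance, breach beyond breach_factor * tolerance."""
    magnitude = abs(delta)
    if magnitude <= tolerance:
        return "pass"
    if magnitude <= breach_factor * tolerance:
        return "warn"
    return "breach"


def compare(baseline, current, tolerances):
    """Compare numeric signals and return per-signal results plus the worst status."""
    severity = {"pass": 0, "warn": 1, "breach": 2}
    results = []
    for name, tol in tolerances.items():
        delta = current[name] - baseline[name]
        results.append(SignalResult(name, delta, tol, classify(delta, tol)))
    overall = max((r.status for r in results), key=severity.__getitem__, default="pass")
    return results, overall
```

The overall verdict is the worst individual status, so a single breached signal is enough to fail the run.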

Exit codes: 0 = pass · 1 = warn · 2 = breach · 3 = error
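
In a CI job these exit codes can drive the build result. The wrapper below is a minimal sketch that relies only on the documented exit codes; run_gate, interpret, should_fail_build, and the fail_on_warn switch are hypothetical names, not part of bedrock-attest.

```python
import subprocess
import sys

# Exit codes as documented above: 0 = pass, 1 = warn, 2 = breach, 3 = error.
EXIT_STATUS = {0: "pass", 1: "warn", 2: "breach", 3: "error"}


def interpret(returncode):
    """Map a `bedrock verify` exit code to a status; unknown codes count as errors."""
    return EXIT_STATUS.get(returncode, "error")


def should_fail_build(status, fail_on_warn=False):
    """Breach and error always fail; warn fails only when fail_on_warn is set."""
    return status in {"breach", "error"} or (fail_on_warn and status == "warn")


def run_gate(command=("bedrock", "verify"), fail_on_warn=False):
    """Run the verifier and return the exit code the CI step should use."""
    completed = subprocess.run(list(command), check=False)
    status = interpret(completed.returncode)
    print(f"bedrock verify: {status.upper()}")
    return 1 if should_fail_build(status, fail_on_warn) else 0
```

Calling sys.exit(run_gate()) from a CI script fails the step on breach or error and lets warnings through; passing fail_on_warn=True makes the gate strict.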

Comparison

Category	Does	Does NOT
Eval frameworks (Promptfoo, DeepEval)	Quality tests per prompt	Signed behavioral artifact
Observability (Langfuse, Helicone)	Live tracing	Offline-verifiable contract
Drift detection (drift-detector, lithe)	Detects drift after it happens	Defines contract before drift
SLSA / Sigstore	Code provenance	Behavioral provenance
Snapshot tests	Exact string match	Semantic / tolerance-aware
bedrock-attest	Signed, tolerance-aware behavioral baseline, defined before drift and verifiable offline	Per-prompt quality tests, live tracing

When to use

✓ After model upgrades — verify behavior hasn't regressed
✓ CI gate — fail the build if behavioral contract is breached
✓ EU AI Act compliance — fingerprints are "documented baselines"
✓ Multi-provider comparison — same prompts, different models, signed diff

When NOT to use

✗ Real-time monitoring (use Langfuse or Helicone instead)
✗ Testing prompt quality / correctness (use DeepEval or Promptfoo)
✗ Detecting bugs in your own code (that's what unit tests are for)

Installation

# Core (no ML deps)
pip install bedrock-attest

# With semantic similarity signals
pip install "bedrock-attest[deep]"

# With drift-detector integration
pip install "bedrock-attest[drift]"

# Everything
pip install "bedrock-attest[all]"

Companion projects

bedrock-attest builds on, and feeds signals back to, three sibling projects:

drift-detector-agent (PyPI: drift-detector-agent) — vocab entropy and behavioral signals. bedrock-attest uses DriftDetectorAgent.measure_drift() as one signal source.
lithe — context compression with anchor-drift. bedrock-attest borrows the anchor_drift signal concept from lithe.DriftMonitor.
mcp-shield — MCP output filter. bedrock-attest uses check_tool_definition() logic for tool schema hashing.

Limitations

Format is new — not yet an RFC or standard
Behavioral signals are statistical: identical setups may show small non-zero deltas
Ed25519 is the default signer; Cosign is available through the [cosign] extra but adds more friction
Ollama is recommended for fast iteration (no API costs during development)
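
Because identical setups can still show small non-zero deltas, one way to pick a tolerance is to repeat the same run several times and measure how much a signal varies. The helper below is a sketch of that idea, not a bedrock-attest feature: suggest_tolerance, the factor k, and the floor value are all assumptions.

```python
import statistics


def suggest_tolerance(samples, k=3.0, floor=0.01):
    """Suggest a tolerance of k sample standard deviations, never below `floor`."""
    if len(samples) < 2:
        raise ValueError("need at least two repeated runs to estimate spread")
    return max(floor, k * statistics.stdev(samples))
```

For example, if refusal_rate comes out as 0.10, 0.12, 0.08 and 0.10 over four identical runs, suggest_tolerance returns roughly 0.049, which suggests that a tolerance much tighter than ±0.05 would flag ordinary run-to-run noise.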
