Compliance scenarios for the jhcontext SDK demonstrating EU AI Act auditability through the PAC-AI protocol.
TL;DR: This is the lightweight proof-of-concept — no infrastructure, runs in-memory, and serializes envelopes/PROV graphs/audits to local files (~25ms). For the production-grade version with real CrewAI agents and AWS infrastructure (DynamoDB + S3), see jhcontext-crewai.
A hospital uses AI to recommend oncology treatment plans. The scenario proves that a physician performed genuine oversight — not just approval-clicking.
Pipeline: Sensor → Situation Recognition → Decision → Human Oversight → Audit
| Step | Agent | Duration | What it does |
|---|---|---|---|
| 1 | Sensor Agent | 2 min | Collects patient demographics, lab results, CT scan metadata |
| 2 | Situation Agent | 2 min | Interprets observations into clinical situation (post-op, high-risk) |
| 3 | Decision Agent | 1 min | Generates treatment recommendation (confidence: 0.87) |
| 4 | Dr. Chen | 10 min | Reviews CT scans (4m), treatment history (3m), pathology (2m), AI rec (1m), overrides |
| 5 | Audit Agent | — | Verifies temporal oversight + envelope integrity |
Audit checks:
- Temporal oversight: 4/4 human activities occurred after AI recommendation
- Review duration: 600s total (minimum required: 300s)
- Envelope integrity: SHA-256 hash + Ed25519 signature verified
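The temporal-oversight and review-duration checks above reduce to simple comparisons over timestamps. A minimal sketch, assuming illustrative activity names and timestamps (not the actual jhcontext SDK API):

```python
from datetime import datetime

# Hypothetical timeline: the AI recommendation, then Dr. Chen's four reviews.
# Names, timestamps, and durations are illustrative, mirroring the scenario table.
ai_recommendation_at = datetime(2024, 5, 1, 9, 0, 0)
human_activities = [
    # (activity, start time, duration in seconds)
    ("review_ct_scans",       datetime(2024, 5, 1, 9, 1, 0), 240),
    ("review_history",        datetime(2024, 5, 1, 9, 5, 0), 180),
    ("review_pathology",      datetime(2024, 5, 1, 9, 8, 0), 120),
    ("review_recommendation", datetime(2024, 5, 1, 9, 10, 0), 60),
]

MIN_REVIEW_SECONDS = 300

# Every human activity must start after the AI recommendation was issued.
temporal_ok = all(start > ai_recommendation_at for _, start, _ in human_activities)
total_review = sum(duration for _, _, duration in human_activities)

assert temporal_ok, "human oversight must follow the AI recommendation"
assert total_review >= MIN_REVIEW_SECONDS, f"review too short: {total_review}s"
print(f"temporal oversight: {len(human_activities)}/{len(human_activities)} "
      f"after AI; review {total_review}s >= {MIN_REVIEW_SECONDS}s")
```

With the durations from the table (240 + 180 + 120 + 60), the total comes to 600 s, double the 300 s minimum.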
```
python -m usecases.healthcare.run
```

A university uses AI to grade essays. The scenario proves that student identity data (name, gender, disability status) was not used in grading.
Pipeline: Essay Ingestion → AI Grading → (Separate) Equity Reporting → Audit
| Step | Agent | What it does |
|---|---|---|
| 1 | Ingestion Agent | Strips identity from essay, creates content-only artifact |
| 2 | Grading Agent | Evaluates essay against rubric (argument 30%, evidence 30%, clarity 20%, thinking 20%) |
| 3 | Equity Agent | Isolated workflow — aggregates demographics for institutional reporting |
| 4 | Audit Agent | Verifies negative proof + workflow isolation + integrity |
Audit checks:
- Negative proof: 0 biometric/sensitive artifacts in grading dependency chain
- Workflow isolation: 0 shared entities between grading and equity PROV graphs
- Envelope integrity: SHA-256 hash + Ed25519 signature verified
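The negative-proof and workflow-isolation checks above both come down to empty set intersections. An illustrative sketch (the real Audit Agent walks the PROV graphs; entity IDs and type labels here are made up):

```python
# Entity IDs from the two PROV graphs (illustrative values).
grading_entities = {"essay_content_001", "rubric_v2", "grade_report_001"}
equity_entities = {"demographics_agg_001", "cohort_stats_001", "equity_report_001"}

# Artifact types that must never appear in the grading dependency chain.
SENSITIVE_TYPES = {"name", "gender", "disability_status", "biometric"}
grading_dependency_types = {"essay_text", "rubric"}

# Workflow isolation: no entity shared between grading and equity graphs.
shared = grading_entities & equity_entities
# Negative proof: no sensitive artifact type in the grading chain.
sensitive_in_chain = grading_dependency_types & SENSITIVE_TYPES

assert not shared, f"workflow isolation violated: {shared}"
assert not sensitive_in_chain, f"negative proof failed: {sensitive_in_chain}"
print(f"shared entities: {len(shared)}; "
      f"sensitive artifacts in grading chain: {len(sensitive_in_chain)}")
```

Both counts must be zero for the audit to pass, matching the "0 shared entities" and "0 sensitive artifacts" checks above.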
```
python -m usecases.education.run
```

Both scenarios write to `output/`:
| File | Description |
|---|---|
| `*_envelope.json` | Signed JSON-LD envelope with artifacts, decision influence, privacy, compliance |
| `*_prov.ttl` | W3C PROV graph in Turtle format (entities, activities, agents, relations) |
| `*_audit.json` | Structured audit report with pass/fail per check + evidence |
| `*_metrics.json` | Performance timing per pipeline step |
The education scenario also produces education_equity_prov.ttl — the isolated equity reporting workflow.
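As a rough sketch of what `*_audit.json` might contain — the key names below (`checks`, `passed`, `evidence`) are assumptions about the schema, not the documented format:

```python
import json

# Hypothetical audit report shape; keys are illustrative assumptions.
audit = {
    "scenario": "healthcare",
    "checks": [
        {"name": "temporal_oversight", "passed": True,
         "evidence": "4/4 human activities after AI recommendation"},
        {"name": "review_duration", "passed": True,
         "evidence": "600s total (minimum required: 300s)"},
        {"name": "envelope_integrity", "passed": True,
         "evidence": "SHA-256 hash + Ed25519 signature verified"},
    ],
}

# An auditor would check every individual result before declaring a pass.
overall = all(check["passed"] for check in audit["checks"])
print(json.dumps(audit, indent=2))
print("AUDIT:", "PASS" if overall else "FAIL")
```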
- No persistence layer. These scenarios run in-memory — they build envelopes and PROV graphs using the jhcontext SDK directly, then serialize to files. The jhcontext server (FastAPI + SQLite) is not used here.
- Cryptographic signing. Envelopes are signed with Ed25519 (via the `cryptography` package). The signature covers the canonical JSON-LD form excluding the proof block. This is real signing, not a mock — but keys are ephemeral (generated per run), not loaded from a keystore.
- No encryption. Artifact content hashes are included but actual artifact content (CT scans, essay text) is simulated — only hashes and metadata are stored. In production, artifacts would be encrypted at rest in the storage backend.
- Simulated timing. Clinical timestamps are hardcoded to demonstrate temporal ordering. No `sleep()` calls — the scenarios complete in ~25ms.
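The signing scheme described above can be sketched with the `cryptography` package directly: an ephemeral per-run key signs the canonical JSON with the proof block excluded. The envelope fields below are illustrative, not the actual envelope schema:

```python
import json
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# Illustrative envelope; real envelopes carry artifacts, influence, privacy, etc.
envelope = {
    "artifacts": [{"id": "ct_scan_001", "sha256": "ab12cd34"}],
    "proof": None,  # excluded from the signed payload
}

# Ephemeral key, generated per run rather than loaded from a keystore.
signing_key = Ed25519PrivateKey.generate()

# Canonicalize: drop the proof block, sort keys, strip whitespace.
unsigned = {k: v for k, v in envelope.items() if k != "proof"}
payload = json.dumps(unsigned, sort_keys=True, separators=(",", ":")).encode()

signature = signing_key.sign(payload)
envelope["proof"] = {"type": "Ed25519Signature", "value": signature.hex()}

# verify() raises InvalidSignature if the payload was tampered with.
signing_key.public_key().verify(signature, payload)
print("signature verified")
```

Excluding the proof block before canonicalization is what lets the signature be embedded in the same document it covers.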
```
cd jhcontext-usecases
pip install -e ../jhcontext-sdk
python -m usecases.healthcare.run
python -m usecases.education.run
```

| Metric | Healthcare | Education |
|---|---|---|
| Envelope size | 5.8 KB | 3.6 KB |
| PROV graph size | 4.9 KB | 1.7 KB + 1.3 KB |
| PROV entities | 6 | 3 + 3 |
| PROV activities | 8 | 2 + 1 |
| Total time | ~25 ms | ~22 ms |
A 7-benchmark suite measuring protocol performance across 4 layers: in-memory, SQLite persistence, REST API (FastAPI), and MCP tool dispatch.
```
pip install -e "../jhcontext-sdk[all]"
pip install matplotlib  # optional, for figures
python -m usecases.benchmarks.run [--iterations 50]
```

| Benchmark | What | Paper value |
|---|---|---|
| B1: In-Memory | Envelope + PROV build + audit (baseline) | Overhead comparison baseline |
| B2: SQLite | Direct `SQLiteStorage` write/read | Persistence cost isolation |
| B3: REST API | Full FastAPI round-trip via TestClient | End-to-end stack overhead |
| B4: MCP | MCP tool dispatch (in-process `call_tool`) | LLM agent integration overhead |
| B5: PROV Scaling | Query perf at 10→500 entities | Scaling behavior figure |
| B6: Crypto | SHA-256, Ed25519 sign/verify isolation | Crypto negligibility claim |
| B7: Compliance | ZIP export (envelope + PROV + audit + manifest) | Regulatory package validation |
Operation latency across interfaces:
| Operation | In-Memory | SQLite | REST API | MCP |
|---|---|---|---|---|
| Envelope build + sign | ~4 ms | — | — | — |
| Envelope persist | — | ~100 ms | ~280 ms | ~135 ms |
| Envelope retrieve | — | <1 ms | ~1.5 ms | ~1.4 ms |
| PROV persist | — | ~150 ms | ~155 ms | ~147 ms |
| PROV query (causal) | — | — | ~5 ms | ~5 ms |
| PROV query (temporal) | — | — | ~5 ms | ~5 ms |
| Audit (integrity) | ~0.35 ms | — | — | ~6 ms |
| Compliance package | — | — | ~10 ms | — |
PROV graph scaling:
| Entities | Build | Serialize | Causal Chain | Temporal Seq | Dep. Chain | Size |
|---|---|---|---|---|---|---|
| 10 | 2 ms | 3 ms | 0.05 ms | 0.1 ms | 0.4 ms | 6 KB |
| 50 | 28 ms | 21 ms | 0.3 ms | 0.8 ms | 15 ms | 33 KB |
| 100 | 42 ms | 55 ms | 0.8 ms | 1.4 ms | 65 ms | 67 KB |
| 500 | 211 ms | 258 ms | 10 ms | 7.5 ms | 1604 ms | 347 KB |
Crypto overhead: SHA-256 <0.01 ms (1KB), Ed25519 sign ~0.2 ms, verify ~0.2 ms — negligible vs pipeline.
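The SHA-256 figure above is easy to reproduce with a micro-benchmark over a 1 KB payload (the Ed25519 timings are measured the same way; this sketch covers only the hash):

```python
import hashlib
import time

payload = b"x" * 1024  # 1 KB, matching the reported measurement
N = 10_000

start = time.perf_counter()
for _ in range(N):
    hashlib.sha256(payload).digest()
elapsed_ms = (time.perf_counter() - start) * 1000 / N

print(f"SHA-256 (1 KB): {elapsed_ms:.4f} ms/op")
assert elapsed_ms < 1.0  # generous sanity bound vs the <0.01 ms claim
```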
```
output/benchmarks/
├── results.json     # Full raw data with metadata
├── results.csv      # Summary for LaTeX \input
├── summary.txt      # Human-readable ASCII tables
└── figures/         # Requires matplotlib
    ├── overhead_comparison.png
    ├── prov_scaling.png
    └── crypto_breakdown.png
```
- REST API benchmarks use `fastapi.testclient.TestClient` (in-process ASGI, no network latency) for deterministic measurements.
- MCP benchmarks call `call_tool()` directly, bypassing stdio transport — isolates tool dispatch overhead.
- SQLite write latency (~100ms) is dominated by `fsync` on commit. In production, WAL mode or async writes would reduce this significantly.
- Figures require `matplotlib>=3.8`. Benchmarks work without it — ASCII tables + JSON/CSV always generated.
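The WAL-mode mitigation mentioned above takes two pragmas: `journal_mode=WAL` plus `synchronous=NORMAL` avoids a full `fsync` on every commit. A minimal sketch with a throwaway database (the table shape is illustrative, not the jhcontext schema):

```python
import os
import sqlite3
import tempfile

# Throwaway on-disk database (WAL requires a real file, not :memory:).
path = os.path.join(tempfile.mkdtemp(), "envelopes.db")
conn = sqlite3.connect(path)

# WAL appends to a log instead of rewriting pages; NORMAL skips the
# per-commit full fsync, trading a little durability for write latency.
conn.execute("PRAGMA journal_mode=WAL")
conn.execute("PRAGMA synchronous=NORMAL")

conn.execute("CREATE TABLE envelopes (id TEXT PRIMARY KEY, body TEXT)")
conn.execute("INSERT INTO envelopes VALUES (?, ?)", ("env-001", "{}"))
conn.commit()

mode = conn.execute("PRAGMA journal_mode").fetchone()[0]
print("journal mode:", mode)  # → wal
conn.close()
```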
Apache-2.0