Skip to content

Polycentric-Labs/evidentia

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

475 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Evidentia

Bridge the gap between your controls and your frameworks.

Evidentia is an open-source, Python-first Governance, Risk, and Compliance (GRC) platform that turns compliance from a spreadsheet problem into a software problem. It provides composable building blocks for control gap analysis, AI-generated risk statements, automated evidence collection, and compliance reporting — all usable from a Python library, a CLI, or a REST API.

tests codecov PyPI version Python 3.12+ License: Apache 2.0 Code of Conduct OpenSSF Best Practices


Why Evidentia is different

GRC tooling has been waiting for its Terraform moment. Vanta and Drata are the AWS Consoles of compliance — polished SaaS dashboards charging $30K–$80K/year per framework. Evidentia is the library-first compliance infrastructure layer underneath: composable, embeddable, and built on the open standards (OSCAL) that the entire federal compliance stack is moving toward in 2026.

The moat trinity (v0.9.6 sharpened framing — May 2026): Evidentia's durable differentiation against the wave of "agentic GRC" launches in Q1 2026 collapses to three compounding moats — OSCAL emission + DFAH / PRT determinism + cryptographic CIMD provenance. The positioning posture explicitly does NOT chase "agentic" vocabulary; the leading frame is "deterministic, auditable, OSS-native compliance engineering." See docs/positioning-and-value.md §6.1.A + §6.1.B for the full counter-positioning rationale.

It's the only OSS tool today that combines all of the following in one package:

  • OSCAL-native end-to-end — ingests NIST OSCAL catalogs, emits OSCAL Assessment Results. Ready for the September 2026 federal mandate (OMB M-24-15 + FedRAMP RFC-0024). Vanta, Drata, AuditBoard, OneTrust, ServiceNow IRM, MetricStream all ship zero OSCAL output today.
  • Cryptographically signed evidence — Sigstore/Rekor keyless signing of every Assessment Results document, PEP 740 attestations on every released wheel + sdist, and a CycloneDX SBOM attached to every GitHub Release. No other OSS GRC tool puts cryptographic provenance on the evidence itself.
  • 89 framework catalogs bundled — NIST 800-53 Rev 5 (full 1,196 controls + Low/Moderate/High/Privacy baselines), CSF 2.0, FedRAMP, CMMC 2.0, EU AI Act, DORA, NIS2, GDPR, all 15 comprehensive US state privacy laws, the full FFIEC IT Examination Handbook stack (5 booklets: Information Security / Audit / Management / Operations / Outsourcing), FFIEC Cybersecurity Assessment Tool, OCC Bulletin 2026-13a / FRB SR 26-02 (Model Risk Management), plus 20 Tier-C licensed-stub frameworks with evidentia catalog import for your licensed copies. More than any commercial vendor (Vanta: 35+, Drata: 20+, RegScale: 60+).
  • Apache 2.0 license — embeddable in commercial products without AGPL friction. The OSS GRC alternatives (CISO Assistant, Eramba, Comp AI) are AGPL with paid commercial tiers.
  • Library-first, CLI-second, API-thirdpip install evidentia-core; from evidentia_core import GapAnalyzer. The closest peers (compliance-trestle, RegScale OSCAL Hub) are workflow / CLI tools, not embeddable libraries.
  • Air-gap capable--offline flag refuses network egress; signs evidence with GPG when Sigstore can't reach Fulcio. Built for FedRAMP High, CMMC Level 2, and EU sovereign-cloud deployments where SaaS GRC is a non-starter.
  • AI-optional, not AI-mandatory — risk-statement generation and control explanation use LLMs via LiteLLM (any provider — OpenAI, Anthropic, Google, Azure, Bedrock, Ollama, vLLM). Everything else is deterministic. No leakage of sensitive evidence to third-party AI APIs unless you explicitly opt in.
  • CI-native via composite GitHub Action — drop in uses: polycentric-labs/evidentia/.github/actions/gap-analysis@v0 and every PR runs gap analysis, posts a sticky compliance comment, and blocks merge on regression. No commercial GRC tool does this at the PR level.

For the full competitive analysis, market tailwinds, intellectual ancestry, and 12-month direction, see docs/positioning-and-value.md — a 12,000-word synthesis from 7 parallel research streams.


The problem

Modern GRC is stuck in 2005. The typical compliance program runs on:

  • Spreadsheets that get copy-pasted between auditors, engineers, and exec staff
  • Vendor GRC suites that cost $50K-500K per year, lock you in, and still require weeks of manual work to map one framework to another
  • Point solutions that handle one piece (a vulnerability scanner, a policy tracker, a questionnaire manager) but can't talk to each other
  • Consultants who re-learn your environment from scratch every audit cycle

Meanwhile, the compliance workload keeps growing. A single fintech or healthcare SaaS company today might simultaneously be in scope for SOC 2, PCI DSS 4.0, HIPAA, GDPR, CCPA, ISO 27001, and NYDFS Part 500 — seven frameworks with substantial overlap, each demanding its own evidence, gap analysis, and risk documentation.

The same control (say, "MFA on all privileged accounts") satisfies requirements in all seven. But because each framework uses different vocabulary, numbering, and organization, compliance teams end up documenting the same control seven different ways — and audit season becomes a months-long exercise in cross-referencing.

This is a software problem. It should be solved the way software problems get solved: with composable libraries, structured data, version control, and automation.

Why Evidentia exists

Evidentia is built on four principles:

  1. Open standards, not vendor lock-in. Inputs and outputs use OSCAL — NIST's open standard for control catalogs and assessment results. If you outgrow Evidentia, your data travels with you.

  2. Library-first, CLI-second, API-third. The Python library is the canonical interface. The CLI is a thin wrapper. The REST API is a thin wrapper. Everything Evidentia can do via the CLI, it can do from a Python script — which means you can embed it in CI pipelines, compliance portals, or custom integrations.

  3. AI where it helps, not where it hurts. Evidentia uses LLMs for tasks where language understanding is the bottleneck (writing NIST SP 800-30 risk statements from a gap, validating whether a policy PDF actually covers a control). It uses deterministic code for tasks where correctness matters (OSCAL parsing, gap arithmetic, cross-framework mapping).

  4. Provider-agnostic LLM access. All AI features route through LiteLLM + Instructor, giving you structured Pydantic output from any model — OpenAI, Anthropic, Google, Azure, Ollama, vLLM, or any OpenAI-compatible endpoint. No vendor lock-in on the AI layer either.

Who it's for

  • Security engineers at startups and mid-size companies who need to hit SOC 2 Type II without hiring a full compliance team
  • GRC consultants who want to stop rebuilding the same spreadsheets for every engagement
  • Platform teams who want to embed gap analysis into their CI pipelines and catch drift before the auditor does
  • CISO offices that want a real audit trail on risk decisions, backed by versioned structured data instead of Slack threads
  • Anyone who has ever said "I know this NIST control is the same as this SOC 2 criterion, but I don't want to re-document it for the fifth time"

Current status: 89 frameworks bundled, full suite passing

Recent releases

v0.10.5 (May 2026)Output-format expansion (OCSF Detection Finding + CycloneDX VEX) + workspace refactor (evidentia-eval package extraction) + collector idempotency hardening for the v1.0 API freeze + positioning rewrite + design-partner program draft. Patch on v0.10.4 (released ~24 hours prior). Two new evidentia gap analyze output formats: --format ocsf-detection emits OCSF Detection Finding (class_uid 2004) JSON arrays — the SIEM-target counterpart to v0.10.4's Compliance Finding 2003; Splunk / Elastic / Microsoft Sentinel / Datadog ingest 2004 natively as production traffic. --format cyclonedx-vex emits CycloneDX 1.6 VEX documents, complementing the existing release-time CycloneDX SBOM emit. OutputFormat Literal extended additively; existing emits (json / csv / markdown / oscal-ar / sarif / ocsf) unchanged. New 8th workspace package evidentia-eval: the DFAH determinism + faithfulness harness moves out of evidentia-ai/eval/ to its own pip-installable package, making the air-gap clean install posture a hard contract (the production runtime no longer transitively loads sentence-transformers / numpy); deprecation shim through v0.11.x, removal in v0.12.0. Collector idempotency hardening (Phase 10): deterministic SecurityFinding.id derivation from (source_system, source_finding_id) natural keys via UUID v5 + pinned namespace — closes the principal-engineer- audit-flagged gap that two collect() runs against an unchanged source produced random uuid4 ids; additive-only, OCSF round-trip preserved. Positioning rewrite (Phase 11): §10 rewritten with no metaphors (direct enumerated claims only until reference customers exist); EU AI Act Annex III date corrected from 2026-08-02 → 2027-12-02 per Digital Omnibus 2026-05-07 political agreement; v1.0 acceptance gates rewritten (OpenSSF Gold honest-gap tied to SOC 2 Type I segregation-of-duties). Design-partner program draft (Phase 12): 6-8 week fixed-fee "Evidentia CMMC + FedRAMP Readiness Accelerator" engagement at $25K-$75K tiered pricing; 9 open questions queued for review. Phases 1-5 of the 2026-05-24 v0.10.5 draft (OSPS Baseline catalog + OSCAL conversion

  • OSPS-CONFORMANCE.md + SECURITY.md refresh + EOL.md + verification-recipe) deferred to v0.10.6 without scope loss. 3,443 tests passing / 14 skipped / 3,457 collected across 278 source files; mypy strict 0/0; ruff clean. PyPI: 8 packages at 0.10.5 (was 7; evidentia-eval added).

v0.10.2 (May 2026)MCP-as-backend tool surface expansion + GRC Engineering Club marketplace plugin (staged) + close v0.10.1 F-V101-L1 SSRF surface. Third release of the v0.10.x line on the same calendar day as v0.10.0 + v0.10.1 ships. 4 new MCP tools (gap_analyze_sarif, collect_ocsf, tprm_vendor_list, poam_list) expand the §MCP tool contract from 8 → 12, bringing the v0.10.x OCSF / SARIF / TPRM / POA&M surface into Claude Desktop, Claude Code, and other MCP-aware AI clients. GRC Engineering Club marketplace plugin staged in marketplace/grc-engineering-suite/plugins/evidentia/ — first concrete OSS-vs-paid scope decision locked at generalist OSS only (persona-tied skills reserved for the future Evidentia Pro / Federal commercial tier). F-V101-L1 SSRF surface CLOSED via a new default-on --block-private-ips flag on evidentia collect ocsf URL mode (rejects RFC1918 + link-local incl. AWS metadata 169.254.169.254 + loopback + multicast + reserved ranges via socket.getaddrinfo pre-resolution). All v0.10.x findings now closed (zero unfixed CRITICAL / HIGH / MEDIUM / LOW). 3348 tests passing / 14 skipped across 267 source files; mypy strict 0/0; ruff clean. PyPI: 7 packages at 0.10.2.

v0.10.1 (May 2026)OCSF integration consolidation + close both v0.10.0 review findings. Same-day patch on v0.10.0 (v0.10.0 published 03:07 UTC, v0.10.1 ships ~the same day). Closes F-V100-L1 (trust-boundary on the OCSF unmapped["evidentia"] block — finding_from_ocsf gains a trust_unmapped: bool = True keyword-only parameter; the new ingestion collector passes False) and F-V100-M1 (release-tooling — scripts/ bump_version.py reads [tool.uv.sources] as the workspace allowlist so third-party pins like py-ocsf-models>=0.9.0,<0.10.0 are no longer over-bumped). Ships the deferred third-party OCSF ingestion collector (evidentia collect ocsf --input <file-or-url> — HTTPS-only URL mode with size + timeout caps; Detection Finding path handles Prowler + AWS Security Hub output) and evidentia collect convert --format ocsf (SecurityFinding → OCSF Compliance Finding bundle). Migrates the remaining 11 collectors (Okta + 4 SQL adapters + Databricks + Snowflake + Vanta + Drata + BitSight + SecurityScorecard) to populate compliance_status per the v0.10.0 pilot pattern. Adds the Finding class-name alias on SecurityFinding (deprecation target: v1.0.0) and the EventAction.COLLECT_OCSF_EMITTED audit value. 3332 tests passing / 14 skipped across 267 source files; mypy strict 0/0; ruff clean. PyPI: 7 packages at 0.10.1.

v0.10.0 (May 2026)OCSF-aligned findings schema + SARIF CI-gate output. Opens the v0.10.x research-driven integration line. Ships an additive normalized findings schema (SecurityFinding gains compliance_status + remediation Optional fields mirroring OCSF compliance.status + remediation.desc; new ComplianceStatus enum), a bidirectional OCSF Compliance Finding mapping layer (evidentia_core.ocsf — behind the new optional [ocsf] extra pulling py-ocsf-models; finding_to_ocsf / finding_from_ocsf with lossless round-trip for Evidentia-produced findings via the OCSF-standard unmapped block), SARIF 2.1.0 output for evidentia gap analyze --format sarif (each ControlGap → a SARIF result; stable partialFingerprints track gaps across runs; surfaces in GitHub code scanning + GitLab security dashboards), and 3 pilot collectors (AWS, GitHub, Postgres) populating the new fields. finding.py joins the docs/api-stability.md frozen-models table. The remaining ~11 collectors + the third-party OCSF ingestion collector (incl. a Detection Finding path for Prowler / AWS Security Hub) are tracked for v0.10.1. 3292 tests passing / 17 skipped across 265 source files; mypy strict 0/0; ruff clean. PyPI: 7 packages at 0.10.0.

v0.9.9 (May 2026)Supply-chain hygiene + pre-push gate fidelity. A focused supply-chain patch — no source or test code changed. Closes paramiko CVE-2026-44405 (compliance-trestle 4.0.3 pulls paramiko to 5.0.0, a dev-only transitive dependency); adds an osv-scanner --sbom pre-push gate wired into CI and the release checklist through one shared script, so transitive and disputed advisories surface before a tag; and clears the entire Dependabot PR queue (five grouped version-update PRs merged, three orphaned PRs closed). 3250 tests passing / 14 skipped across 261 source files; mypy strict 0/0; ruff clean. PyPI: 7 packages at 0.9.9.

v0.9.8 (May 2026)v0.9.7 deferral closure + v1.0-prep integration wiring. Wires v0.9.7's data/decision-only primitives into live surfaces: multi-tenant RBAC enforced end-to-end across CLI, REST, and the POA&M / evidence stores; MCP tool outputs signed at the FastMCP dispatch layer with an in-tree Sigstore-keyless reference signer (evidentia_mcp.sigstore_signer); a shared evidentia_core.factory_resolver; and the FedRAMP Rev 5 High + CMMC L2 HF-Hub eval-suite subsets. Closes the CR-V97 review polish, the F-V97-mcp-signer-trust + F-V97-multi-tenant-claim-spoofing findings, three SigningContext.production() runtime breaks (sigstore 4.2.0 API migration), and the idna CVE-2026-45409 bump. 3250 tests passing / 14 skipped across 262 source files; mypy strict 0/0; ruff clean. PyPI: 7 packages at 0.9.8.

v0.9.5 (May 2026)Walk-through refinement + collaboration primitives + carry-over closure. Closes 18 deferred review findings (7 v0.9.3 LOWs + 8 v0.9.4 LOWs + 2 INFOs + 1 rebucketed Q-finding), adds 3 collaboration-primitive surfaces, validates the federal-SI walk-through against an AI-persona reviewer with FedRAMP 20x / RFC-0024 framing, and ships daemon-status REST expansion. Phase 1 carry-over closure: new evidentia_core.security.atomic_write_text helper (lifts the v0.9.4 inline .tmp cleanup across 4 call sites); auto- wired ProxyHeadersMiddleware via EVIDENTIA_TRUST_PROXY_HEADERS=1 env var or create_app(trust_proxy_headers=True); pytest-randomly + schemathesis + playwright in [dev] deps with tests/dast/ scaffold; 7 v0.9.3 LOWs (explicit SSL context on webhook urlopen, trust-boundary doc on EVIDENTIA_AI_REGISTRY_ DIR, SIGINT race window doc, state-file size cap, RFC 5321 recipient validation, dedup-state mtime cache, sleep_fn typing); 8 v0.9.4 LOWs (FileLock fd-leak fix CWE-404, fcntl per-fd doc CWE-662, rate-limit LRU spray protection CWE-400, idempotency replay-after-delete regression test, IPv6 scope-id correct sort, sleep_fn Callable typing, cross-process FileLock test, rate-limit docstring tightening); 2 INFO closures (Pydantic-upgrade body-hash audit guidance + model_ copy validator re-validate pattern). Phase 2 operator polish: AI-persona walk-through validation (10 refinement recommendations: CLI flag bug fix, OMB M-24-10 / NIST AI RMF reframe, FedRAMP 20x framing, Step 8 OSCAL POA&M emit, CISA Secure-by-Design framing, CA-7 meta-control clarification, SCR Form adjacency, FIPS 199 + ATO-linkage as v0.9.6 targets); GET /api/conmon/daemon-history?limit=N rolling-history endpoint + --history-file daemon CLI flag; Prometheus evidentia_conmon_daemon_* gauges at /api/metrics. Phase 3 collaboration primitives: POA&M Milestone.owner + Milestone.reviewer Optional fields with CLI + REST filters; EvidenceArtifact.version + lineage_id + predecessor_id + new_version() factory helper (data-model + helper at v0.9.5; WORM store-side append-only lands v0.9.6); evidentia_core. rbac package with Role enum / RBACPolicy / check_ permission / FastAPI require_role(action) dependency factory + EVIDENTIA_RBAC_POLICY_FILE env-var policy loading (default permissive policy preserves v0.9.4 behavior; CLI-side enforcement deferred to v0.9.6). 2862 tests passing / 17 skipped across ~225 source files; mypy strict 0/0; ruff clean. PyPI: 7 packages at 0.9.5. First direct-push ship-cycle since the v0.9.x line started using PR-based workflow (per the post-v0.9.4 lesson; enforce_admins: False on branch protection always allowed this; PR ceremony was self-imposed not branch-protection- required).

v0.9.4 (May 2026)Daemon hardening + operator polish + federal-SI walk-through. Consolidation pass closing v0.9.3 deferred review items. Phase 1 daemon hardening: cross-platform file-lock helper (POSIX fcntl.flock + Windows msvcrt.locking) wrapping mark_completed + AlertDeduper.mark_dispatched behind opt-in --state-lock flag — closes F-V93-Q3 HIGH race-condition; webhook SSRF mitigation (default-deny http:// + loopback/RFC1918/ link-local/reserved IPs; opt-in --webhook-allow-plaintext + --webhook-allow-private-network) — closes F-V93-S2 MEDIUM (CWE-918, blocks cloud-metadata-service IAM exfiltration); per-client-IP token-bucket rate-limit middleware on POST /api/ai-gov/register + /classify + X-Idempotency-Key header support on register — closes F-V93-S10 LOW; polish batch (F-V93-Q11 dynamic User-Agent + Q12 Windows latency doc + Q14 narrow except + S9 path-disclosure doc). Phase 2 operator polish: GET /api/conmon/daemon-status endpoint + status sidecar; evidentia conmon dedup-list CLI verb; evidentia ai-gov update + retire CLI verbs wiring the AI_SYSTEM_UPDATED + AI_SYSTEM_RETIRED EventActions. Phase 3 federal-SI walk-through (reserved since v0.9.0): synthetic fixtures + 7-step recipe + smoke test; 3 walk-through-surfaced refinements applied. Phase 4 hygiene: workflow_dispatch on test.yml, token-rotation doc fix, flaky-Jira-test real fix (root cause was 0.7%-probability assertion-collision with random request_id, NOT fixture leak). 19th consecutive PROCEED-CLEAN of v0.7.x → v0.8.x → v0.9.x line. 2798 tests passing / 17 skipped across 219 source files; mypy strict 0/0; ruff clean. PyPI: 7 packages at 0.9.4.

v0.9.3 (May 2026)CONMON daemon + AI governance + carry-overs. The largest minor release of the v0.9.x line so far. Combines two originally-PROPOSED themes into a single ship: Theme A (CONMON daemon)evidentia conmon watch --poll long-running daemon with state-file-driven slug→last_completed tracking, configurable poll interval (default 3600s; min 60s), graceful SIGINT/SIGTERM shutdown, SMTP + generic-webhook alerting channels (STARTTLS-only with has_extn assertion + capture-replay-protected HMAC signing with X-Evidentia-Timestamp header), control health scoring CLI + REST endpoint, ContinuousEvidenceSource plugin Protocol. Theme B (AI governance) — EU AI Act catalog enrichment (risk_tier + applies_to_annex_iii on every Article 9-15 control; tier promoted D→A), NIST AI RMF crosswalks to EU AI Act + ISO 42001, evidentia_core.ai_governance classification + registry + file-backed store, evidentia ai-gov CLI (5 verbs) + /api/ai-gov/* REST router (5 endpoints with audit-event parity). Plus carry-overs: LLM-rater κ recompute on full 147-entry corpus (framework-agnostic κ = 0.8820; NIST κ = 1.0000; overall κ = 0.7956; 3 of 5 subsets PASS κ≥0.80); docker/requirements drift CI gate; GHCR public-flip release-checklist; api-stability.md DRAFT. Step 5.A pre-release-review batch closes 8 of 10 MEDIUM findings inline (F-V93-S1 SMTP STARTTLS hardening + F-V93-S3 webhook HMAC replay protection + F-V93-Q1 dead unknown field removal + F-V93-Q2 AI gov REST router audit-event wiring + F-V93-Q5 new CONMON_DAEMON_POLL_FAILED action + F-V93-Q7 enum compare + F-V93-Q8 upfront --deployment-status validation + F-V93-Q10 dedup-state corruption backup). The 1 HIGH (F-V93-Q3 race-condition) closed via documented single-writer contract matching v0.9.0 poam_store / v0.7.9 vendor_store precedent. 18th consecutive PROCEED-CLEAN of v0.7.x → v0.8.x → v0.9.x line. 2742 tests passing / 17 skipped across 217 source files; mypy strict 0/0; ruff clean. PyPI: 7 packages all at 0.9.3 with PEP 740 attestations.

v0.9.2 (May 2026)CONMON REST parity + LLM rater + federal corpus. Walk-through-driven refinement delivering HTTP API surface for continuous-monitoring cadences (4 endpoints under /api/conmon/* matching CLI parity), LLM-assisted second rater (scripts/llm_rater.py with temperature-0 deterministic labeling + --rule llm mode in the κ script), federal calibration corpus (corpus_federal.jsonl — 24 entries spanning FedRAMP ConMon + POA&M + NIST 800-53 CA-7; total corpus 147 entries), and 10 federal-SI walk-through scenarios in capability-matrix.md. 17th consecutive PROCEED-CLEAN.

v0.9.1 (May 2026)Polycentric Labs org migration. Repo transferred from allenfbyrd/evidentia to polycentric-labs/ evidentia to unblock the org's GitHub Team plan + future federal- SaaS shipping context. All 7 PyPI Trusted Publisher entries re-registered. 16th consecutive PROCEED-CLEAN.

v0.9.0 (May 2026)Federal compliance — POA&M lifecycle + CONMON cycle calendar + walk-through-as-validation. First minor of the v0.9.x line. Opens the federal-compliance theme reserved at v0.8.7 cycle-close. Lands operator-facing surfaces auditors expect in any regulated-industry GRC tool: Plan-of-Action-and- Milestones tracking + Continuous Monitoring cycle calendar. Phase 1 — POA&M data layer: POAMState 5-state enum (planned / in_progress / overdue / completed / verified) aligned to FedRAMP POA&M Template Completion Guide v3.0 + NIST SP 800-53A Rev 5 Appendix F; forward-only state transitions; Milestone Pydantic record; ControlGap.poam_milestones optional list (default-empty for v0.7.x + v0.8.x backward-compat); new evidentia_core.poam sub-package (state.py + milestone.py); new evidentia_core.poam_store JSON file-store mirroring v0.7.9 vendor_store; 6 new EventActions. Phase 2evidentia poam CLI (7 verbs: create from gap report / list / show / update / milestone add|update / delete / calendar); /api/poam/* FastAPI router (8 endpoints); evidentia_core.oscal.poam_exporter.gap_report_to_oscal_poam() emitting OSCAL 1.1.2 plan-of-action-and-milestones JSON with SHA-256 back-matter integrity (mirrors v0.7.0 finding-resource embedding). Default severity-filter is CRITICAL + HIGH per FedRAMP §3.1 auditor-default; --all opts into the full set. Phase 3evidentia_core.conmon pure-function library with 7 bundled cadences (NIST 800-53 CA-7 monthly + FedRAMP ConMon × 3 + CMMC L2 triennial + DoD RMF annual + OCC 2026-13a model-risk annual); evidentia conmon CLI (list / next / check); 2 new EventActions. No daemon — operators poll. Plus new operator runbooks (docs/poam-runbook.md + docs/conmon-runbook.md); 14-item Step 5.A refinement batch (UUID canonicalization in poam_store + vendor_store via str(UUID(id)) preventing duplicate-records-per-alias + non-conformant OSCAL UUID emit; _enum_value extracted to evidentia_core.models.common; stale-doc refreshes across governance + config + generation_context references). 15th consecutive PROCEED-CLEAN of v0.7.x → v0.8.x → v0.9.x line. 2583 tests passing / 17 skipped across 227 source files; mypy strict 0/0; ruff clean. Phase 4 walk-through deferred to v0.9.1 per §31.A POA&M-first / walk-through-as-validation posture.

v0.8.7 (May 2026)Final v0.8.x wrap-up. Single focused session closing the v0.8.6 P3 CLI deferral + backfilling v0.8.6 cycle-close artifacts deferred during single-session compression. NEW --faithfulness-threshold-mode {framework-aware,fixed} flag on evidentia eval risk- determinism (default framework-aware) closes the v0.8.6 P3 deferral; --faithfulness-threshold default changed from 0.3None sentinel for backward-compatible framework- aware default resolution. Resolution precedence: explicit value wins → framework-aware mode (extracts framework from prompt_id; looks up via resolve_threshold(framework, method)) → fixed mode (0.30 framework-agnostic). 6 v0.8.6 cycle-close artifacts backfilled (docs/security-review-v0.8.6.md + docs/v0.8.6-plan.md + threat-model v0.8.6 delta + capability-matrix v0.8.6 snapshot + README v0.8.6 entry + ROADMAP v0.8.6 PLANNED → SHIPPED transition). 14th consecutive PROCEED-CLEAN of v0.7.x → v0.8.x line. 2386 tests passing across 217 source files; mypy strict 0/0; ruff clean. FINAL v0.8.x patch — v0.9.0 opens with the federal-compliance theme per the 2026-04-28 §10 Q4 lock-in.

v0.8.6 (May 2026)CIMD scope enforcement at MCP- protocol level + Cohen's Kappa rater agreement + per-claim confidence + framework-aware threshold defaults + v0.7.x retrospective + v1.0 transition narrative DRAFT. Aggressive ~2-3 week comprehensive scope (single-session compression matching v0.8.3 + v0.8.4 + v0.8.5 cadence). Closes ALL 3 v0.8.5 carry-overs + 3 cycle-additions. MCP CIMD scope enforcement at MCP-protocol level: NEW evidentia_mcp.scope module monkey-binds FastMCP.call_tool (mcp Python SDK 1.27 has no public middleware hook); --default-client-id <slug> CLI flag; pass-through preserves v0.8.5 default no-gating behavior; per-call AI_MCP_TOOL_AUTHORIZED / AI_MCP_TOOL_DENIED audit events; deny paths raise McpError code -32602. Cohen's Kappa rater agreement script (scripts/compute_inter_rater_kappa.py): two-rater file mode + rule-based-rater mode; CI-gateable exit codes; empirical κ = 0.4848 (moderate) at jaccard threshold 0.85 ships as "single-rater + κ probe inconclusive" per the documented R3 mitigation. Per-claim bootstrap-resampled confidence: FaithfulnessResult.confidence field (default-off cost-aware ~100ms/claim; opt-in via compute_confidence=True); framework-aware threshold defaults: DEFAULT_THRESHOLDS_BY_FRAMEWORK_JACCARD map (NIST 0.60 / FFIEC 0.35 / ISO27001 0.30) + resolve_threshold(framework, method) helper + FaithfulnessResult.framework field. docs/v0.7.x- retrospective.md publishes the 18-release v0.7.x cycle narrative; docs/v1.0-transition.md DRAFT captures v1.0 theme candidates + acceptance gates + open questions for v0.9.0 cycle-open. Thirteenth consecutive PROCEED-CLEAN of the v0.7.x → v0.8.x line. 2383 tests passing across 217 source files; mypy strict 0/0; ruff clean. v0.8.7 wrap-up release closes the v0.8.6 P3 CLI deferral.

v0.8.5 (May 2026)DFAH CLI flags + corpus expansion + real-LLM integration tests + MCP CIMD richness. Aggressive ~2-3 week comprehensive scope (single-session compression matching v0.8.3 + v0.8.4 cadence). Closes ALL 4 v0.8.4 carry-overs in one focused session per Allen's explicit "implement CIMD now" directive — ending the 5-cycle CIMD deferral pattern. DFAH faithfulness CLI flags: evidentia eval risk-determinism --check-faithfulness --faithfulness-threshold N --faithfulness-method {jaccard,semantic} --source-clauses-file <yaml>. Closes the v0.8.4 P1.2 CLI-surface deferral; pre-condition validation rejects malformed inputs BEFORE any LLM call fires. DFAH corpus expansion to 123 entries with per-framework subsets (NIST 24 / FFIEC 24 / ISO 27001 24). tune_faithfulness_threshold.py --corpus-pattern <glob> for per-framework sweep; empirical per-framework recommended thresholds documented. Real-LLM integration tests at tests/integration/test_eval/ opt-in via EVIDENTIA_LLM_INTEGRATION=1. MCP CIMD richness: new evidentia_mcp.cimd module ships CIMDDocument (per RFC 7591 + MCP conventions) + CIMDRegistry (JSON-file- backed, version-tagged). evidentia mcp serve --cimd-registry <path> flag wires it through stdio + SSE + HTTP transports. Twelfth consecutive PROCEED-CLEAN of v0.7.x → v0.8.x line. 2338 tests passing across 216 source files; mypy strict 0/0; ruff clean. v0.8.6 reservations: per-tool scope enforcement at MCP-protocol level + multi-rater corpus pass

  • per-claim confidence scoring.

v0.8.4 (May 2026)G4 Path 2 + DFAHarness wiring. Aggressive ~2-3 week focused scope (single-session compression matching v0.8.3 cadence). Closes the v0.8.3 ship-failure root cause via G4 Path 2 (post-PyPI regeneration in release.yml — sidesteps cross-platform reproducibility entirely). New regeneration step BETWEEN Wait-for-PyPI + docker build runs pip-compile --generate-hashes --no-emit-find-links against PyPI's just-published wheels → ephemeral docker/requirements.txt overwrite → docker build picks it up. Hashes match because pip-compile downloads from PyPI's bytes in the Linux CI runner — same source as the container build's pip install. Built-in 3-attempt retry loop with 30s sleeps absorbs PyPI propagation lag. DFAHarness check_faithfulness=True wiring closes the v0.8.3 P1.2 deferral: EvalSample.source_clauses field + EvalResult.faithfulness_results list + DFAHarness.run(check_faithfulness=, faithfulness_threshold=, faithfulness_method=, claim_extraction_fn=, faithfulness_score_fn=) kwargs. EventAction.AI_EVAL_FAITHFULNESS_CHECKED + AI_EVAL_FAITHFULNESS_VIOLATION (reserved-but-inactive in v0.8.0) ACTIVATED. Mock-callable injection points keep harness tests cost-zero while exercising real production code paths. 14 new unit tests across 5 test classes. Eleventh consecutive PROCEED-CLEAN of the v0.7.x → v0.8.x line. 2313 tests passing across 220+ source files; mypy strict 0/0; ruff clean. MCP CIMD richness deferred to v0.8.5 (5th cycle-deferral; v0.8.5 re-evaluates with potential formal retirement) + CLI flags + calibration corpus expansion + real-LLM integration tests deferred to v0.8.5.

v0.8.3 + v0.8.3.1 hot-fix (May 2026)Supply-chain G4 attempt + AI-quality completion. Aggressive ~3-week cycle executed in a single focused session. Closes 5 of 8 v0.8.2 carry-overs (G4 attempt failed first-fire + reverted in same-day hot-fix). G4 Path 1 ATTEMPTED: Dockerfile flipped from exact-version pinning to pip install --require-hashes -r /tmp/requirements.txt against hash-pinned docker/requirements.txt; release.yml exported SOURCE_DATE_EPOCH for SOURCE_DATE_EPOCH-driven uv build reproducibility. First-fire revealed uv build is NOT byte-identical between Windows local + Linux CI runner even with same SOURCE_DATE_EPOCH (file-ordering / timestamp-precision drift). PyPI publish succeeded but container build's pip install --require-hashes failed: local-computed hashes ≠ Linux-CI-built wheel hashes. v0.8.3.1 hot-fix REVERTED the Dockerfile to exact-version pinning (same v0.8.2 surface; no regression); container ship recovered same-day. Recurring Scorecard PinnedDependencies false-positive cycle continued (alerts dismissed per the runbook). G4 closure deferred to v0.8.4 with Path 2 (post-PyPI regeneration; sidesteps cross- platform reproducibility entirely). F-V82-S1: bump_version.py --regenerate-requirements auto-detects host platform; on non-Linux hosts auto-invokes pip-compile inside Linux base image. F-V82-S2: evidentia eval verify CLI replaces broad except Exception with specific SigstoreError subclass catches mapped to distinct exit codes. DFAH sentence-transformers path (P1.1): opt-in [eval-faithfulness] extra; default model all-MiniLM-L6-v2 (~90 MB); catches paraphrases that the v0.8.2 Jaccard baseline misses. LLM atomic-claim extraction (P1.2): new extract_claims() function decomposes AI-generated artifacts into atomic verifiable claims for faithfulness scoring. Calibration corpus (P1.3): 50-entry corpus + threshold- tuning script empirically guide operators on per-corpus threshold selection. Tenth consecutive PROCEED-CLEAN of the v0.7.x → v0.8.x line. 2299 tests passing across 220+ source files; mypy strict 0/0; ruff clean. MCP CIMD richness deferred to v0.8.4 (4th cycle-deferral; gated on empirical operator demand) + DFAHarness check_faithfulness=True wiring deferred to v0.8.4 polish.

v0.8.2 (May 2026)Review-deferral closure + supply-chain hardening + test-quality + DFAH faithfulness. Aggressive ~3-week cycle executed in a single focused session. Closes 8 reservations carried out of v0.8.1. F-V81-S1: evidentia mcp serve --allow-root <path> gates file-path tool inputs via validate_within. F-V81-S2: AuthProvider construction moved from import-time module-level → FastAPI lifespan event; imports are side-effect-free. G4 Dockerfile foundation: docker/requirements.txt regenerated against the v0.8.2 dep tree with SHA256 hashes per transitive (activation deferred to v0.8.3 per §25.6 R1 build-determinism). G1 mutmut + G2 hypothesis: mutation-testing baseline + 8 property-based tests on normalizer + crosswalk. DFAH faithfulness scoring: second arXiv 2601.15322 metric via stdlib Jaccard token-overlap (threshold 0.3 default). First-class Sigstore signing for evidentia eval: --sign / --no-sign flag + evidentia eval verify subcommand. Ninth consecutive PROCEED-CLEAN of the v0.7.x → v0.8.x line. 2277 tests passing across ~215 source files; mypy strict 0/0; ruff clean. CIMD richness + sentence-transformers faithfulness + DFAH calibration corpus deferred to v0.8.3.

v0.8.1 (May 2026) — *Review-deferral close-out + LLM richness

  • network surfaces*. Aggressive ~4-week cycle compressed to a single focused session. Closes ALL 12 v0.8.0-bucketed review findings (2 HIGH + 4 MEDIUM + 6 LOW polish + 2 INFO). Ships the LLM-driven richness for the v0.8.0 P0 surfaces: evidentia eval risk-determinism --context X --gaps Y runs the DFAHarness against the live RiskStatementGenerator; PRT LLM-driven per-claim decomposition replaces the v0.8.0 stub (trace_kind=v0.8.1-llm vs =v0.8.0-stub audit-log discriminator). Network surfaces: evidentia mcp serve --transport sse|http + FastAPI AuthProvider middleware (evidentia serve --auth-token-file <path>). Closes the v0.8.0 F-V08-S3 /api/metrics auth gate. Eighth consecutive PROCEED-CLEAN of the v0.7.x → v0.8.x line. Three Phase 4 infra primitives (G4 Dockerfile --require-hashes, G1 mutmut, G2 hypothesis) deferred to v0.8.2 per §24.6 R6. 2240 tests passing across 211 source files; mypy strict 0/0; ruff clean.

v0.8.0 (May 2026)The OSS-native AI moat. First minor after the v0.7.x cycle close. Lands four AI-quality features that distinguish a Vanta-class dashboard from a compliance- engineering tool: DFAH determinism harness (evidentia eval stub-smoke — auditor-defensible numerical proof that AI artifact generation is reproducible per arXiv 2601.15322), Policy Reasoning Traces (evidentia risk generate --emit-trace — decomposes risk statements into ordered claims with policy clause citations per arXiv 2509.23291), MCP server (evidentia mcp serve — exposes Evidentia to MCP- aware AI clients over stdio with 4 read-only tools), and plugin contract scaffolding (evidentia_core.plugins — 4 ABCs + 3 reference implementations + entry-point discovery for community catalog providers + SI-partner extensions). M-4 collector base-class refactor consolidates ~60% of the HTTP scaffolding across the 4 vendor-risk collectors. New /api/metrics Prometheus endpoint + docs/evidence-integrity.md operator deployment guidance. Pre-release-review v4 Pre-tag PROCEED-CLEAN with 5 inline-fixes from the parallel security

  • code-quality reviews; 12 findings bucketed to v0.8.1 with documented rationale. 2227 tests passing across 210 source files; mypy strict 0/0; ruff clean.

v0.7.16 (May 2026)Final v0.7.x cycle release. Closes the v0.7.x cycle (18 patches + 2 hot-fixes over ~12 days; 6 consecutive PROCEED-CLEAN). python-dotenv CVE bump

  • commit-msg hook variant of standing-rule sweep + post- ship release.yml hardening validation. v0.8.0 design phase opens immediately post-ship.

v0.7.15 (May 2026)Tailwind 4 + SettingsPage refactor + standing-rule pre-commit hook. Final v0.7.x cycle release before v0.8.0 design opens. Tailwind 3→4 migration (full shadcn/ui preset rewrite to CSS-first @theme {}; PostCSS chain replaced with @tailwindcss/vite plugin; tailwindcss-animate v3-era → tw-animate-css v4-compatible). SettingsPage.tsx refactored to key-based remount of <SettingsForm/> sub-component; lint rule react-hooks/set-state-in-effect promoted from warn to error. New scripts/standing_rule_sweep.sh + pre-commit hook runs the canonical 21-pattern guard at commit-time. Fifth consecutive PROCEED-CLEAN /security-review. Post-ship hardening (commit fd36e78) extends release.yml publish-container Wait step to all 6 packages — closes the LAST PyPI propagation race surface. 2120 tests passing across 188 source files.

v0.7.14 (May 2026)frontend modernization + Codecov P2.1 RESOLVED + final v0.7.x hygiene + v0.8.0 G4 foundation. 7 of 8 PR #21 frontend major bumps (TypeScript 5→6, ESLint 9→10 flat-config, plugin-react-hooks 5→7, jsdom + minors; tailwind 3→4 deferred to v0.7.15). 3 deferred v0.7.8 LOWs closed (Tableau Windows tempfile via TemporaryDirectory, Databricks LTS env-var, test-coverage gaps). Codecov dashboard fixed (was 0% since v0.7.10; now 82.14% via removing the flag_management.individual_flags[].paths glob that filtered all files out). container-build Wait extended to all 6 packages. Hash-pinned docker/requirements.txt preview lands as v0.8.0 G4 reproducible-build foundation. Fourth consecutive PROCEED- CLEAN /security-review. 2120 tests passing across 188 source files.

v0.7.13 (May 2026) — *dependency modernization + Codecov fix

  • P3 carry-over closures + release-notes hygiene*. Wrap-up of the v0.7.x cycle. No new public surfaces. Codecov coverage upload fixed (switched to source_pkgs so Cobertura XML emits full repo-relative paths). release.yml now auto-populates the GitHub Release body from CHANGELOG.md via a new extract_changelog_block.py step + body_path arg — closes the v0.7.5–v0.7.12 stub-body gap structurally; future releases auto-populate. P3 carry-overs closed: M-9 OSCAL UUID conformance + L-2 Vanta/Drata high-risk extended fields + L-4 SIG BYO sparse-row debug logging + 5 of 9 v0.7.8 LOWs. Third consecutive PROCEED-CLEAN /security-review. Plus 10 historical release-body backfills landed retroactively. 2100 tests passing across 188 source files.

v0.7.12 (May 2026)concrete cloud-WORM backends + FAIR Monte Carlo + GDPR purge-flow + alert-zero. Adds the three cloud backends to the WORMBackend ABC introduced in v0.7.11: S3ObjectLockWORM, AzureImmutableBlobWORM, GCSBucketLockWORM (installed via evidentia[worm-s3] / [worm-azure] / [worm-gcs] extras). Adds FAIR Monte Carlo simulation (risk quantify --method fair-mc) using stdlib-only Beta-PERT sampling. Adds GDPR Article 17 purge-flow (purge_immediately + force_gdpr_purge operator override). Plus 3 cloud-WORM operator runbooks, alert-zero closure (CodeQL custom sanitizer pack registers validate_within as a path-injection sanitizer), bump_version.py inter-package pin tightening, and release-checklist Steps 5.5 + 9.5 doc-consistency + release-notes practices. Second consecutive PROCEED-CLEAN /security-review. 2075 tests passing across 188 source files.

v0.7.11 (May 2026) — *audit chain-of-custody + governance trio

  • Open FAIR + 6-store harmony*. Adds the evidentia retention CLI (set / list / show / extend / transition / delete / report) with 10-regime classification (SEC 17a-4 / FINRA 3110 / IRS / SOX / HIPAA / GLBA / PCI / SR 11-7 / GDPR / generic) and a WORMBackend ABC with a LocalFilesystemWORM reference impl. Adds KRI/KPI/KGI metrics overlay (P1.5 G3), Open FAIR risk quantification (P1.5 G4 deterministic PERT-mean), and process-as-code governance workflows (P1.5 G5). Closes 9 of 17 v0.7.10 P3 deferrals including F-V10-S2 ($EDITOR allowlist). First v0.7.x PROCEED-CLEAN /security-review (0 findings). 1929 tests passing across 184 source files.

v0.7.9 + v0.7.10 (May 2026)industry overlay (financial services TPRM + model risk + governance primitives) + federal- compliance carry-overs. v0.7.9 ships evidentia tprm (vendor inventory + DD-questionnaire generator with 5 output formats including SIG BYO + caiq-full + concentration-report) + 4 vendor-risk SaaS collectors (Vanta / Drata / BitSight / SecurityScorecard) + OSCAL TPRM emit. v0.7.10 adds the model- risk overlay (SR 11-7 / OCC 2011-12 model inventory + validation report templates) + 7 new bundled catalogs (5 FFIEC IT Handbook booklets + OCC 2011-12 / FRB SR 11-7 + FFIEC CAT). Bundled catalog count: 82 → 89.

v0.7.8 (May 2026)cloud data-warehouse collectors + BI integrations. Adds two read-only evidence collectors for cloud data warehouses (Databricks workspace API + Snowflake account_usage views; mapped to NIST 800-53 controls AC-2 / AC-3 / AC-6 / AC-7 / AU-2 / AU-3 / IA-2 / IA-5 / SC-7 / SC-12 / SI-2) plus the first output integrations to enterprise BI platforms (Tableau Server / Cloud + Power BI). Three published datasets per BI platform: gap inventory, NIST SP 800-30 risk register with AI- provenance fields, and CollectionContext audit trail. Ships walkthrough docs (docs/cloud-dw-collectors.md, docs/bi-integrations.md) and an end-to-end Meridian-with-BI demo.

v0.7.7 (May 2026)SQL family evidence collectors. Five read-only relational-DB adapters ([sql-postgres], [sql-mysql], [sql-sqlite], [sql-mssql], [sql-oracle]) mapping DB-resident compliance evidence to NIST 800-53 controls. Plus ServiceNow output integration carry-forward.

See CHANGELOG.md for the full version history (v0.1.0 through v0.9.9). For forward direction, see docs/v0.8.0-plan.md (the OSS-native AI moat — DFAH + PRT + MCP + plugin contracts) and docs/ROADMAP.md (everything else).

What works today

  • Gap analysis against 89 bundled frameworks across four redistribution tiers:

    • Tier A — US federal (25 frameworks, verbatim public domain): NIST 800-53 Moderate sample, 800-171 Rev 2/Rev 3, 800-172, CSF 2.0, AI RMF 1.0, Privacy Framework 1.0, SSDF 800-218; FedRAMP Rev 5 Low/Moderate/High/LI-SaaS baselines; CMMC 2.0 Levels 1/2/3; HIPAA Security/Privacy/Breach Notification Rules; GLBA Safeguards, NY DFS 500, NERC CIP v7, FDA 21 CFR Part 11, IRS 1075, CMS ARS, FBI CJIS v6, CISA Cross-Sector CPGs.

    • Tier A — International (6 frameworks): UK NCSC CAF 3.2, UK Cyber Essentials, Australian Essential Eight, Australian ISM, Canada ITSG-33, New Zealand NZISM.

    • Tier D — Statutory obligations (21 frameworks, government edicts, uncopyrightable): EU GDPR, EU AI Act, EU NIS2, EU DORA, UK DPA 2018, Canada PIPEDA, plus all 15 comprehensive US state privacy laws (CA CCPA/CPRA, VA, CO, CT, UT, TX, OR, DE, MT, IA, FL, TN, NH, MD, MN).

    • Tier C — Licensed stubs (20 frameworks): ISO/IEC 27001:2022, 27002:2022, 27017, 27018, 27701, 42001 (AI), 22301 (BC); PCI DSS v4.0.1; HITRUST CSF v11; COBIT 2019; SWIFT CSCF 2024; CIS Controls v8.1 plus 5 CIS Benchmarks (AWS, Azure, GCP, Kubernetes, RHEL 9); Secure Controls Framework 2024; IEC 62443; SOC 2 TSC. Copyrighted authoritative text isn't bundled — ships with public clause numbering plus a evidentia catalog import hook for your licensed copy.

    • Tier B — Threat and vulnerability catalogs (4 frameworks): MITRE ATT&CK Enterprise (41 techniques), MITRE CWE Top 25 (2024), MITRE CAPEC sample, CISA KEV sample (Log4Shell, MOVEit, EternalBlue, etc).

  • Six bundled crosswalks: NIST CSF 2.0 → 800-53, FedRAMP Moderate → CMMC L2, NIST 800-53 → HIPAA Security, ISO 27001 → NIST 800-53, VCDPA → CCPA/CPRA, NIST 800-53 → SOC 2 TSC.

  • Multi-format inventory parsing. Load your controls from YAML, CSV, JSON (including OSCAL component-definition), or any format with fuzzy-matched column headers. Status normalization handles "implemented", "partial", "planned", "in progress", "missing", etc.

  • Cross-framework crosswalk engine. Bidirectional mapping index: ask "what NIST 800-53 controls satisfy my SOC 2 criterion?" or "what CMMC Level 2 controls match my FedRAMP Moderate posture?". v0.2.0 ships six bundled crosswalks (118 mappings total). Custom crosswalks are drop-in JSON files in catalogs/data/mappings/.

  • Prioritized gap reports. Severity by implementation state, effort-weighted priority scores, efficiency opportunities (controls that close gaps in 2+ frameworks simultaneously), and a prioritized remediation roadmap.

  • Five output formats: JSON (canonical), CSV (flat), Markdown (human review), OSCAL Assessment Results (for audit handoff and tool interop), and SARIF 2.1.0 (--format sarif, v0.10.0 — runs gap analysis as a CI gate, surfaced in GitHub code scanning and GitLab security dashboards).

  • AI risk statement generator. NIST SP 800-30 Rev 1 compliant risk statements from gaps + system context. Uses Instructor to enforce the RiskStatement Pydantic schema on LLM output, with automatic retries on validation failure. Works with any LiteLLM-supported model.

  • Typer + Rich CLI with init, catalog list/show/crosswalk/import/ where/license-info/remove, gap analyze, gap diff, risk generate, explain control, collect aws, collect github, integrations jira, oscal verify (v0.7.0 — verifies SHA-256 digests + GPG .asc and/or Sigstore .sigstore.json signatures), serve (web UI), doctor, and version commands. catalog list supports --tier and --category filters; catalog import accepts direct JSON or an OSCAL profile (via --profile <profile.json> --catalog <source.json>). Global flags: --offline (air-gap mode), --json-logs (ECS 8.11 structured output for SIEM ingestion), --config <path>, --verbose, --quiet.

  • 3292 tests passing + 17 environmental skips (verified locally on Windows; full suite runs on Linux CI per the v0.9.9 baseline) covering models, catalog loading (with a parametric smoke test per bundled framework), recursive enhancement flattener for NIST Rev 5 3-level IDs, tier invariants, OSCAL profile resolution, user-import directory precedence, crosswalk bidirectionality, multi-format inventory parsing, severity calculation, all five report exporters (incl. OSCAL Assessment Results + SARIF 2.1.0), Jira + ServiceNow + Tableau + Power BI push/sync, the full evidence-collector surface (AWS Config

    • Security Hub + IAM Access Analyzer + GitHub branch protection + CODEOWNERS + Dependabot + Okta + Vanta + Drata + BitSight + SecurityScorecard + Databricks + Snowflake + SQL family), FastAPI /api/* endpoints, air-gap mode, OSCAL AR digest + GPG + Sigstore round-trip verification, POA&M state-machine transitions, CONMON cadence enforcement (incl. the CONMON daemon), DFAH faithfulness scoring (Jaccard + sentence-transformers semantic + LLM atomic-claim extraction), multi-tenant RBAC, WORM evidence-store invariants, MCP CIMD scope gating, OCSF mapping, the federal-SI walk-through scenarios, Hypothesis property tests, and 3 trestle conformance tests against the NIST OSCAL reference impl.

What's not yet included (as of v0.9.9)

Setting expectations matters. v0.7.x → v0.9.x has been a steady cadence of enterprise hardening (SLSA L3 build provenance + cosign keyless signing + PEP 740 attestations + OpenSSF Scorecard / OSV gates + Silver-tier badge), AI features (DFAH faithfulness scoring, AI governance, MCP CIMD scope enforcement), federal compliance (POA&M state machine + CONMON cadences + CONMON daemon + WORM evidence store + multi-tenant RBAC + OSCAL 1.2.1), and integration-surface expansion (Okta, Vanta, Drata, ServiceNow, BitSight, SecurityScorecard, Databricks, Snowflake, SQL-family, Tableau, Power BI). v0.10.0 added the OCSF-aligned findings schema

  • SARIF 2.1.0 CI-gate export. See CHANGELOG.md and docs/ROADMAP.md for the full per-release history. The following are still on the roadmap but not yet shipped:
  • Multi-modal LLM evidence validation (Phase 3 / v1.x) — "is this screenshot actually proof of MFA?" scoring, freshness detection, multi-modal validation via Document Screenshot Embedding (DSE). DFAH textual-claim faithfulness scoring shipped in v0.8.2–v0.8.6 (Jaccard + sentence-transformers semantic + LLM atomic-claim extraction + Cohen's Kappa rater agreement); DSE itself remains academic-only and is tracked in docs/positioning-and-value.md §13.
  • Additional cloud collectors — Azure + GCP evidence collectors. Carried forward as optional / community-driven items.
  • Authoritative control text for copyrighted frameworks (ISO 27001/27002, SOC 2 TSC, PCI DSS, HITRUST CSF, etc.) — ships as Tier-C stubs with public clause numbering only. Use evidentia catalog import to load your own licensed copy. (Design decision, not a roadmap item — the project will never bundle copyrighted text.)

Quick start

Prerequisites

  • Python 3.12+
  • uv 0.4+ (recommended) or pip

Install from PyPI

pip install evidentia

This installs the evidentia and cb CLI commands, plus the five workspace sub-packages as transitive dependencies (evidentia-core, evidentia-ai, evidentia-collectors, evidentia-integrations, evidentia-api).

Install from source (for contributors)

git clone https://github.com/polycentric-labs/evidentia.git
cd evidentia
uv sync --all-packages

This downloads Python 3.12 (if needed), creates a .venv, and installs all 8 workspace packages in editable mode (as of v0.10.5: evidentia-core, evidentia-ai, evidentia-eval, evidentia-collectors, evidentia-integrations, evidentia-api, evidentia-mcp, and the evidentia meta-package).

Run the smoke test

uv run pytest tests/ -q
# Expected: full suite passes in ~10s on a warm checkout

Hit a snag? See docs/troubleshooting.md for common first-run issues — wrong Python version, missing SPA bundle, Sigstore TUF metadata fetch failures, Docker bind-mount permissions.

Want the absolute shortest path? See docs/quickstart.md — five commands from pip install to a verified OSCAL Assessment Results document.

Web UI flows (v0.7.6 alpha.2)

evidentia serve brings up a FastAPI + React SPA on http://127.0.0.1:8000. Five interactive surfaces ship today, mirroring the CLI 1:1:

Page What it does
Home Three-path onboarding (sample data / upload / wizard)
Frameworks Browse all 89 bundled catalogs with tier + category filters
Gap Analyze Form + framework picker → TanStack Table results
Gap Diff Two-report classification + PR-comment markdown export
Risk Generate Streamed AI risk statements per gap

See docs/gui/README.md for a per-page walkthrough + accessibility notes + troubleshooting.

End-to-end walkthrough with sample data

Evidentia ships with a realistic fictional fintech scenario in examples/meridian-fintech/. Walk through it in five steps:

# 1. Verify installation
uv run evidentia doctor

# 2. Explore available frameworks
uv run evidentia catalog list

# 3. Inspect a specific control
uv run evidentia catalog show nist-800-53-mod --control SI-4

# 4. See how one framework maps to another
uv run evidentia catalog crosswalk \
  --source nist-800-53-mod --target soc2-tsc --control AC-2

# 5. Run gap analysis on the Meridian Financial sample inventory
cd examples/meridian-fintech
uv --project ../.. run evidentia gap analyze \
  --inventory my-controls.yaml \
  --frameworks nist-800-53-mod,soc2-tsc \
  --output report.md --format markdown \
  --min-efficiency-frameworks 2

Expected output: a 17-gap report against 28 required controls, 39.3% coverage, 11 critical / 5 high / 1 medium severities, with the top of the priority queue dominated by monitoring/detection gaps (CC7.1, CC7.2, SI-4, AU-6).

Use as a GitHub Action

v0.7.0 ships a composite GitHub Action that turns every PR into a compliance check. It runs evidentia gap analyze, diffs against the main-branch baseline, posts a sticky PR comment with the diff, and gates merge on regressions.

# .github/workflows/compliance.yml
name: Compliance check
on:
  pull_request:
    branches: [main]
  push:
    branches: [main]

permissions:
  contents: read
  pull-requests: write

jobs:
  compliance:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v5
        with: { fetch-depth: 2 }

      - uses: polycentric-labs/evidentia/.github/actions/gap-analysis@v0
        with:
          inventory: inventory.yaml
          frameworks: nist-800-53-rev5-moderate,soc2-tsc
          github-token: ${{ secrets.GITHUB_TOKEN }}

See .github/actions/gap-analysis/README.md for the full input/output surface, OSCAL AR + Sigstore signing options, SHA-pinned variants for audit pipelines, and the migration guide from the legacy standalone allenfbyrd/evidentia-action@v1 (now archived).

Generate AI risk statements

Requires an LLM API key. Any LiteLLM-supported provider works:

export OPENAI_API_KEY=sk-...            # or ANTHROPIC_API_KEY, etc.

uv --project ../.. run evidentia risk generate \
  --context system-context.yaml \
  --gaps report.json \
  --model gpt-4o \
  --output risks.json \
  --limit 5

This produces five validated RiskStatement objects (NIST SP 800-30 structure) for the five highest-priority gaps.

Starting your own project

# From an empty directory
uv run evidentia init

# Creates:
#   evidentia.yaml       — config with defaults
#   my-controls.yaml         — template control inventory
#   system-context.yaml      — template system context
#   .evidentia/          — local storage

Edit my-controls.yaml with your real inventory and run evidentia gap analyze.


Architecture

Evidentia is a uv workspace monorepo of six composable Python packages plus a React/Vite frontend workspace:

Package Role
evidentia-core Pydantic data models, OSCAL catalog loader, crosswalk engine, gap analyzer
evidentia-ai LiteLLM + Instructor client, risk statement generator, control explainer
evidentia-collectors Evidence collection agents — AWS (Config + Security Hub), GitHub (branch protection + CODEOWNERS), Okta (SSO + MFA), Databricks (Unity Catalog + clusters), Snowflake (LOGIN_HISTORY + grants + masking), 5 SQL adapters (Postgres / MySQL / SQLite / MSSQL / Oracle), 4 vendor-risk APIs (Vanta / Drata / BitSight / SecurityScorecard)
evidentia-integrations Jira push + bidirectional status sync, ServiceNow ticket sync, Tableau + Power BI publish
evidentia-api FastAPI server (26 REST routes across 12 router modules) that bundles the React SPA for evidentia serve
evidentia CLI meta-package: Typer/Rich entry points (evidentia + cb alias)
evidentia-ui (non-Python) Vite + React 18 + shadcn/ui frontend; built bundle is copied into evidentia-api at wheel time

The 6 v0.5.1 controlbridge-* deprecation shims published in v0.6.0 were removed at v0.7.0 per the public migration contract. Existing v0.5.1 installs continue to work; future releases no longer produce shim wheels.

Data flow

┌─────────────────┐   ┌─────────────────┐   ┌────────────────────┐
│ my-controls.yaml│   │  OSCAL catalogs │   │ framework mappings │
│       .csv      │   │  (77 bundled;   │   │    (crosswalks)    │
│       .json     │   │  manifest-driven) │   └──────────┬───────┘
└────────┬────────┘   └────────┬────────┘              │
         │                     ▼                       ▼
         │           ┌──────────────────────────────────────┐
         └──────────▶│         GapAnalyzer                  │
                     │  normalize → match → score → rank    │
                     └──────────────────┬───────────────────┘
                                        │
                     ┌──────────────────┴───────────────────┐
                     │                                      │
                     ▼                                      ▼
         ┌──────────────────────┐              ┌──────────────────────┐
         │  GapAnalysisReport   │              │   RiskStatementGen   │
         │  (JSON/CSV/MD/OSCAL) │              │   (NIST SP 800-30)   │
         └──────────────────────┘              └──────────┬───────────┘
                                                          │
                                                          ▼
                                              ┌──────────────────────┐
                                              │  LiteLLM+Instructor  │
                                              │  (any LLM provider)  │
                                              └──────────────────────┘

Key design decisions

  • Pydantic v2 everywhere with ConfigDict(use_enum_values=True, extra="forbid", str_strip_whitespace=True). Structured data, strict validation, JSON-roundtripping for free.
  • OSCAL as the lingua franca. Inputs parse OSCAL catalogs and component-definitions. Outputs include OSCAL Assessment Results. Your data is portable.
  • Instructor for AI structured output. LLMs return raw text; Instructor enforces a Pydantic schema and automatically retries on validation failure. No regex parsing of LLM output.
  • Hatchling build backend with [tool.hatch.build.targets.wheel] packages = ["src/..."] and [dependency-groups] dev = [...] (the modern uv spec, not the deprecated [tool.uv] dev-dependencies).

Roadmap

See docs/ROADMAP.md for the detailed version-level plan. Summary below.

Phase 1 — MVP (v0.1.0 – v0.2.1) — SHIPPED

  • Core data models
  • OSCAL catalog loader + crosswalk engine
  • Multi-format inventory parser
  • Gap analyzer with priority scoring
  • Report exporters (JSON/CSV/Markdown/OSCAL-AR)
  • AI risk statement generator
  • CLI (init, catalog, gap, risk, doctor)
  • Sample data + end-to-end walkthrough
  • Phase 1.5 (v0.2.0 big-bang): exhaustive framework expansion — full upstream NIST 800-53 Rev 5 OSCAL (~1189 controls + Low/Mod/High/Privacy baselines), NIST 800-171 r2/r3, 800-172, CSF 2.0, AI RMF, SSDF, Privacy Framework; FedRAMP Rev 5 baselines; CMMC 2.0 L1/L2/L3; CJIS, CISA CPGs, HIPAA, GLBA, NY DFS 500, NERC CIP, FDA 21 CFR Pt 11, IRS 1075, CMS ARS; EU GDPR/AI Act/NIS2/DORA, UK NCSC CAF, Essential Eight, ACSC ISM, Canada ITSG-33/PIPEDA, NZISM; 15 US state privacy laws; Tier-C stubs for ISO 27001/27002/27017/27018/27701/42001/22301/9001, SOC 2 TSC, PCI DSS 4.0, HITRUST, COBIT, SWIFT CSCF, CIS Controls + Benchmarks, SCF, IEC 62443; MITRE ATT&CK, CWE, CAPEC, CISA KEV; evidentia catalog import for user-licensed Tier-C content; GitHub Actions refresh CI for upstream change detection.

Compliance-as-code (v0.3.x) — SHIPPED

  • evidentia gap diff — classify gaps opened / closed / severity-changed / unchanged
  • --fail-on-regression for CI integration
  • evidentia explain <control_id> — LLM-generated plain-English control translations
  • Three realistic example scenarios (Meridian fintech, Acme Healthtech, Northstar DoD)

Accessible GRC (v0.4.x) — SHIPPED

  • FastAPI REST server (evidentia serve) — 26 /api/* routes across 12 router modules
  • React + Vite + shadcn/ui web UI (WCAG 2.1 AA via Radix primitives)
  • Air-gapped mode (--offline flag + doctor --check-air-gap validator)
  • Reusable GitHub Action (allenfbyrd/evidentia-action@v1)

Phase 2 — Evidence Collection (v0.5.0) — SHIPPED

  • Base collector architecture with check_connection(), collect(), get_supported_controls()
  • AWS collector — Config rules + Security Hub (FSBP / CIS)
  • GitHub collector — branch protection, CODEOWNERS, visibility
  • Jira integration — push gaps as issues + bidirectional status sync

Rename release (v0.6.0) — SHIPPED

  • ControlBridge → Evidentia across code, PyPI, GitHub, docs
  • v0.5.1 deprecation shims for the six old PyPI names

Enterprise-grade release (v0.7.0) — SHIPPED

  • SHA-256 digest per evidence item in OSCAL AR exports
  • Optional GPG signing of the AR document (air-gap path)
  • Sigstore/Rekor signing of the AR (online path, OIDC-based)
  • CycloneDX SBOM on every release
  • PyPI Trusted Publisher (OIDC) + PEP 740 attestations
  • OSCAL conformance via compliance-trestle round-trip in CI
  • AWS IAM Access Analyzer + GitHub Dependabot collectors
  • ECS-8.11 / NIST-AU-3 / OpenTelemetry structured logs
  • Consolidated GitHub Action at .github/actions/gap-analysis/
  • Tamper-evident audit trail for external-auditor review

AI features hardening (v0.7.1) — SHIPPED

  • GenerationContext metadata on every AI-generated artifact (sibling of CollectionContext)
  • 9 new evidentia.ai.* EventAction entries for ECS-structured AI audit events
  • Typed exception hierarchy in evidentia_ai.exceptions — closes BLOCKER B3 for risk_statements/ + explain/
  • Bounded retry via with_retry_async + build_retrying/build_async_retrying against shared LLM_TRANSIENT_EXCEPTIONS set
  • run_id-correlated audit trails across AI generated/failed/retry/cache_hit/batch_completed events
  • Best-effort operator identity via evidentia_ai.client.get_operator_identity() — closes NIST AU-3 "Identity" gap for AI artifacts
  • 116+ net new tests across test_ai/, test_audit/, test_models/

Supply-chain polish + documentation refresh (v0.7.2) — SHIPPED

  • OpenSSF Scorecard weekly workflow (.github/workflows/scorecard.yml) publishing to securityscorecards.dev
  • Cursor + VS Code workspace config (.vscode/{4 files} + .cursorrules + .editorconfig) for testing/validation inline
  • docs/ide-setup.md walkthrough — pytest discovery, mypy strict, ruff format-on-save, coverage gutters, 7 debug configs, 16 pre-canned tasks
  • Catalog-drift detector fix — pinned yaml.safe_dump(width=200) for byte-stable manifest emit + --ignore-all-space workflow guard (closes issues #1-#4)
  • Pre-release-review refinements — 4 MEDIUM doc/config polish fixes (DORA past-tense, doc stamp date, Windows venv path, regen stderr warning)

Later — quality signals + more integrations (v0.7.x+)

  • Risk-statement quality validator (NIST SP 800-30 / IR 8286 scoring + auto-regeneration)
  • Additional collectors — IAM Access Analyzer, Dependabot, Okta, Azure, GCP
  • Additional integrations — ServiceNow, Vanta, Drata
  • Compliance ROI scoring ("close N gaps across M frameworks with one remediation")
  • Auto-generated TypeScript types from FastAPI OpenAPI schema
  • Tauri desktop packaging for offline-first users

Phase 3 — AI Evidence Validation (later)

  • Evidence-to-control relevance scoring (is this screenshot actually proof of MFA?)
  • Freshness / staleness detection per framework (SOC 2 = 90 days, NIST = 365)
  • Multi-modal validation (PDFs, screenshots, log exports, JSON)
  • Coverage heatmaps

Platform — network deployment (later)

  • Multi-user auth / RBAC for network deployments (localhost-only today)
  • Multi-tenant database backend (PostgreSQL)
  • Cost analytics (LLM spend per control / per framework)

Phase 5 — Ecosystem

  • Plugin system for custom collectors
  • OSCAL catalog marketplace / community contributions
  • Integration with policy-as-code tools (OPA, Cedar)
  • Terraform provider for compliance-as-code

See Evidentia-Architecture-and-Implementation-Plan.md for the full canonical plan (~318 KB) including all code sketches, data flows, and technology rationales.


Development

Project layout

Evidentia/
├── packages/
│   ├── evidentia-core/         # Pydantic models, catalogs, gap analyzer
│   ├── evidentia-ai/           # LiteLLM client, risk generator, explain
│   ├── evidentia-collectors/   # AWS (Config + Security Hub), GitHub
│   ├── evidentia-integrations/ # Jira (push + sync)
│   ├── evidentia-api/          # FastAPI REST server + bundled SPA
│   ├── evidentia/              # CLI meta-package (Typer entry points)
│   └── evidentia-ui/           # Vite + React + shadcn/ui frontend
├── tests/
│   ├── fixtures/                   # Sample inventories + recorded fixtures
│   ├── unit/                       # Unit tests (per-package subtrees)
│   └── integration/                # CLI + examples smoke tests
├── examples/
│   ├── meridian-fintech/           # Realistic fintech walkthrough
│   ├── acme-healthtech/            # HIPAA-focused scenario
│   └── northstar-systems/          # DoD / CMMC scenario
├── docs/
│   ├── ROADMAP.md                  # Version-level plan
│   ├── air-gapped.md               # `--offline` mode guide
│   ├── architecture/               # Deep-dive docs
│   ├── github-action/              # Reusable action docs
│   └── gui/                        # Web UI guide
├── .github/
│   ├── workflows/test.yml          # CI: pytest matrix + ruff + mypy
│   ├── workflows/release.yml       # Auto-release on main-branch deploys
│   └── ISSUE_TEMPLATE/             # Bug report / feature request
└── pyproject.toml                  # uv workspace root

Run tests

uv run pytest tests/ -q                       # All tests
uv run pytest tests/unit/ -v                  # Unit tests with verbose output
uv run pytest tests/unit/test_gap_analyzer/   # One subpackage

Add a new framework catalog

  1. Drop an OSCAL catalog JSON file in packages/evidentia-core/src/evidentia_core/catalogs/data/<framework-id>.json.
  2. Register its metadata in catalogs/registry.py under FRAMEWORK_METADATA.
  3. Optionally add crosswalks in catalogs/data/mappings/.
  4. Run evidentia catalog list — your framework should appear.

Code style

  • Python 3.12+ syntax: str | None, list[str], from datetime import UTC
  • from __future__ import annotations at the top of every module
  • Ruff + mypy (configured in pyproject.toml)

Contributing

Phases 1, 1.5, 2 (Jira + AWS + GitHub), and Accessible GRC (v0.4.x web UI

  • air-gap mode) are shipped. High-value contribution areas:
  • Additional crosswalks — especially ISO 27001 ↔ NIST 800-53 and PCI DSS ↔ SOC 2
  • Queued collectors — IAM Access Analyzer, Dependabot, Okta, Azure, GCP
  • Queued integrations — ServiceNow, Vanta, Drata
  • Evidence chain of custody (v0.7.0) — SHA-256 digests + GPG signing of OSCAL AR exports
  • Risk-statement quality validation — NIST SP 800-30 / IR 8286 scoring of AI output
  • Production OSCAL catalogs — drop-in JSON files from upstream sources
  • Test coverage — edge cases in CSV header matching, OSCAL parsing, and air-gap guard

Security

Please do not open a public GitHub issue for security concerns. See SECURITY.md for the disclosure process — GitHub Private Vulnerability Reporting is the preferred channel; email is documented as a backup. The policy also covers the supported-version table, scope, disclosure timeline, and supply-chain provenance verification.

Every release ships with cryptographic provenance: PEP 740 attestations on every wheel + sdist (Sigstore + Rekor), CycloneDX 1.6 SBOM attached to each GitHub Release. Verification command in SECURITY.md.


License

Apache License 2.0


Acknowledgments

Evidentia stands on the shoulders of excellent open-source projects:

  • NIST OSCAL — the structured data standard that makes framework interop possible
  • Pydantic — type-safe data models without the boilerplate
  • LiteLLM — unified LLM access across every provider
  • Instructor — structured output extraction from LLMs
  • Typer and Rich — the CLI is only as good as the framework
  • uv — Python packaging that finally feels modern

AI assistance

This project was developed alongside AI platforms.

Models used: Claude Opus 4.6, Claude Opus 4.7, Sonar Deep Research

About

Open-source Python GRC tool: gap analysis, AI risk statements, OSCAL-first compliance automation. Enterprise-grade evidence integrity (Sigstore + GPG), CycloneDX SBOM, PyPI Trusted Publisher OIDC + PEP 740 attestations.

Topics

Resources

License

Code of conduct

Contributing

Security policy

Stars

Watchers

Forks

Packages

 
 
 

Contributors