Local-first analytics for turning public eCFR data into agency-level burden, change, and auditability signals.
This project ingests agency mappings and dated eCFR XML snapshots, normalizes them into SQLite, computes transparent text-based metrics, and exposes the results through a FastAPI API and a React dashboard. The goal is not to replace legal analysis; it is to make large regulatory text corpora faster to explore by highlighting where restrictive language, process burden, and recent change activity are concentrated.
Highlights
- Public eCFR ingestion, normalization, and agency-to-CFR mapping
- Agency-level prioritization metrics with transparent formulas
- Reliability hardening, audit tooling, and reproducible local data
The app builds a local analytical view of eCFR content by agency. It flattens the agency directory, deduplicates CFR references, fetches current and dated XML snapshots, extracts scoped text blocks, and computes agency-level metrics such as word count, restrictive-term density, process-term density, change volatility, and content fingerprints. A small web UI then surfaces those metrics in an overview table, per-agency drilldown, and methodology page.
The ranked overview highlights agencies with high burden and active change patterns.
The agency detail page explains the scores in plain language and shows the current burden metrics.
The history table exposes the underlying snapshots, change flags, and fingerprints behind each agency row.
The methodology page publishes the formulas, caveats, and weighted term lists used by the app.
The lower sections call out confidence limits, overlap caveats, and how to interpret the metrics responsibly.
More screenshots and captions: docs/SCREENSHOTS.md
At a high level:
- Fetch the eCFR agency directory and title metadata.
- Flatten the nested agency tree and normalize CFR references.
- Plan whole-title or part-level XML fetches for each snapshot date.
- Parse XML into scoped text blocks.
- Match blocks to agency references and aggregate metrics into SQLite.
- Serve ranked and per-agency views through FastAPI.
- Render the API responses in a React dashboard.
- Validate the data with smoke checks, audits, and repair scripts.
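The flatten-and-deduplicate steps above can be sketched as follows. This is an illustrative recreation, not the project's actual code: the function names, record shapes, and the `slug`/`cfr_references`/`children` keys are assumptions.

```python
# Sketch: flatten a nested agency tree and deduplicate CFR references.
# Record shapes and key names here are illustrative assumptions.

def flatten_agencies(agencies, parent_slug=None):
    """Yield (slug, parent_slug, cfr_references) for every node in the tree."""
    for agency in agencies:
        slug = agency["slug"]
        yield slug, parent_slug, agency.get("cfr_references", [])
        yield from flatten_agencies(agency.get("children", []), parent_slug=slug)

def dedupe_references(references):
    """Normalize references to (title, subtitle, chapter, subchapter, part)
    tuples and drop duplicates while preserving first-seen order."""
    seen, unique = set(), []
    for ref in references:
        key = tuple(ref.get(field) for field in
                    ("title", "subtitle", "chapter", "subchapter", "part"))
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique

tree = [{"slug": "treasury", "cfr_references": [{"title": 31, "chapter": "I"}],
         "children": [{"slug": "irs", "cfr_references": [
             {"title": 26, "chapter": "I"}, {"title": 26, "chapter": "I"}]}]}]
rows = list(flatten_agencies(tree))
```

Normalizing every reference to the same five-field tuple is what lets duplicates collapse cleanly before persistence, matching the SQLite-key reinforcement described below.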
Architecture notes and a rendered diagram live in docs/ARCHITECTURE.md.
- Review Priority (0–100): a triage rank combining current burden and change volatility.
- Restrictive Terms / 1k Words: weighted restrictive-language hits normalized by text size.
- Process Terms / 1k Words: weighted paperwork/process-language hits normalized by text size.
- Change Volatility: combines how often, how much, and how quickly the observed snapshots changed.
- Content Fingerprint: a stable SHA-256 checksum used for auditability and change detection.
These metrics are intentionally transparent. They are heuristics for prioritization, not legal conclusions or official burden estimates.
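The per-1k-words density metrics can be sketched like this. The term list and weights below are examples only; the app's actual weighted lists are published on the methodology page.

```python
# Illustrative recreation of the density metrics: weighted term hits per
# 1,000 words. The terms and weights are examples, not the app's lists.
import re

RESTRICTIVE_TERMS = {"shall": 1.0, "must": 1.0, "prohibited": 1.5, "required": 0.8}

def weighted_per_1k(text, weighted_terms):
    """Sum the weights of matching words, normalized per 1,000 words."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    if not words:
        return 0.0
    hits = sum(weighted_terms.get(word, 0.0) for word in words)
    return 1000.0 * hits / len(words)

sample = "The permittee shall file the required report. Late filing is prohibited."
density = weighted_per_1k(sample, RESTRICTIVE_TERMS)  # 3.3 weighted hits / 11 words
```

Normalizing by word count keeps the metric comparable across agencies whose regulatory text differs in size by orders of magnitude.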
- Agency mapping overlap: agency references come from the eCFR reader aid and can overlap. The pipeline normalizes reference tuples and stores them separately from current/history tables so the mapping layer stays inspectable.
- Duplicate CFR references: duplicate `(title, subtitle, chapter, subchapter, part)` mappings are removed before persistence and reinforced by SQLite keys.
- Unreliable dated XML fetches: historical fetches fail closed rather than silently publishing partial history windows.
- Ambiguous change interpretation: fingerprints, stored history points, and plain-language trust notes make it clear when the app is showing observed snapshots rather than amendment-by-amendment legal diffs.
- Current-row drift: audit and repair scripts can recompute current metrics from stored history and flag mismatches.
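The fingerprint and drift-audit ideas above can be sketched as follows. This is a minimal sketch under assumed names: the normalization rule, `detect_drift` helper, and history record shape are illustrative, not the project's API.

```python
# Sketch: content fingerprinting plus a drift check between the stored
# current row and the latest history snapshot. Names are illustrative.
import hashlib

def content_fingerprint(text):
    """Stable SHA-256 checksum of whitespace-normalized text, so that
    layout-only differences do not register as content changes."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def detect_drift(current_fingerprint, history_points):
    """Flag a mismatch between the current row and the newest history
    point (history_points assumed sorted oldest to newest)."""
    if not history_points:
        return False
    return current_fingerprint != history_points[-1]["fingerprint"]

a = content_fingerprint("part 101  general\nprovisions")
b = content_fingerprint("part 101 general provisions")
# Whitespace-only differences yield identical fingerprints: a == b
```

Because the checksum is deterministic, an audit script can recompute it from stored history at any time and flag current rows that no longer match.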
- Backend: Python 3.11, FastAPI, SQLite, HTTPX
- Frontend: React 18, TypeScript, Vite
- Testing: Pytest, Vitest
- Tooling: Ruff, Make, Docker Compose
```bash
cd backend
python3.11 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
cd ..
make run-backend
```

Open a second terminal:

```bash
cd frontend
npm install --cache ../.npm-cache
npm run dev
```

- Frontend: http://127.0.0.1:5173
- Backend OpenAPI docs: http://127.0.0.1:8000/docs
The repository includes a bundled validated SQLite seed at backend/data/ecfr_insights.sqlite3, so the initial local run does not require a live reseed.
From the repository root:
```bash
make test
make refresh-current-metrics
make audit-metrics
make audit-agencies
make snapshot
make smoke
```

Optional Docker verification:

```bash
docker compose up --build
make smoke-docker
```

When you want to rebuild the local dataset from live eCFR sources:

```bash
cd backend
source .venv/bin/activate
python -m app.ingestion --profile quick
python -m app.ingestion --profile standard
python -m app.ingestion --profile deep
```

If you only need to recompute current metrics from stored history without refetching XML:

```bash
make refresh-current-metrics
make audit-metrics
```

- This is optimized for a reproducible local workflow, not hosted multi-user deployment.
- The eCFR XML service can be unreliable for dated full-title snapshots, so the bundled local seed is the most predictable way to explore the project.
- Agency mappings come from the eCFR reader aid and can overlap.
- The metrics are prioritization heuristics, not legal or economic burden estimates.
- Observed history is based on bounded snapshots, not a full legal redline engine.
- Broader snapshot coverage with a more resilient fetch/cache strategy
- Better treatment of overlapping agency mappings and shared text ownership
- More expressive linguistic metrics beyond weighted term lists
- A hosted demo environment with automated data refreshes
AI tooling was used selectively for scaffolding, iteration speed, and documentation support. The metric design, reliability decisions, data-shape validation, audit logic, and final project framing were verified through direct code review, tests, and local execution.




