GabrielRamirezDev/eCFR-Metrics-App

Repository files navigation

eCFR Metrics App

Local-first analytics for turning public eCFR data into agency-level burden, change, and auditability signals.

This project ingests agency mappings and dated eCFR XML snapshots, normalizes them into SQLite, computes transparent text-based metrics, and exposes the results through a FastAPI API and a React dashboard. The goal is not to replace legal analysis; it is to make large regulatory text corpora faster to explore by highlighting where restrictive language, process burden, and recent change activity are concentrated.

Overview screenshot

Highlights

  • Public eCFR ingestion, normalization, and agency-to-CFR mapping
  • Agency-level prioritization metrics with transparent formulas
  • Reliability hardening, audit tooling, and reproducible local data

What The App Does

The app builds a local analytical view of eCFR content by agency. It flattens the agency directory, deduplicates CFR references, fetches current and dated XML snapshots, extracts scoped text blocks, and computes agency-level metrics such as word count, restrictive-term density, process-term density, change volatility, and content fingerprints. A small web UI then surfaces those metrics in an overview table, per-agency drilldown, and methodology page.

Screenshots

Overview

The ranked overview highlights agencies with high burden and active change patterns.

Overview

Agency Detail: Current Metrics

The agency detail page explains the scores in plain language and shows the current burden metrics.

Agency detail top

Agency Detail: Observed History

The history table exposes the underlying snapshots, change flags, and fingerprints behind each agency row.

Agency detail bottom

Methodology: Metrics And Formulas

The methodology page publishes the formulas, caveats, and weighted term lists used by the app.

Methodology top

Methodology: Caveats And Trust Notes

The lower sections call out confidence limits, overlap caveats, and how to interpret the metrics responsibly.

Methodology bottom

More screenshots and captions: docs/SCREENSHOTS.md

Architecture And Data Flow

At a high level:

  1. Fetch the eCFR agency directory and title metadata.
  2. Flatten the nested agency tree and normalize CFR references.
  3. Plan whole-title or part-level XML fetches for each snapshot date.
  4. Parse XML into scoped text blocks.
  5. Match blocks to agency references and aggregate metrics into SQLite.
  6. Serve ranked and per-agency views through FastAPI.
  7. Render the API results in a React dashboard.
  8. Validate the data with smoke checks, audits, and repair scripts.
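Step 2 above can be sketched in miniature. The field names ("name", "children", "cfr_references") mirror the public eCFR agency directory JSON, but this is an illustrative sketch, not the app's actual code:

```python
# Illustrative sketch: flatten the nested agency tree (step 2).
# Yields one row per agency node, including nested child agencies.

def flatten_agencies(agencies, parent=None):
    """Yield (agency_name, parent_name, cfr_references) for every node."""
    for agency in agencies:
        yield agency["name"], parent, agency.get("cfr_references", [])
        yield from flatten_agencies(agency.get("children", []), agency["name"])

tree = [
    {
        "name": "Department of Example",
        "cfr_references": [{"title": 1, "chapter": "I"}],
        "children": [{"name": "Example Bureau", "cfr_references": []}],
    }
]

rows = list(flatten_agencies(tree))
print(rows)
```

The flattened rows are what the later stages deduplicate and match against parsed text blocks.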

Architecture notes and a rendered diagram live in docs/ARCHITECTURE.md.

Key Metrics

  • Review Priority (0–100): a triage rank combining current burden and change volatility.
  • Restrictive Terms / 1k Words: weighted restrictive-language hits normalized by text size.
  • Process Terms / 1k Words: weighted paperwork/process-language hits normalized by text size.
  • Change Volatility: combines how often, how much, and how quickly the observed snapshots changed.
  • Content Fingerprint: a stable SHA-256 checksum used for auditability and change detection.

These metrics are intentionally transparent. They are heuristics for prioritization, not legal conclusions or official burden estimates.
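To make the transparency concrete, here is a minimal sketch of two of the metrics above: a weighted terms-per-1k-words density and a SHA-256 content fingerprint. The term list and weights are illustrative placeholders, not the app's actual weighted lists:

```python
import hashlib
import re

# Illustrative weights only; the real weighted term lists live in the
# methodology page, not here.
RESTRICTIVE_TERMS = {"shall": 1.0, "must": 1.0, "prohibited": 1.5}

def restrictive_per_1k(text: str) -> float:
    """Weighted restrictive-term hits normalized per 1,000 words."""
    words = re.findall(r"[a-z']+", text.lower())
    hits = sum(RESTRICTIVE_TERMS.get(w, 0.0) for w in words)
    return 1000.0 * hits / max(len(words), 1)

def fingerprint(text: str) -> str:
    """Stable SHA-256 checksum used for auditability and change detection."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

sample = "The permittee shall file form X. Open burning is prohibited."
print(restrictive_per_1k(sample))   # 250.0 (2.5 weighted hits over 10 words)
print(fingerprint(sample)[:12])
```

Because the fingerprint is a pure function of the text, any change in an agency's scoped blocks changes the checksum, which is what drives change detection.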

Data Quality And Reliability Challenges Solved

  • Agency mapping overlap: agency references come from the eCFR reader aid and can overlap. The pipeline normalizes reference tuples and stores them separately from current/history tables so the mapping layer stays inspectable.
  • Duplicate CFR references: duplicate (title, subtitle, chapter, subchapter, part) mappings are removed before persistence and reinforced by SQLite keys.
  • Unreliable dated XML fetches: historical fetches fail closed rather than silently publishing partial history windows.
  • Ambiguous change interpretation: fingerprints, stored history points, and plain-language trust notes make it clear when the app is showing observed snapshots rather than amendment-by-amendment legal diffs.
  • Current-row drift: audit and repair scripts can recompute current metrics from stored history and flag mismatches.
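The duplicate-reference handling can be sketched as a two-layer defense: Python-side deduplication plus a SQLite uniqueness key. One detail worth noting in any such scheme: SQLite treats NULLs as distinct in UNIQUE constraints, so missing components should be normalized to empty strings first. The schema below is a hypothetical illustration, not the app's actual tables:

```python
import sqlite3

# (title, subtitle, chapter, subchapter, part) tuples; missing components
# normalized to "" rather than None, because SQLite UNIQUE indexes treat
# NULLs as distinct and would let duplicates through.
refs = [
    (1, "", "I", "", 51),
    (1, "", "I", "", 51),   # duplicate from overlapping mappings
    (1, "", "I", "", 52),
]

# Layer 1: Python-side dedup, preserving first-seen order.
deduped = list(dict.fromkeys(refs))

# Layer 2: a SQLite uniqueness key so duplicates cannot persist.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE cfr_refs (
           title INTEGER, subtitle TEXT, chapter TEXT, subchapter TEXT,
           part INTEGER,
           UNIQUE (title, subtitle, chapter, subchapter, part)
       )"""
)
conn.executemany("INSERT OR IGNORE INTO cfr_refs VALUES (?, ?, ?, ?, ?)", refs)
count = conn.execute("SELECT COUNT(*) FROM cfr_refs").fetchone()[0]
print(len(deduped), count)  # 2 2
```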

Tech Stack

  • Backend: Python 3.11, FastAPI, SQLite, HTTPX
  • Frontend: React 18, TypeScript, Vite
  • Testing: Pytest, Vitest
  • Tooling: Ruff, Make, Docker Compose

Local Setup

Backend

cd backend
python3.11 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
cd ..
make run-backend

Frontend

Open a second terminal:

cd frontend
npm install --cache ../.npm-cache
npm run dev

Open The App

  • Frontend: http://127.0.0.1:5173
  • Backend OpenAPI docs: http://127.0.0.1:8000/docs

The repository includes a bundled validated SQLite seed at backend/data/ecfr_insights.sqlite3, so the initial local run does not require a live reseed.
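If you want to poke at the bundled seed directly before starting the app, a quick sketch (falling back to an empty in-memory database when run outside the repository root):

```python
import pathlib
import sqlite3

# Path of the bundled seed per the README; adjust if running elsewhere.
seed = pathlib.Path("backend/data/ecfr_insights.sqlite3")
conn = sqlite3.connect(seed if seed.exists() else ":memory:")

tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
)]
print(tables)
```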

Test And Verification Commands

From the repository root:

make test
make refresh-current-metrics
make audit-metrics
make audit-agencies
make snapshot
make smoke

Optional Docker verification:

docker compose up --build
make smoke-docker

Optional Live Ingest Flow

When you want to rebuild the local dataset from live eCFR sources:

cd backend
source .venv/bin/activate
python -m app.ingestion --profile quick
python -m app.ingestion --profile standard
python -m app.ingestion --profile deep

If you only need to recompute current metrics from stored history without refetching XML:

make refresh-current-metrics
make audit-metrics
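The audit idea behind these commands can be sketched as a join between the current table and the latest stored history point, flagging any row whose stored value no longer matches a recomputation. The table and column names below are hypothetical, not the app's actual schema:

```python
import sqlite3

# Toy schema: per-snapshot history plus a derived "current" row that has
# drifted out of sync with the latest snapshot.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE history (agency TEXT, snapshot_date TEXT, word_count INTEGER);
    CREATE TABLE current (agency TEXT, word_count INTEGER);
    INSERT INTO history VALUES
        ('Example Bureau', '2024-01-01', 900),
        ('Example Bureau', '2024-07-01', 1000);
    INSERT INTO current VALUES ('Example Bureau', 950);  -- stale row
""")

# Flag current rows that disagree with the most recent history point.
drift = conn.execute("""
    SELECT c.agency, c.word_count AS stored, h.word_count AS recomputed
    FROM current c
    JOIN history h ON h.agency = c.agency
    WHERE h.snapshot_date = (SELECT MAX(snapshot_date) FROM history
                             WHERE agency = c.agency)
      AND h.word_count != c.word_count
""").fetchall()
print(drift)  # [('Example Bureau', 950, 1000)]
```

A repair pass would then overwrite the stale current row with the recomputed value rather than refetching any XML.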

Limitations

  • This is optimized for a reproducible local workflow, not hosted multi-user deployment.
  • The eCFR XML service can be unreliable for dated full-title snapshots, so the bundled local seed is the most predictable way to explore the project.
  • Agency mappings come from the eCFR reader aid and can overlap.
  • The metrics are prioritization heuristics, not legal or economic burden estimates.
  • Observed history is based on bounded snapshots, not a full legal redline engine.

Future Improvements

  • Broader snapshot coverage with a more resilient fetch/cache strategy
  • Better treatment of overlapping agency mappings and shared text ownership
  • More expressive linguistic metrics beyond weighted term lists
  • A hosted demo environment with automated data refreshes

AI-Assisted Workflow Note

AI tooling was used selectively for scaffolding, iteration speed, and documentation support. The metric design, reliability decisions, data-shape validation, audit logic, and final project framing were verified through direct code review, tests, and local execution.

Additional Docs

  • Screenshots and captions: docs/SCREENSHOTS.md
  • Architecture notes and rendered diagram: docs/ARCHITECTURE.md
