privacy-filter

Stateless FastAPI microservice that detects personally identifiable information (PII) in plain text and returns structured detection spans. The detection backend is the Hugging Face token-classification model openai/privacy-filter, accessed through a pluggable Detector interface so the model can be swapped without API changes.

V1 is intentionally narrow: plain text in, detection spans out. No redaction, no file ingestion, no async job queue. Callers act on the spans however they need to.

Quickstart

uv venv
uv sync --extra dev                 # API + tests, no torch
# Or, to run the real detector locally:
uv sync --extra dev --extra hf      # adds transformers + torch (multi-GB)

cp .env.example .env                # then edit API_KEYS
uv run uvicorn privacy_filter.api.app:create_app --factory --reload

API

`POST /v1/detect`

Header: X-API-Key: <one of API_KEYS> (required)

Request body:

{ "text": "Email me at alice@example.com tomorrow." }

Response 200:

{
  "detections": [
    {
      "entity_group": "private_email",
      "score": 0.99,
      "word": "alice@example.com",
      "start": 12,
      "end": 29
    }
  ],
  "model": "openai/privacy-filter",
  "model_revision": "<resolved HF commit SHA>"
}

entity_group is one of: account_number, private_address, private_email, private_person, private_phone, private_url, private_date, secret.

start and end are character offsets into the original input and follow Python slice semantics (start inclusive, end exclusive), so text[start:end] == word.
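Because the service returns spans rather than redacted text, callers apply them themselves. The following is a minimal client-side sketch of one way to do that, assuming spans do not overlap; the `redact` helper and the `[entity_group]` placeholder format are illustrative and not part of the service.

```python
from typing import TypedDict


class Detection(TypedDict):
    entity_group: str
    score: float
    word: str
    start: int
    end: int


def redact(text: str, detections: list[Detection]) -> str:
    """Replace each detected span with an [entity_group] placeholder."""
    # Work right to left so earlier offsets stay valid as the string changes length.
    for det in sorted(detections, key=lambda d: d["start"], reverse=True):
        start, end = det["start"], det["end"]
        # Check the documented slice contract before trusting the offsets.
        if text[start:end] != det["word"]:
            raise ValueError(f"span {start}:{end} does not match word")
        text = text[:start] + f"[{det['entity_group']}]" + text[end:]
    return text


text = "Email me at alice@example.com tomorrow."
detections: list[Detection] = [
    {
        "entity_group": "private_email",
        "score": 0.99,
        "word": "alice@example.com",
        "start": 12,
        "end": 29,
    }
]
print(redact(text, detections))  # Email me at [private_email] tomorrow.
```

A caller that needs to handle overlapping spans would have to merge them first; this sketch does not attempt that.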

`GET /healthz`

Liveness check. No auth. Returns 200 {"status": "ok"} whenever the process is responsive.

`GET /readyz`

Readiness check. No auth. Returns 200 {"status": "ready"} only after the detector has loaded and warmed up. Returns 503 otherwise.
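A deployment script or test harness may want to block until the detector is loaded. The sketch below polls a readiness probe until it returns 200. The helper names, the retry defaults, and the `http://localhost:8000` address in the usage note are assumptions for illustration.

```python
import time
import urllib.error
import urllib.request
from collections.abc import Callable


def http_status(url: str) -> int:
    """Return the HTTP status code for a GET, including non-2xx codes."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # urllib raises on 4xx/5xx responses; the status code is still what we want.
        return exc.code
    except OSError:
        # Connection refused, timeout, DNS failure: treat as "not up yet".
        return 0


def wait_until_ready(probe: Callable[[], int], attempts: int = 30, delay: float = 1.0) -> bool:
    """Call probe() until it returns 200 or the attempts run out."""
    for _ in range(attempts):
        if probe() == 200:
            return True
        time.sleep(delay)
    return False
```

Against a locally running container this could be called as `wait_until_ready(lambda: http_status("http://localhost:8000/readyz"))`. Taking the probe as a callable keeps the loop testable without a network.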

Errors

All non-2xx responses use a uniform envelope:

{ "error": { "code": "string", "message": "string", "request_id": "uuid" } }

Codes: unauthorized, payload_too_large, invalid_request, not_ready, internal_error. Error responses never include exception messages or stack traces.
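Since every non-2xx response shares one shape, a client can decode errors in a single place. Here is a small sketch of that; the `ApiError` class, the sample message, and the decision to treat `not_ready` as retryable are assumptions, not behaviour documented by the service.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ApiError:
    code: str
    message: str
    request_id: str

    @property
    def retryable(self) -> bool:
        # Assumption: not_ready is transient (the detector is still loading);
        # every other code is treated as a permanent failure for this request.
        return self.code == "not_ready"


def parse_error(body: str) -> ApiError:
    """Decode the uniform error envelope from a response body."""
    err = json.loads(body)["error"]
    return ApiError(code=err["code"], message=err["message"], request_id=err["request_id"])


# Illustrative payload; the message text and request_id are made up.
sample = (
    '{"error": {"code": "not_ready", "message": "detector is loading", '
    '"request_id": "00000000-0000-0000-0000-000000000000"}}'
)
error = parse_error(sample)
print(error.code, error.retryable)
```

Logging `request_id` alongside the code gives operators a handle for correlating a client failure with server logs.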

Configuration

All settings come from environment variables. See .env.example.

| Variable | Required | Default |
| --- | --- | --- |
| `API_KEYS` | yes | (none) |
| `MAX_INPUT_CHARS` | no | `50000` |
| `MAX_BODY_BYTES` | no | `262144` |
| `MODEL_ID` | no | `openai/privacy-filter` |
| `MODEL_REVISION` | no | (resolved at load time) |
| `LOG_LEVEL` | no | `INFO` |
| `CORS_ORIGINS` | no | (empty; CORS disabled) |

API_KEYS is comma-separated. CORS stays disabled unless CORS_ORIGINS is set.
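To make the defaults concrete, here is a rough sketch of loading these variables from an environment mapping. It is not the service's actual settings code: the `load_settings` and `parse_csv` names, the whitespace trimming, and the comma-separated format for CORS_ORIGINS are all assumptions.

```python
from collections.abc import Mapping


def parse_csv(value: str) -> list[str]:
    """Split a comma-separated value, dropping blanks (trimming is an assumption)."""
    return [item.strip() for item in value.split(",") if item.strip()]


def load_settings(env: Mapping[str, str]) -> dict[str, object]:
    api_keys = parse_csv(env.get("API_KEYS", ""))
    if not api_keys:
        # API_KEYS is the only required variable in the table.
        raise ValueError("API_KEYS is required")
    return {
        "api_keys": api_keys,
        "max_input_chars": int(env.get("MAX_INPUT_CHARS", "50000")),
        "max_body_bytes": int(env.get("MAX_BODY_BYTES", "262144")),
        "model_id": env.get("MODEL_ID", "openai/privacy-filter"),
        # None stands for "resolve the revision at load time".
        "model_revision": env.get("MODEL_REVISION"),
        "log_level": env.get("LOG_LEVEL", "INFO"),
        # Assumption: comma-separated like API_KEYS; an empty list means CORS is off.
        "cors_origins": parse_csv(env.get("CORS_ORIGINS", "")),
    }


settings = load_settings({"API_KEYS": "key-one, key-two"})
print(settings["api_keys"], settings["max_body_bytes"])
```

Passing the environment as a mapping, rather than reading `os.environ` directly, keeps the function easy to exercise in tests.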

Development

uv run pytest                                  # fast tests (FakeDetector)
uv run pytest -m slow                          # exercises the real HF model (requires --extra hf)
uv run pytest tests/unit/test_bioes.py -v      # one test file
uv run pytest tests/unit/test_bioes.py::test_BIE_run_yields_one_span_with_min_score -v   # one test
uv run ruff check
uv run ruff format
uv run mypy
uv run pip-audit
uv run pre-commit run --all-files

The fast suite uses a FakeDetector injected via create_app(detector_factory=...), so it runs in milliseconds and does not require transformers or torch. Slow tests live under tests/slow/ and are gated by the slow pytest marker.

Docker

Build and run locally:

docker build -t privacy-filter .
docker run --rm -p 8000:8000 -e API_KEYS=changeme privacy-filter

The image runs as a non-root user (uid 1001), includes a HEALTHCHECK against /healthz, and bakes the HF model weights in at build time, so cold starts and air-gapped deployments do not depend on outbound network access. Build args MODEL_ID and MODEL_REVISION (defaulting to openai/privacy-filter at the resolved HEAD) control which weights ship in the image. The runtime sets TRANSFORMERS_OFFLINE=1 and HF_HUB_OFFLINE=1 so that a misconfigured runtime cannot reach the Hub. Torch is pinned to the CPU-only PyTorch wheel (pytorch-cpu index), so the image does not carry CUDA libraries.

Published images

Tagged releases are built and pushed to GHCR by .github/workflows/release.yml. Push a SemVer tag (vMAJOR.MINOR.PATCH) and the workflow publishes the following image tags, shown here for v1.2.3:

ghcr.io/open-source-legal/privacy-filter:1.2.3
ghcr.io/open-source-legal/privacy-filter:1.2
ghcr.io/open-source-legal/privacy-filter:1
ghcr.io/open-source-legal/privacy-filter:latest (only for non-prerelease tags)
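The mapping from a git tag to image tags can be sketched as a small function. This mirrors the list above and is not taken from release.yml; in particular, publishing only the full version string for prerelease tags is an assumption, since the source states only that `latest` is skipped for them.

```python
import re

# vMAJOR.MINOR.PATCH with an optional -prerelease suffix.
SEMVER_TAG = re.compile(r"^v(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z.-]+))?$")


def image_tags(git_tag: str) -> list[str]:
    """Return the image tags a given git tag would publish under this sketch."""
    match = SEMVER_TAG.match(git_tag)
    if match is None:
        raise ValueError(f"not a vMAJOR.MINOR.PATCH tag: {git_tag!r}")
    major, minor, patch, prerelease = match.groups()
    if prerelease:
        # Assumption: prereleases get the exact version only, with no floating
        # tags. The source confirms only that "latest" is not published.
        return [f"{major}.{minor}.{patch}-{prerelease}"]
    return [f"{major}.{minor}.{patch}", f"{major}.{minor}", major, "latest"]


print(image_tags("v1.2.3"))  # ['1.2.3', '1.2', '1', 'latest']
```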

Cutting a release:

git tag v0.1.0
git push origin v0.1.0

Pulling:

docker pull ghcr.io/open-source-legal/privacy-filter:latest

Architecture

The detector boundary is the central abstraction. All inference is mediated by privacy_filter.detection.protocol.Detector, a Protocol with model_id, model_revision, and detect(text) -> list[Detection]. Routes never construct or import a concrete detector; they receive it through api/deps.py::get_detector, which reads it from app.state after the lifespan populates it. Integration tests inject FakeDetector; production uses HuggingFaceDetector, which imports torch lazily.
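The shape of that boundary can be sketched as follows. The Protocol members come from the description above and the `Detection` fields from the response schema; the `RegexEmailDetector` class is a hypothetical stand-in written for this example and is not the repository's FakeDetector.

```python
import re
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Detection:
    entity_group: str
    score: float
    word: str
    start: int
    end: int


class Detector(Protocol):
    model_id: str
    model_revision: str

    def detect(self, text: str) -> list[Detection]: ...


class RegexEmailDetector:
    """Hypothetical detector: satisfies the Protocol structurally, no model needed."""

    model_id = "example/regex-email"
    model_revision = "0"

    _EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

    def detect(self, text: str) -> list[Detection]:
        return [
            Detection("private_email", 1.0, m.group(), m.start(), m.end())
            for m in self._EMAIL.finditer(text)
        ]


def run_detection(detector: Detector, text: str) -> list[Detection]:
    # Callers depend only on the Protocol, never on a concrete class.
    return detector.detect(text)


spans = run_detection(RegexEmailDetector(), "Email me at alice@example.com tomorrow.")
print(spans[0].start, spans[0].end)  # 12 29
```

Because `Protocol` uses structural typing, a replacement backend does not need to inherit from anything; it only has to expose the same attributes and method.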

Logging uses an allowlist processor: only operational metadata fields (request_id, endpoint, method, status, latency_ms, input_chars, detection_count, plus error metadata) are emitted; everything else is dropped before the JSON renderer sees it. The integration test tests/integration/test_logging_no_pii.py is the mechanical guard.
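An allowlist processor of this kind can be sketched in a few lines. The three-argument signature below follows the convention used by structlog processors, but the use of structlog here is an assumption; the error metadata fields are omitted because the source does not name them.

```python
from collections.abc import MutableMapping
from typing import Any

# Operational fields listed above; error metadata fields are not named in the
# source, so they are left out of this sketch.
ALLOWED_FIELDS = frozenset(
    {
        "request_id",
        "endpoint",
        "method",
        "status",
        "latency_ms",
        "input_chars",
        "detection_count",
    }
)


def allowlist_processor(
    logger: Any, method_name: str, event_dict: MutableMapping[str, Any]
) -> dict[str, Any]:
    """Keep only allowlisted keys so request text can never reach the renderer."""
    return {key: value for key, value in event_dict.items() if key in ALLOWED_FIELDS}


event = {
    "request_id": "abc",
    "endpoint": "/v1/detect",
    "status": 200,
    "input_chars": 39,
    "text": "Email me at alice@example.com tomorrow.",  # must be dropped
}
print(allowlist_processor(None, "info", event))
```

An allowlist fails closed: a newly added log field is dropped by default until someone decides it is safe, which a denylist cannot guarantee.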

The full design contract is in docs/superpowers/specs/2026-05-05-privacy-filter-microservice-design.md.
