🛡️ Fraud Detection

A runnable, container-native reference implementation of a 7-stage layered fraud-detection pipeline, modelled on the cascades commonly run by card networks, neobanks, and payment processors.
Built to be read top-to-bottom as a learning project: every stage lives in a short Python module.

flowchart TD
    A(["① Transaction arrives"]) --> B{"② Rules engine"}
    B -- "hard block" --> Z(["⛔ BLOCK"])
    B -- "soft signals" --> C["③ Fast GBT score"]
    C --> D{"score in #91;0.2, 0.8#93;?"}
    D -- "no" --> F["⑥ Aggregate risk"]
    D -- "yes" --> E1["④a Encoder<br/>→ embedding"]
    E1 --> E2["④b Enriched GBT"]
    E2 --> G["⑤ Graph analysis<br/><i>optional</i>"]
    G --> F
    F --> H{"⑦ Decision"}
    H -- "risk < 0.35" --> Ap(["✅ APPROVE"])
    H -- "0.35 ≤ risk < 0.75" --> Ch(["⚠️ CHALLENGE"])
    H -- "risk ≥ 0.75" --> Bl(["⛔ BLOCK"])

    classDef stop fill:#fee,stroke:#c33,stroke-width:2px,color:#900
    classDef ok fill:#efe,stroke:#3a3,stroke-width:2px,color:#262
    classDef warn fill:#fef6e0,stroke:#c90,stroke-width:2px,color:#640
    class Z,Bl stop
    class Ap ok
    class Ch warn

✨ Features

Every stage lives in a short Python module under src/fraud_detection/ (stages 3 and 4b share models/boosting.py; stages 6 and 7 share aggregator.py). The cascade short-circuits as soon as a decision is certain, so the cheap stages catch the easy cases and only the borderline ones pay for the heavy path.

🛡️ Stage 2 — Rules Engine

Deterministic guardrails that run before any model — auditable, instant, and free.

Rule	Severity	What it catches
MERCH_BLOCK	hard block	Merchant on blocklist
COUNTRY_BLOCK	hard block	Sanctioned country
COUNTRY_RISK	0.6	High-risk country (NG / RO / UA by default)
HIGH_AMOUNT	0.4 → 1.0	Amount above configurable threshold
VELOCITY	0.8	≥ 5 transactions in 5 minutes
GEO_JUMP	0.7	Country change in < 1 hour
NEW_CUSTOMER_HIGH_VALUE	0.5	First-ever online txn with a large amount

Individual rule probabilities are combined via noisy-OR, 1 − Π(1 − sᵢ), so independent moderate signals compound: the combined score is always at least as high as the strongest single rule that fired.
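A minimal sketch of the noisy-OR combination in plain Python, assuming each fired rule contributes its severity from the table above as an independent probability. `noisy_or` is an illustrative name, not necessarily what rules.py exposes.

```python
from math import prod


def noisy_or(severities: list[float]) -> float:
    """Combine independent rule probabilities as 1 - prod(1 - s_i)."""
    if any(not 0.0 <= s <= 1.0 for s in severities):
        raise ValueError("every severity must lie in [0, 1]")
    # prod(1 - s_i) is the chance that *none* of the fired rules is right;
    # an empty list gives prod(...) == 1, i.e. a combined score of 0.
    return 1.0 - prod(1.0 - s for s in severities)


# COUNTRY_RISK (0.6) and GEO_JUMP (0.7) firing on the same transaction:
print(round(noisy_or([0.6, 0.7]), 2))  # 0.88, stronger than either signal alone
```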

⚡ Stage 3 — Fast Gradient-Boosted Trees

Shallow, sub-millisecond model that scores every transaction.

HistGradientBoostingClassifier, depth 4, 120 iterations
~25 engineered features: amount, time-of-day, velocity windows, behavioural deltas, label-encoded categoricals
Class imbalance handled via class_weight={0: 1, 1: neg/pos}
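A sketch of how the `{0: 1, 1: neg/pos}` weighting can be derived from the training labels. With that ratio both classes carry the same total weight, so the rare fraud class is not drowned out. The helper name is hypothetical.

```python
from collections import Counter


def fraud_class_weight(labels: list[int]) -> dict[int, float]:
    """Return {0: 1.0, 1: neg/pos} for binary labels (1 = fraud)."""
    counts = Counter(labels)
    neg, pos = counts[0], counts[1]
    if pos == 0:
        raise ValueError("need at least one fraud label to weight")
    return {0: 1.0, 1: neg / pos}


labels = [0] * 95 + [1] * 5  # toy 5% fraud rate
weights = fraud_class_weight(labels)
print(weights)  # {0: 1.0, 1: 19.0}
# Total weight per class: 95 * 1.0 for legit vs 5 * 19.0 for fraud, i.e. 95 each.
```

A dict like this can be passed as `class_weight=` to `HistGradientBoostingClassifier`, which accepts it in scikit-learn 1.2 and later.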

💡 Why HistGradientBoostingClassifier and not XGBoost? Same algorithm family (histogram-based GBT), but scikit-learn's wheels bundle their own OpenMP runtime, so there is no separate libomp install and uv sync is genuinely the only install step on every OS. The TrainedModel interface stays the same, so XGBoost / LightGBM are drop-in swaps.

🧠 Stage 4 — Encoder + Enriched GBT

The heavy path — only fires when stage 3 lands in the uncertain band (default [0.20, 0.80]).

Sub-stage	What it does
4a Autoencoder	Small PyTorch MLP (`F → 32 → 8 → 32 → F`) trained unsupervised (no fraud labels) on all training transactions. The 8-d bottleneck activation is the embedding.
4b Enriched GBT	Deeper booster (depth 6, 300 iterations) trained on `[engineered features ⨁ embedding]`; the extra depth and the learned embedding let it capture feature interactions the depth-4 fast model misses.

🕸️ Stage 5 — Graph Analysis

Heterogeneous entity graph (card ↔ device ↔ merchant ↔ ip) catches relational patterns that row-wise models miss.

Signal	What it indicates
Cluster size	Shared infrastructure across many entities
Fraud-neighbour count	Past confirmed fraud in the same connected component
Cards per device / per IP	Classic mule signatures
Devices per card	Indicates a compromised card cycling through hardware

In production you'd swap networkx for a real graph store (Neo4j, AlloyDB Omni, TigerGraph) or precomputed GNN embeddings — the EntityGraph.score() interface is the contract.
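To make the signals concrete, here is a minimal, self-contained sketch using plain dictionaries instead of networkx. `ToyEntityGraph` and its methods are hypothetical, not the project's `EntityGraph` API. It deliberately links only card, device and IP nodes: connecting merchants as well would tie most cards into one giant component through popular shops.

```python
from collections import deque
from dataclasses import dataclass

Node = tuple[str, str]  # (entity type, entity id), e.g. ("device", "d1")


@dataclass
class GraphSignal:
    cluster_size: int      # entity nodes in the candidate's connected component(s)
    fraud_neighbors: int   # confirmed-fraud txns recorded on cards in that component
    cards_per_device: int  # distinct cards already seen on the candidate's device
    devices_per_card: int  # distinct devices already seen with the candidate's card


class ToyEntityGraph:
    """Hypothetical stand-in for EntityGraph using only card, device and IP nodes."""

    def __init__(self) -> None:
        self.adj: dict[Node, set[Node]] = {}
        self.fraud_count: dict[Node, int] = {}

    def add(self, card: str, device: str, ip: str, is_fraud: bool = False) -> None:
        nodes = [("card", card), ("device", device), ("ip", ip)]
        for a in nodes:  # link every pair of entities seen in the same txn
            self.adj.setdefault(a, set()).update(b for b in nodes if b != a)
        if is_fraud:
            key = ("card", card)
            self.fraud_count[key] = self.fraud_count.get(key, 0) + 1

    def _component(self, start: Node) -> set[Node]:
        seen, queue = {start}, deque([start])
        while queue:  # breadth-first search over the undirected graph
            for nxt in self.adj[queue.popleft()]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def score(self, card: str, device: str, ip: str) -> GraphSignal:
        # Read-only: the candidate is looked up but never added to the graph.
        comp: set[Node] = set()
        for node in (("card", card), ("device", device), ("ip", ip)):
            if node in self.adj and node not in comp:
                comp |= self._component(node)
        device_nbrs = self.adj.get(("device", device), set())
        card_nbrs = self.adj.get(("card", card), set())
        return GraphSignal(
            cluster_size=len(comp),
            fraud_neighbors=sum(self.fraud_count.get(n, 0) for n in comp),
            cards_per_device=sum(1 for n in device_nbrs if n[0] == "card"),
            devices_per_card=sum(1 for n in card_nbrs if n[0] == "device"),
        )


g = ToyEntityGraph()
g.add("card1", "dev1", "ip1", is_fraud=True)
g.add("card2", "dev1", "ip2", is_fraud=True)  # second card on the same device
g.add("card3", "dev1", "ip2")
g.add("card9", "dev9", "ip9")                 # unrelated customer
print(g.score("card4", "dev1", "ip3"))
# GraphSignal(cluster_size=6, fraud_neighbors=2, cards_per_device=3, devices_per_card=0)
```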

⚖️ Stage 6 + 7 — Risk Aggregation & Decision

Confidence-weighted blend over whichever stages ran, mapped to one of three actions.

$$\text{risk} = \frac{\sum_s w_s \cdot c_s \cdot \text{score}_s}{\sum_s w_s \cdot c_s}$$

where $w_s$ is the per-stage weight, $c_s$ the self-reported confidence, and the sum runs over stages that actually ran.

Risk score	Decision	Customer experience
`< 0.35`	✅ `APPROVE`	Transaction goes through
`0.35 ≤ risk < 0.75`	⚠️ `CHALLENGE`	Step-up auth: OTP / 3DS / review
`≥ 0.75`	⛔ `BLOCK`	Declined, customer notified
Hard-block rule	⛔ `BLOCK`	Overrides everything, always declined

Confidence for model stages is 1 − 2·min(p, 1−p): a model that outputs 0.5 gets confidence 0 and contributes nothing, while one that outputs 0.95 gets confidence 0.9 and carries nearly its full stage weight.
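A minimal sketch of stages 6 and 7 under stated assumptions: the per-stage weights are hypothetical (the project's defaults aren't given here, so the numbers won't reproduce the demo output), the zero-confidence fallback is an assumption, and decisions are plain strings rather than the project's Decision enum.

```python
from dataclasses import dataclass


@dataclass
class StageOutput:
    name: str
    score: float       # stage's fraud probability in [0, 1]
    confidence: float  # self-reported confidence in [0, 1]


def model_confidence(p: float) -> float:
    """1 - 2*min(p, 1 - p): 0 at p = 0.5, 1 at p = 0 or p = 1."""
    return 1.0 - 2.0 * min(p, 1.0 - p)


def aggregate(stages: list[StageOutput], weights: dict[str, float]) -> float:
    """Confidence-weighted mean of the scores of the stages that ran."""
    num = sum(weights[s.name] * s.confidence * s.score for s in stages)
    den = sum(weights[s.name] * s.confidence for s in stages)
    # Assumption: if no stage is confident, fall back to a neutral 0.5.
    return num / den if den > 0 else 0.5


def decide(risk: float, hard_block: bool) -> str:
    if hard_block or risk >= 0.75:  # a hard-block rule overrides everything
        return "BLOCK"
    if risk >= 0.35:
        return "CHALLENGE"
    return "APPROVE"


weights = {"rules": 1.0, "fast_model": 2.0, "graph": 1.0}  # hypothetical weights
stages = [
    StageOutput("rules", 0.6, 0.5),
    StageOutput("fast_model", 0.9, model_confidence(0.9)),  # confidence 0.8
    StageOutput("graph", 0.5, 1.0),
]
risk = aggregate(stages, weights)
print(f"{risk:.3f} {decide(risk, hard_block=False)}")  # 0.723 CHALLENGE
```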

🎯 Why a 7-stage pipeline?

Property	What this design gives you
⚡ Speed-vs-accuracy cascade	Rules + fast GBT score every txn in < 1 ms. Only the ambiguous fraction pays for the encoder + larger model, which keeps average latency and compute cost low.
🛡️ Defence in depth	Rules catch known patterns. Models catch learned patterns. Graphs catch relational patterns (mule rings, shared devices) neither rules nor models see.
🔍 Explainability	A hard-block by rule is auditable end-to-end. A model challenge is a probability plus the top contributing features.
🔌 Modular	Each stage sits in a small module behind a small public class. Swap any of them without touching the others.

🛠️ Tech Stack

Category	Technology
Language	Python 3.13
Dependency manager	uv + hatchling (build backend)
Numerics	NumPy
Gradient boosting	scikit-learn HistGradientBoostingClassifier
Autoencoder	PyTorch (CPU-only wheels)
Entity graph	NetworkX
HTTP service	FastAPI + Uvicorn
Configuration	pydantic-settings
Container base image	`ghcr.io/astral-sh/uv:python3.13-trixie-slim` (Debian 13)
Model persistence	joblib
Testing & lint	pytest + ruff

🚀 Quick Start

Prerequisites

uv — the only thing you need. uv fetches the exact Python version pinned in .python-version (3.13) on first run, fully isolated from any system Python.

# Install uv (one of)
pipx install uv
# or:  brew install uv
# or:  curl -LsSf https://astral.sh/uv/install.sh | sh

Run it

# Clone + enter
git clone https://github.com/sumitsahoo/fraud-detection.git
cd fraud-detection

# Fetch Python 3.13, create .venv, install the package + all deps
uv sync

# 1. Generate ~30k synthetic transactions (~4-5% fraud)
uv run fraud-generate --n 30000

# 2. Train categorical encoders + fast GBT + autoencoder + enriched GBT
uv run fraud-train

# 3. Run four illustrative scenarios end-to-end
uv run fraud-demo

# 4. (optional) Launch the HTTP service
uv run fraud-serve

Expected output (decisions should match; exact risk scores vary with the random seed):

SCENARIO: normal coffee shop purchase
  -> DECISION: APPROVE   risk=0.250   hard_block=False
  stages that ran:
    - rules          score=0.000  conf=0.30
    - fast_model     score=0.000  conf=1.00
    - graph          score=0.700  conf=1.00

SCENARIO: card-testing burst (7th small txn in <5 min)
  -> DECISION: BLOCK     risk=0.857   hard_block=False
  reasons:
    • fast model probability 1.000
    • VELOCITY: 6 txns in last 300s

SCENARIO: transaction in sanctioned country
  -> DECISION: BLOCK     risk=1.000   hard_block=True
  reasons:
    • sanctioned country XX

SCENARIO: suspected account takeover
  -> DECISION: BLOCK     risk=0.800   hard_block=False
  reasons:
    • fast model probability 1.000
    • COUNTRY_RISK: high-risk country NG

Available commands

Command	Description
`uv sync`	Fetch Python 3.13, create `.venv`, install all deps + the package
`uv run fraud-generate --n 30000`	Create ~30k synthetic transactions in `artifacts/transactions.csv`
`uv run fraud-train`	Train all four artifacts (categorical encoders, fast GBT, autoencoder, enriched GBT)
`uv run fraud-demo`	Run the four illustrative scenarios through the pipeline
`uv run fraud-serve`	Launch the FastAPI service (default `:8000`)
`uv run pytest`	Run the test suite
`uv run ruff check src tests`	Lint
`uv lock --upgrade`	Bump every dep to the newest compatible version

🐳 Docker

The bundled Dockerfile builds on top of ghcr.io/astral-sh/uv:python3.13-trixie-slim, generates synthetic data, trains all four artifacts, then launches the FastAPI service — so the resulting image boots up ready to serve.

docker build -t fraud-detection .
docker run --rm -p 8000:8000 fraud-detection

Property	Value
Base image	`ghcr.io/astral-sh/uv:python3.13-trixie-slim` (Debian 13, Python 3.13, uv)
Image size	~1.75 GB (CPU-only PyTorch — see `[tool.uv.sources]` in `pyproject.toml`)
Exposed port	`8000`
Healthcheck	Built-in via `HEALTHCHECK` directive hitting `/health`
Entrypoint	`fraud-serve --host 0.0.0.0 --port 8000`

Production notes

Training inside the image keeps the demo simple but isn't how you'd ship for real. In production you'd train offline, push artifacts to blob storage (S3 / GCS), and have the container fetch them on startup. The service already reads from $FRAUD_ARTIFACTS_DIR (default artifacts/) — point that at a mounted volume.
The bundled training CSV (artifacts/transactions.csv) is used at startup to populate the graph. Replace it with a snapshot from your actual transaction store, or refactor build_pipeline() in api/dependencies.py to load a precomputed graph pickle.
The container runs as root and uvicorn binds 0.0.0.0:8000. For a hardened deploy add a non-root USER line in the Dockerfile and front the service with a reverse proxy that handles TLS + auth.

🌐 HTTP API

The FastAPI service (api/app.py + api/routes.py) exposes two routes. Interactive docs at /docs (Swagger) and /redoc.

`GET /health`

curl http://localhost:8000/health
# {"status":"ok","pipeline_loaded":true,"artifacts_dir":"artifacts"}

`POST /score`

curl -X POST http://localhost:8000/score \
     -H 'Content-Type: application/json' \
     -d @- <<'JSON'
{
  "transaction": {
    "txn_id": "t-1", "timestamp": "2025-06-01T10:00:00",
    "customer_id": "c1", "card_id": "card1", "merchant_id": "m1",
    "merchant_category": "5942", "merchant_country": "XX",
    "customer_country": "US", "amount": 9999.0, "currency": "USD",
    "device_id": "d1", "ip_address": "203.0.113.42", "channel": "online"
  },
  "history": []
}
JSON

Response:

{
  "decision": "block",
  "risk_score": 1.0,
  "hard_block": true,
  "stages": [
    { "name": "rules", "score": 1.0, "confidence": 1.0 }
  ],
  "reasons": ["sanctioned country XX"]
}
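For callers in Python, a standard-library-only sketch of the same request. The function names, base URL and timeout are assumptions; only the request body and response fields come from the curl example and response above.

```python
import json
import urllib.request
from typing import Any, Optional


def build_score_request(
    txn: dict[str, Any],
    history: Optional[list[dict[str, Any]]] = None,
    base_url: str = "http://localhost:8000",
) -> urllib.request.Request:
    """Build a POST /score request whose JSON body matches the curl example."""
    body = json.dumps({"transaction": txn, "history": history or []}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/score",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def score(
    txn: dict[str, Any],
    history: Optional[list[dict[str, Any]]] = None,
    base_url: str = "http://localhost:8000",
    timeout: float = 5.0,
) -> dict[str, Any]:
    """Send the request and decode the JSON response (the service must be running)."""
    request = build_score_request(txn, history, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# With `uv run fraud-serve` running and `txn` holding the fields from the curl example:
#   result = score(txn)
#   print(result["decision"], result["risk_score"])
```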

Run the service locally (without Docker)

uv run fraud-serve --port 8000
# Requires that you've already run `uv run fraud-train` so artifacts/
# contains the trained models.
# Or, run uvicorn directly:
#   uv run uvicorn fraud_detection.api.app:app --port 8000

📁 Project Structure

fraud-detection/
├── README.md
├── LICENSE
├── CODE_OF_CONDUCT.md         # Contributor Covenant 3.0
├── CONTRIBUTING.md            # Dev setup, workflow, PR checklist
├── SECURITY.md                # Disclosure process + threat model
├── pyproject.toml             # Hatchling-built package + uv-managed deps
├── uv.lock                    # Pinned, reproducible resolution
├── .python-version            # Pinned Python 3.13
├── .env.example               # All FRAUD_* env vars + defaults
├── Dockerfile                 # Containerises the service
├── .dockerignore
├── .gitignore
├── docs/
│   └── guide.md               # Detailed implementation walkthrough
├── src/fraud_detection/
│   ├── __init__.py            # Lazy-export public API
│   ├── config.py              # pydantic-settings, env-driven
│   ├── logging_config.py      # JSON / text structured logging
│   ├── schema.py              # Transaction & StageResult dataclasses
│   ├── rules.py               # Stage 2 — deterministic rules engine
│   ├── features.py            # Numeric + categorical feature engineering
│   ├── graph.py               # Stage 5 — NetworkX entity graph
│   ├── aggregator.py          # Stages 6 & 7 — fusion + decision
│   ├── pipeline.py            # Orchestrator that wires it all up
│   ├── models/                # ML models
│   │   ├── __init__.py
│   │   ├── boosting.py        # Stages 3 & 4b — fast + enriched GBTs
│   │   └── encoder.py         # Stage 4a — PyTorch autoencoder + EmbeddingService
│   ├── api/                   # HTTP service
│   │   ├── __init__.py
│   │   ├── app.py             # create_app() FastAPI factory + lifespan
│   │   ├── routes.py          # GET /health, POST /score
│   │   ├── schemas.py         # Pydantic request/response models
│   │   └── dependencies.py    # build_pipeline() + DI provider
│   └── cli/                   # Console entry points
│       ├── __init__.py
│       ├── generate_data.py   # `fraud-generate`
│       ├── train.py           # `fraud-train`
│       ├── demo.py            # `fraud-demo`
│       └── serve.py           # `fraud-serve` (uvicorn launcher)
├── tests/                     # pytest — fixtures + unit + e2e (pipeline-stubbed)
│   ├── __init__.py
│   ├── conftest.py
│   ├── test_rules.py
│   ├── test_features.py
│   ├── test_aggregator.py
│   ├── test_graph.py
│   └── test_pipeline.py
└── artifacts/                 # gitignored — generated data + trained models

🧰 Console scripts

After uv sync the project is installed in editable mode and four console scripts become available, either via uv run or directly once .venv is activated:

Command	What it does
`fraud-generate`	Create synthetic transactions with injected fraud patterns
`fraud-train`	Train categorical encoders + autoencoder + both GBT models
`fraud-demo`	Score four illustrative scenarios end-to-end
`fraud-serve`	Launch the FastAPI service (uvicorn wrapper)

📚 Documentation

The deep technical walkthrough lives in docs/guide.md — every stage, every algorithm choice, the math behind aggregation, the production gaps. ~700 lines, read top-to-bottom or jump in via its table of contents.

What you want	Where to look
60-second overview	this README
What each stage does	✨ Features above
Run it locally / in Docker	🚀 Quick Start / 🐳 Docker
Why noisy-OR? Why class-weight? Why an autoencoder?	📚 Implementation Guide
API request / response shapes	🌐 HTTP API

🔍 Stage-by-stage deep dive

1️⃣ Transaction arrives

A Transaction is a plain dataclass (schema.py):

Transaction(
    txn_id="...", timestamp=..., customer_id="...", card_id="...",
    merchant_id="...", merchant_category="5812", merchant_country="US",
    customer_country="US", amount=42.10, currency="USD",
    device_id="...", ip_address="...", channel="online",
)

Adapt to whatever your message bus delivers — Kafka, Pub/Sub, an HTTP webhook from the acquirer, etc.
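For readers wiring this to a JSON source, here is a hypothetical mirror of that dataclass plus a helper that builds one from a decoded payload, such as the `transaction` object sent to `POST /score`. The field types, the ISO-8601 timestamp parsing and the names `TransactionSketch` / `from_payload` are assumptions; schema.py holds the real definition.

```python
from dataclasses import dataclass, fields
from datetime import datetime
from typing import Any


@dataclass(frozen=True)
class TransactionSketch:
    """Hypothetical mirror of schema.Transaction; field types are assumed."""

    txn_id: str
    timestamp: datetime
    customer_id: str
    card_id: str
    merchant_id: str
    merchant_category: str  # MCC-style code kept as a string, e.g. "5812"
    merchant_country: str
    customer_country: str
    amount: float
    currency: str
    device_id: str
    ip_address: str
    channel: str


def from_payload(payload: dict[str, Any]) -> TransactionSketch:
    """Validate required keys, parse the ISO timestamp and build the dataclass."""
    names = {f.name for f in fields(TransactionSketch)}
    missing = names - set(payload)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    data = {name: payload[name] for name in names}  # extra keys are ignored
    data["timestamp"] = datetime.fromisoformat(payload["timestamp"])
    data["amount"] = float(payload["amount"])
    return TransactionSketch(**data)


txn = from_payload({
    "txn_id": "t-1", "timestamp": "2025-06-01T10:00:00",
    "customer_id": "c1", "card_id": "card1", "merchant_id": "m1",
    "merchant_category": "5942", "merchant_country": "XX",
    "customer_country": "US", "amount": 9999.0, "currency": "USD",
    "device_id": "d1", "ip_address": "203.0.113.42", "channel": "online",
})
print(txn.timestamp.hour, txn.amount)  # 10 9999.0
```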

2️⃣ Rules engine (`rules.py`)

Two roles in one layer:

Hard blocks that must be auditable — sanctions, blocked merchants, regulatory floors. These bypass the rest of the pipeline.
Cheap signals — velocity, geo jumps, amount thresholds — that flow into the aggregator alongside model scores.

The rules shipped here are intentionally illustrative — most production systems have dozens to hundreds, often expressed as a DSL or as Drools / OpenL Tablets.
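A minimal sketch of those two roles, implementing a subset of the Stage 2 table (the two hard blocks, COUNTRY_RISK, VELOCITY and GEO_JUMP). The `Txn` / `RuleOutcome` types, the blocklists and the windowing choices (prior transactions only; GEO_JUMP compares merchant countries) are assumptions, not the contents of rules.py.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Txn:  # hypothetical, trimmed-down transaction for this sketch
    timestamp: datetime
    merchant_id: str
    merchant_country: str


@dataclass
class RuleOutcome:
    hard_block: bool = False
    signals: list[tuple[str, float, str]] = field(default_factory=list)  # (code, severity, reason)


BLOCKED_MERCHANTS = {"m-blocked"}  # illustrative lists; real ones come from
SANCTIONED = {"XX"}                # compliance feeds and are versioned
HIGH_RISK = {"NG", "RO", "UA"}


def evaluate(txn: Txn, history: list[Txn]) -> RuleOutcome:
    out = RuleOutcome()
    # Role 1: hard blocks. Record the audit reason and stop immediately.
    if txn.merchant_id in BLOCKED_MERCHANTS:
        out.hard_block = True
        out.signals.append(("MERCH_BLOCK", 1.0, f"blocked merchant {txn.merchant_id}"))
        return out
    if txn.merchant_country in SANCTIONED:
        out.hard_block = True
        out.signals.append(("COUNTRY_BLOCK", 1.0, f"sanctioned country {txn.merchant_country}"))
        return out
    # Role 2: cheap soft signals that flow on to the aggregator.
    if txn.merchant_country in HIGH_RISK:
        out.signals.append(("COUNTRY_RISK", 0.6, f"high-risk country {txn.merchant_country}"))
    window = timedelta(minutes=5)
    recent = [h for h in history if timedelta(0) <= txn.timestamp - h.timestamp <= window]
    if len(recent) >= 5:  # assumption: counts prior txns only, not the candidate
        out.signals.append(("VELOCITY", 0.8, f"{len(recent)} txns in last 300s"))
    if history:
        last = max(history, key=lambda h: h.timestamp)
        if (last.merchant_country != txn.merchant_country
                and txn.timestamp - last.timestamp < timedelta(hours=1)):
            out.signals.append(("GEO_JUMP", 0.7,
                                f"{last.merchant_country} -> {txn.merchant_country} in < 1h"))
    return out


t0 = datetime(2025, 6, 1, 10, 0)
burst = [Txn(t0 + timedelta(seconds=30 * i), "m1", "US") for i in range(6)]
result = evaluate(Txn(t0 + timedelta(seconds=200), "m1", "US"), burst)
print(result.hard_block, [code for code, _, _ in result.signals])  # False ['VELOCITY']
```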

3️⃣ Fast GBT (`models/boosting.py`)

The feature set in features.py:

Transaction-level: amount, log_amount, hour, dow, is_night, …
Velocity windows: txn counts at 5 m / 1 h / 24 h / 7 d
Behavioural deltas: secs since last txn, amount-vs-30-d-average, first-time-merchant flag, country mismatch
Categorical: merchant category / country / channel / currency (label-encoded with a stable vocab)
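The velocity windows can be computed per card with a binary search over sorted timestamps, so each window costs O(log n). A sketch assuming epoch-second timestamps; the feature names and the `-1.0` sentinel are hypothetical, not necessarily those in features.py.

```python
from bisect import bisect_left

WINDOWS = {"5m": 300, "1h": 3_600, "24h": 86_400, "7d": 604_800}  # seconds


def velocity_features(now: float, past: list[float]) -> dict[str, float]:
    """Trailing-window counts for one card; `past` holds sorted epoch seconds."""
    end = bisect_left(past, now)  # only txns strictly before the candidate count
    feats: dict[str, float] = {}
    for name, width in WINDOWS.items():
        start = bisect_left(past, now - width)  # first txn inside the window
        feats[f"txn_count_{name}"] = float(end - start)
    # Behavioural delta; -1.0 is an assumed sentinel for "no previous txn".
    feats["secs_since_last"] = float(now - past[end - 1]) if end else -1.0
    return feats


print(velocity_features(10_000, [1_000, 9_000, 9_800, 9_900]))
# {'txn_count_5m': 2.0, 'txn_count_1h': 3.0, 'txn_count_24h': 4.0, 'txn_count_7d': 4.0, 'secs_since_last': 100.0}
```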

Why not just one big model? Most transactions are obvious — boring groceries, recurring subscriptions, a thousand identical coffee purchases. A shallow model nails those at < 1 ms. Only the grey-zone transactions deserve the heavier path.
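The routing itself is only a few lines. A sketch assuming each model is a callable returning a fraud probability; `run_cascade` and the `enriched_model` stage name are hypothetical, and the real pipeline also runs graph analysis on the heavy path.

```python
from typing import Any, Callable

Score = Callable[[Any], float]


def run_cascade(x: Any, fast_model: Score, heavy_model: Score,
                band: tuple[float, float] = (0.20, 0.80)) -> list[tuple[str, float]]:
    """Return (stage, score) pairs for the stages that actually ran on `x`."""
    p_fast = fast_model(x)
    ran = [("fast_model", p_fast)]
    low, high = band
    if low <= p_fast <= high:  # grey zone: only now pay for the heavy path
        ran.append(("enriched_model", heavy_model(x)))
    return ran


heavy_calls: list[Any] = []


def heavy(x: Any) -> float:  # stand-in for encoder + enriched GBT
    heavy_calls.append(x)
    return 0.9


batch = [{"p": 0.02}, {"p": 0.55}, {"p": 0.97}, {"p": 0.30}]
results = [run_cascade(x, lambda t: t["p"], heavy) for x in batch]
print(len(heavy_calls))  # 2: only the 0.55 and 0.30 transactions took the heavy path
```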

4️⃣ Encoder + enriched GBT (`models/encoder.py`)

The autoencoder learns a compressed representation of transaction behaviour. The 8-d bottleneck captures "what kind of transaction is this, behaviourally?", which helps the enriched GBT group similar fraud patterns whose features look unremarkable individually.

In a real system you might extend this with:

Sequence embeddings of the last N transactions of this card.
Merchant embeddings (a separate model trained on merchant → category).
Identity-graph node embeddings (see stage 5).

5️⃣ Graph analysis (`graph.py`)

The graph is read-only at scoring time — you only add the txn after the final decision, so the candidate doesn't bias its own score.

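from fraud_detection.graph import EntityGraph

# historical_transactions: past Transactions; candidate_txn: the one being scored
g = EntityGraph()
g.build_from(historical_transactions)
signal = g.score(candidate_txn)
# GraphSignal(cluster_size=5, fraud_neighbors=4, ...)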

6️⃣ + 7️⃣ Aggregation and decision (`aggregator.py`)

Tune the thresholds per channel / amount band / customer segment in production. The defaults here are sensible starting points — real values come from a precision/recall sweep against a business-defined cost matrix.
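A sketch of such a sweep over the two decision thresholds, assuming a per-transaction cost for missed fraud, for a wrongly blocked customer and for the friction of a challenge (and that challenged fraud is stopped). The cost values and function names are hypothetical; in practice the costs come from the business and the sweep runs on an out-of-time validation set.

```python
from itertools import combinations


def expected_cost(scores: list[float], labels: list[int],
                  t_challenge: float, t_block: float, cost: dict[str, float]) -> float:
    """Mean cost per transaction for a given pair of thresholds."""
    total = 0.0
    for s, y in zip(scores, labels):
        if s >= t_block:
            total += cost["block_legit"] if y == 0 else 0.0
        elif s >= t_challenge:
            total += cost["challenge"]  # assumption: challenged fraud is stopped
        else:
            total += cost["miss_fraud"] if y == 1 else 0.0
    return total / len(scores)


def sweep(scores, labels, cost, grid=None) -> tuple[float, float]:
    """Grid-search (t_challenge, t_block) pairs with t_challenge < t_block."""
    grid = grid or [i / 20 for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95
    return min(combinations(grid, 2),
               key=lambda t: expected_cost(scores, labels, t[0], t[1], cost))


scores = [0.05, 0.10, 0.20, 0.40, 0.55, 0.60, 0.70, 0.85, 0.90, 0.95]
labels = [0, 0, 0, 0, 1, 0, 1, 1, 0, 1]
cost = {"miss_fraud": 100.0, "block_legit": 20.0, "challenge": 2.0}  # hypothetical
best = sweep(scores, labels, cost)
print(best, expected_cost(scores, labels, *best, cost))
```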

🧪 Programmatic Use

from fraud_detection import FraudPipeline, Transaction
# ... build pipeline as in fraud_detection/cli/demo.py ...

result = pipeline.score(txn, history=customer_history)

print(result.decision)     # Decision.APPROVE | CHALLENGE | BLOCK
print(result.risk_score)   # 0.0 .. 1.0
for s in result.stages:    # which stages actually ran
    print(s.name, s.score, s.confidence)
for r in result.reasons:   # top human-readable reasons
    print(r)

🧩 Extending it

Extension	How
More rules	Drop additions into `RulesEngine.evaluate`. Keep them cheap — that's the whole point of stage 2.
Different stage-3 model	Anything with `predict_proba` works. XGBoost / LightGBM / a logistic regression / TabNet are all drop-in.
Better embeddings	Replace `Autoencoder` with a transformer over the customer's last 50 transactions, or a pretrained merchant model.
Real graph database	Swap `EntityGraph` for a Neo4j / AlloyDB Omni / TigerGraph client — the `score()` interface is the contract.
Online / streaming	Once its models and graph are loaded, the pipeline is a pure function of `(txn, history)`. Plug it behind Kafka Streams, Beam, or any RPC framework.

⚠️ Known Limitations

🎲 Synthetic data isn't real card flows. Distributions, seasonality and the fraud:legit ratio are all toy values. Real fraud is closer to 0.1–0.5%, not the 4–5% here.
🕰️ Random train/test split. Production should be temporal (out-of-time holdout) and out-of-customer (no leakage from a customer appearing in both splits).
📉 No drift monitoring, calibration, or feedback loop from analyst review — all of which matter as much as the model architecture.
🎚️ Thresholds are hand-tuned. In production they come from a precision/recall sweep against a business-defined cost matrix.
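The out-of-time, out-of-customer split from the second point can be sketched as follows, assuming rows are dicts with `customer_id` and `timestamp` keys (hypothetical names). Dropping later rows from already-seen customers shrinks the test set, so one option is to report both the strict split and a plain out-of-time split.

```python
from datetime import datetime


def temporal_customer_split(rows, cutoff):
    """Out-of-time and out-of-customer split.

    Train on rows before `cutoff`; test on later rows whose customer never
    appears in the training set, so no customer leaks across the split.
    """
    train = [r for r in rows if r["timestamp"] < cutoff]
    seen = {r["customer_id"] for r in train}
    test = [r for r in rows
            if r["timestamp"] >= cutoff and r["customer_id"] not in seen]
    return train, test


rows = [
    {"customer_id": "c1", "timestamp": datetime(2025, 5, 1)},
    {"customer_id": "c2", "timestamp": datetime(2025, 5, 20)},
    {"customer_id": "c1", "timestamp": datetime(2025, 6, 5)},  # c1 already in train: dropped
    {"customer_id": "c3", "timestamp": datetime(2025, 6, 7)},
]
train, test = temporal_customer_split(rows, cutoff=datetime(2025, 6, 1))
print(len(train), [r["customer_id"] for r in test])  # 2 ['c3']
```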

🤝 Contributing

Contributions are welcome! See the Contributing Guide for setup, the development workflow, and the PR checklist.

This project follows the Contributor Covenant 3.0 Code of Conduct.

🔒 Security

If you discover a security vulnerability, please do not open a public issue — see the Security Policy for the private disclosure process and the project's security model.

📄 License

This project is licensed under the MIT License — feel free to use it for both personal and commercial purposes. See the LICENSE file for details.

Built with ❤️ by Sumit Sahoo
