sumitsahoo/fraud-detection

🛡️ Fraud Detection

A runnable, container-native reference implementation of the kind of 7-stage layered fraud-detection pipeline that runs at most card networks, neobanks, and payment processors.
Built to be read top-to-bottom as a learning project — every stage is one short Python module.

Python 3.13 · uv-managed · Docker ready · MIT License

flowchart TD
    A(["① Transaction arrives"]) --> B{"② Rules engine"}
    B -- "hard block" --> Z(["⛔ BLOCK"])
    B -- "soft signals" --> C["③ Fast GBT score"]
    C --> D{"score in #91;0.2, 0.8#93;?"}
    D -- "no" --> F["⑥ Aggregate risk"]
    D -- "yes" --> E1["④a Encoder<br/>→ embedding"]
    E1 --> E2["④b Enriched GBT"]
    E2 --> G["⑤ Graph analysis<br/><i>optional</i>"]
    G --> F
    F --> H{"⑦ Decision"}
    H -- "risk < 0.35" --> Ap(["✅ APPROVE"])
    H -- "0.35 ≤ risk < 0.75" --> Ch(["⚠️ CHALLENGE"])
    H -- "risk ≥ 0.75" --> Bl(["⛔ BLOCK"])

    classDef stop fill:#fee,stroke:#c33,stroke-width:2px,color:#900
    classDef ok fill:#efe,stroke:#3a3,stroke-width:2px,color:#262
    classDef warn fill:#fef6e0,stroke:#c90,stroke-width:2px,color:#640
    class Z,Bl stop
    class Ap ok
    class Ch warn

✨ Features

Every stage maps to one Python module under src/fraud_detection/. The cascade short-circuits as soon as a decision is certain, so the cheap stages catch the easy cases and only the borderline ones pay for the heavy path.

🛡️ Stage 2 — Rules Engine

Deterministic guardrails that run before any model — auditable, instant, and free.

| Rule | Severity | What it catches |
|---|---|---|
| MERCH_BLOCK | hard block | Merchant on blocklist |
| COUNTRY_BLOCK | hard block | Sanctioned country |
| COUNTRY_RISK | 0.6 | High-risk country (NG / RO / UA by default) |
| HIGH_AMOUNT | 0.4 → 1.0 | Amount above configurable threshold |
| VELOCITY | 0.8 | ≥ 5 transactions in 5 minutes |
| GEO_JUMP | 0.7 | Country change in < 1 hour |
| NEW_CUSTOMER_HIGH_VALUE | 0.5 | First-ever online txn with a large amount |

Individual rule probabilities are combined via noisy-OR: 1 − Π(1 − sᵢ) — so two independent moderate signals compound the way a human would expect.
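
As a quick illustration of the noisy-OR formula above, here is a minimal sketch. The function name `noisy_or` is hypothetical and not taken from rules.py; the two input values are the COUNTRY_RISK and GEO_JUMP severities from the table.

```python
from math import prod


def noisy_or(scores):
    """Combine independent rule probabilities as 1 - prod(1 - s_i)."""
    return 1.0 - prod(1.0 - s for s in scores)


# COUNTRY_RISK (0.6) and GEO_JUMP (0.7) firing on the same transaction
combined = noisy_or([0.6, 0.7])
print(round(combined, 2))  # 0.88

# No rule fired: the empty product is 1, so the combined score is 0
print(noisy_or([]))  # 0.0
```

Two moderate signals end up stronger than either alone (0.88 versus 0.6 and 0.7), but the result never exceeds 1.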

⚡ Stage 3 — Fast Gradient-Boosted Trees

Shallow, sub-millisecond model that scores every transaction.

  • HistGradientBoostingClassifier, depth 4, 120 iterations
  • ~25 engineered features: amount, time-of-day, velocity windows, behavioural deltas, label-encoded categoricals
  • Class imbalance handled via class_weight={0: 1, 1: neg/pos}
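
The class-weight dictionary in the last bullet can be derived from the training labels. The sketch below assumes labels are 0 (legit) and 1 (fraud); the helper name `balanced_class_weight` is hypothetical and the project's training code may compute this differently.

```python
from collections import Counter


def balanced_class_weight(labels):
    """Return {0: 1, 1: neg/pos} so the rare fraud class is up-weighted."""
    counts = Counter(labels)
    neg, pos = counts[0], counts[1]
    if pos == 0:
        raise ValueError("no positive (fraud) examples in labels")
    return {0: 1.0, 1: neg / pos}


# Toy label vector with 5% fraud
labels = [0] * 95 + [1] * 5
weights = balanced_class_weight(labels)
print(weights)  # {0: 1.0, 1: 19.0}
```

A dictionary of this shape is what scikit-learn's `class_weight` parameter accepts, so each fraud row here would count as much as 19 legitimate rows during training.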

💡 Why HistGradientBoostingClassifier and not XGBoost? Same algorithm family (histogram-based GBT), but it installs from the scikit-learn wheel alone, with no separately installed libomp / OpenMP runtime, so uv sync is genuinely the only install step on every OS. The TrainedModel interface is identical, so XGBoost / LightGBM are drop-in swaps.

🧠 Stage 4 — Encoder + Enriched GBT

The heavy path — only fires when stage 3 lands in the uncertain band (default [0.20, 0.80]).
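
The gating condition can be sketched as a single predicate. This assumes the band is closed at both ends, as the `[0.20, 0.80]` notation suggests; the function name `needs_heavy_path` is hypothetical and not taken from pipeline.py.

```python
def needs_heavy_path(fast_score, low=0.20, high=0.80):
    """True when the stage-3 score is too uncertain to act on alone."""
    return low <= fast_score <= high


for score in (0.05, 0.20, 0.50, 0.95):
    print(score, needs_heavy_path(score))
```

Confident scores near 0 or 1 skip the encoder and enriched model entirely, which is where the latency saving comes from.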

| Sub-stage | What it does |
|---|---|
| 4a Autoencoder | Small PyTorch MLP (F → 32 → 8 → 32 → F) trained unsupervised on every transaction. The 8-d bottleneck activation is the embedding. |
| 4b Enriched GBT | Deeper booster (depth 6, 300 iterations) trained on [engineered features ⨁ embedding]; it picks up interactions a single tree split can't. |

🕸️ Stage 5 — Graph Analysis

Heterogeneous entity graph (card ↔ device ↔ merchant ↔ ip) catches relational patterns that row-wise models miss.

| Signal | What it indicates |
|---|---|
| Cluster size | Shared infrastructure across many entities |
| Fraud-neighbour count | Past confirmed fraud in the same connected component |
| Cards per device / per IP | Classic mule signatures |
| Devices per card | Indicates a compromised card cycling through hardware |

In production you'd swap networkx for a real graph store (Neo4j, AlloyDB Omni, TigerGraph) or precomputed GNN embeddings — the EntityGraph.score() interface is the contract.
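
To make the signals above concrete, here is a minimal standard-library sketch of a card ↔ device ↔ ip graph. It is a stand-in for illustration only: the project uses networkx, and the function names, the dictionary-based transactions, and the choice to start the search from the candidate's device are all assumptions.

```python
from collections import defaultdict, deque


def build_graph(transactions):
    """Link each card to the device and IP it was seen with (undirected)."""
    adj = defaultdict(set)
    for t in transactions:
        card = ("card", t["card_id"])
        for other in (("device", t["device_id"]), ("ip", t["ip_address"])):
            adj[card].add(other)
            adj[other].add(card)
    return adj


def component(adj, start):
    """Breadth-first search for every node reachable from start."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def graph_signals(adj, fraud_cards, txn):
    """Score a candidate without adding it to the graph (read-only)."""
    start = ("device", txn["device_id"])
    comp = component(adj, start)
    return {
        "cluster_size": len(comp),  # includes the starting device node
        "fraud_neighbors": sum(
            1 for kind, key in comp if kind == "card" and key in fraud_cards
        ),
        "cards_per_device": sum(
            1 for kind, _ in adj.get(start, ()) if kind == "card"
        ),
    }


history = [
    {"card_id": "c1", "device_id": "d1", "ip_address": "ip1"},
    {"card_id": "c2", "device_id": "d1", "ip_address": "ip1"},
    {"card_id": "c3", "device_id": "d1", "ip_address": "ip2"},
    {"card_id": "c4", "device_id": "d2", "ip_address": "ip3"},
]
adj = build_graph(history)
candidate = {"card_id": "c9", "device_id": "d1", "ip_address": "ip9"}
signals = graph_signals(adj, fraud_cards={"c2"}, txn=candidate)
print(signals)
# {'cluster_size': 6, 'fraud_neighbors': 1, 'cards_per_device': 3}
```

A new card arriving on a device already shared by three cards, one of them confirmed fraud, is the kind of relational pattern a row-wise model cannot see.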

⚖️ Stage 6 + 7 — Risk Aggregation & Decision

Confidence-weighted blend over whichever stages ran, mapped to one of three actions.

$$\text{risk} = \frac{\sum_s w_s \cdot c_s \cdot \text{score}_s}{\sum_s w_s \cdot c_s}$$

where $w_s$ is the per-stage weight, $c_s$ the self-reported confidence, and the sum runs over stages that actually ran.

| Risk score | Decision | Customer experience |
|---|---|---|
| < 0.35 | APPROVE | Transaction goes through |
| 0.35 – 0.75 | ⚠️ CHALLENGE | Step-up auth: OTP / 3DS / review |
| ≥ 0.75 | BLOCK | Declined, customer notified |
| Hard-block rule | BLOCK | Overrides everything, always declined |

Confidence for model stages is 1 − 2·min(p, 1−p) — a model that says 0.5 contributes almost nothing; one that says 0.95 carries weight.
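
The blend, the confidence formula, and the decision thresholds can be sketched together as follows. The per-stage weights of 1.0, the rules-stage confidence of 0.5, and the fallback to 0.0 when every confidence is zero are assumptions made for this example; aggregator.py may use different values and handle that edge case differently.

```python
def model_confidence(p):
    """1 - 2*min(p, 1-p): 0 at p=0.5, approaching 1 near p=0 or p=1."""
    return 1.0 - 2.0 * min(p, 1.0 - p)


def aggregate(stages, weights):
    """Confidence-weighted mean over (name, score, confidence) tuples."""
    num = sum(weights[name] * conf * score for name, score, conf in stages)
    den = sum(weights[name] * conf for name, _, conf in stages)
    return num / den if den > 0 else 0.0  # assumed fallback, see lead-in


def decide(risk, hard_block=False, approve_below=0.35, block_at=0.75):
    """Map a risk score to one of the three actions in the table above."""
    if hard_block or risk >= block_at:
        return "BLOCK"
    if risk < approve_below:
        return "APPROVE"
    return "CHALLENGE"


weights = {"rules": 1.0, "fast_model": 1.0}  # hypothetical weights
stages = [
    ("rules", 0.8, 0.5),                         # assumed rules confidence
    ("fast_model", 0.9, model_confidence(0.9)),  # confidence is about 0.8
]
risk = aggregate(stages, weights)
print(round(risk, 3), decide(risk))  # 0.862 BLOCK
```

Because a model score of exactly 0.5 has confidence 0, that stage drops out of both the numerator and the denominator rather than dragging the blend toward 0.5.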


🎯 Why a 7-stage pipeline?

| Property | What this design gives you |
|---|---|
| Speed-vs-accuracy cascade | Rules + fast GBT score every txn in < 1 ms. Only the ambiguous fraction pays for the encoder + larger model, keeping P99 latency cheap. |
| 🛡️ Defence in depth | Rules catch known patterns. Models catch learned patterns. Graphs catch relational patterns (mule rings, shared devices) neither rules nor models see. |
| 🔍 Explainability | A hard-block by rule is auditable end-to-end. A model challenge is a probability plus the top contributing features. |
| 🔌 Modular | Each stage is one file with one public class. Swap any of them without touching the others. |

🛠️ Tech Stack

| Category | Technology |
|---|---|
| Language | Python 3.13 |
| Dependency manager | uv + hatchling (build backend) |
| Numerics | NumPy |
| Gradient boosting | scikit-learn HistGradientBoostingClassifier |
| Autoencoder | PyTorch (CPU-only wheels) |
| Entity graph | NetworkX |
| HTTP service | FastAPI + Uvicorn |
| Configuration | pydantic-settings |
| Container base image | ghcr.io/astral-sh/uv:python3.13-trixie-slim (Debian 13) |
| Model persistence | joblib |
| Testing & lint | pytest + ruff |

🚀 Quick Start

Prerequisites

  • uv — the only thing you need. uv fetches the exact Python version pinned in .python-version (3.13) on first run, fully isolated from any system Python.
# Install uv (one of)
pipx install uv
# or:  brew install uv
# or:  curl -LsSf https://astral.sh/uv/install.sh | sh

Run it

# Clone + enter
git clone https://github.com/sumitsahoo/fraud-detection.git
cd fraud-detection

# Fetch Python 3.13, create .venv, install the package + all deps
uv sync

# 1. Generate ~30k synthetic transactions (~4-5% fraud)
uv run fraud-generate --n 30000

# 2. Train encoders + fast GBT + autoencoder + enriched GBT
uv run fraud-train

# 3. Run four illustrative scenarios end-to-end
uv run fraud-demo

# 4. (optional) Launch the HTTP service
uv run fraud-serve

Expected output (decisions are deterministic; exact risk scores vary by seed):

SCENARIO: normal coffee shop purchase
  -> DECISION: APPROVE   risk=0.250   hard_block=False
  stages that ran:
    - rules          score=0.000  conf=0.30
    - fast_model     score=0.000  conf=1.00
    - graph          score=0.700  conf=1.00

SCENARIO: card-testing burst (7th small txn in <5 min)
  -> DECISION: BLOCK     risk=0.857   hard_block=False
  reasons:
    • fast model probability 1.000
    • VELOCITY: 6 txns in last 300s

SCENARIO: transaction in sanctioned country
  -> DECISION: BLOCK     risk=1.000   hard_block=True
  reasons:
    • sanctioned country XX

SCENARIO: suspected account takeover
  -> DECISION: BLOCK     risk=0.800   hard_block=False
  reasons:
    • fast model probability 1.000
    • COUNTRY_RISK: high-risk country NG

Available commands

| Command | Description |
|---|---|
| `uv sync` | Fetch Python 3.13, create .venv, install all deps + the package |
| `uv run fraud-generate --n 30000` | Create ~30k synthetic transactions in artifacts/transactions.csv |
| `uv run fraud-train` | Train all four artifacts (encoders, fast GBT, autoencoder, enriched GBT) |
| `uv run fraud-demo` | Run the four illustrative scenarios through the pipeline |
| `uv run fraud-serve` | Launch the FastAPI service (default :8000) |
| `uv run pytest` | Run the test suite |
| `uv run ruff check src tests` | Lint |
| `uv lock --upgrade` | Bump every dep to the newest compatible version |

🐳 Docker

The bundled Dockerfile builds on top of ghcr.io/astral-sh/uv:python3.13-trixie-slim, generates synthetic data, trains all four artifacts, then launches the FastAPI service — so the resulting image boots up ready to serve.

docker build -t fraud-detection .
docker run --rm -p 8000:8000 fraud-detection

| Property | Value |
|---|---|
| Base image | ghcr.io/astral-sh/uv:python3.13-trixie-slim (Debian 13, Python 3.13, uv) |
| Image size | ~1.75 GB (CPU-only PyTorch; see [tool.uv.sources] in pyproject.toml) |
| Exposed port | 8000 |
| Healthcheck | Built-in via HEALTHCHECK directive hitting /health |
| Entrypoint | fraud-serve --host 0.0.0.0 --port 8000 |

Production notes

  • Training inside the image keeps the demo simple but isn't how you'd ship for real. In production you'd train offline, push artifacts to blob storage (S3 / GCS), and have the container fetch them on startup. The service already reads from $FRAUD_ARTIFACTS_DIR (default artifacts/) — point that at a mounted volume.
  • The bundled training CSV (artifacts/transactions.csv) is used at startup to populate the graph. Replace it with a snapshot from your actual transaction store, or refactor build_pipeline() in api/dependencies.py to load a precomputed graph pickle.
  • The container runs as root and uvicorn binds 0.0.0.0:8000. For a hardened deploy add a non-root USER line in the Dockerfile and front the service with a reverse proxy that handles TLS + auth.

🌐 HTTP API

The FastAPI service (api/app.py + api/routes.py) exposes two routes. Interactive docs at /docs (Swagger) and /redoc.

GET /health

curl http://localhost:8000/health
# {"status":"ok","pipeline_loaded":true,"artifacts_dir":"artifacts"}

POST /score

curl -X POST http://localhost:8000/score \
     -H 'Content-Type: application/json' \
     -d @- <<'JSON'
{
  "transaction": {
    "txn_id": "t-1", "timestamp": "2025-06-01T10:00:00",
    "customer_id": "c1", "card_id": "card1", "merchant_id": "m1",
    "merchant_category": "5942", "merchant_country": "XX",
    "customer_country": "US", "amount": 9999.0, "currency": "USD",
    "device_id": "d1", "ip_address": "203.0.113.42", "channel": "online"
  },
  "history": []
}
JSON

Response:

{
  "decision": "block",
  "risk_score": 1.0,
  "hard_block": true,
  "stages": [
    { "name": "rules", "score": 1.0, "confidence": 1.0 }
  ],
  "reasons": ["sanctioned country XX"]
}
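
For callers that prefer Python to curl, the same request can be assembled with the standard library. This is a sketch only: the helper names `build_score_request` and `parse_score_response` are hypothetical, and the network call is left commented out because it needs a running service on localhost:8000.

```python
import json
import urllib.request


def build_score_request(transaction, history=(), base_url="http://localhost:8000"):
    """Build (but do not send) a POST /score request."""
    body = json.dumps({"transaction": transaction, "history": list(history)})
    return urllib.request.Request(
        f"{base_url}/score",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_score_response(raw):
    """Pull the headline fields out of a /score response body."""
    payload = json.loads(raw)
    return payload["decision"], payload["risk_score"], payload["hard_block"]


txn = {
    "txn_id": "t-1", "timestamp": "2025-06-01T10:00:00",
    "customer_id": "c1", "card_id": "card1", "merchant_id": "m1",
    "merchant_category": "5942", "merchant_country": "XX",
    "customer_country": "US", "amount": 9999.0, "currency": "USD",
    "device_id": "d1", "ip_address": "203.0.113.42", "channel": "online",
}
req = build_score_request(txn)

# With the service running, the request could be sent like this:
# with urllib.request.urlopen(req, timeout=5) as resp:
#     print(parse_score_response(resp.read()))

# Parsing the sample response shown above
sample = (
    '{"decision": "block", "risk_score": 1.0, "hard_block": true, '
    '"stages": [{"name": "rules", "score": 1.0, "confidence": 1.0}], '
    '"reasons": ["sanctioned country XX"]}'
)
print(parse_score_response(sample))  # ('block', 1.0, True)
```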

Run the service locally (without Docker)

uv run fraud-serve --port 8000
# Requires that you've already run `uv run fraud-train` so artifacts/
# contains the trained models.
# Or, run uvicorn directly:
#   uv run uvicorn fraud_detection.api.app:app --port 8000

📁 Project Structure

fraud-detection/
├── README.md
├── LICENSE
├── CODE_OF_CONDUCT.md         # Contributor Covenant 3.0
├── CONTRIBUTING.md            # Dev setup, workflow, PR checklist
├── SECURITY.md                # Disclosure process + threat model
├── pyproject.toml             # Hatchling-built package + uv-managed deps
├── uv.lock                    # Pinned, reproducible resolution
├── .python-version            # Pinned Python 3.13
├── .env.example               # All FRAUD_* env vars + defaults
├── Dockerfile                 # Containerises the service
├── .dockerignore
├── .gitignore
├── docs/
│   └── guide.md               # Detailed implementation walkthrough
├── src/fraud_detection/
│   ├── __init__.py            # Lazy-export public API
│   ├── config.py              # pydantic-settings, env-driven
│   ├── logging_config.py      # JSON / text structured logging
│   ├── schema.py              # Transaction & StageResult dataclasses
│   ├── rules.py               # Stage 2 — deterministic rules engine
│   ├── features.py            # Numeric + categorical feature engineering
│   ├── graph.py               # Stage 5 — NetworkX entity graph
│   ├── aggregator.py          # Stages 6 & 7 — fusion + decision
│   ├── pipeline.py            # Orchestrator that wires it all up
│   ├── models/                # ML models
│   │   ├── __init__.py
│   │   ├── boosting.py        # Stages 3 & 4b — fast + enriched GBTs
│   │   └── encoder.py         # Stage 4a — PyTorch autoencoder + EmbeddingService
│   ├── api/                   # HTTP service
│   │   ├── __init__.py
│   │   ├── app.py             # create_app() FastAPI factory + lifespan
│   │   ├── routes.py          # GET /health, POST /score
│   │   ├── schemas.py         # Pydantic request/response models
│   │   └── dependencies.py    # build_pipeline() + DI provider
│   └── cli/                   # Console entry points
│       ├── __init__.py
│       ├── generate_data.py   # `fraud-generate`
│       ├── train.py           # `fraud-train`
│       ├── demo.py            # `fraud-demo`
│       └── serve.py           # `fraud-serve` (uvicorn launcher)
├── tests/                     # pytest — fixtures + unit + e2e (pipeline-stubbed)
│   ├── __init__.py
│   ├── conftest.py
│   ├── test_rules.py
│   ├── test_features.py
│   ├── test_aggregator.py
│   ├── test_graph.py
│   └── test_pipeline.py
└── artifacts/                 # gitignored — generated data + trained models

🧰 Console scripts

After uv sync the project is installed in editable mode and four commands appear on PATH (inside uv run):

| Command | What it does |
|---|---|
| `fraud-generate` | Create synthetic transactions with injected fraud patterns |
| `fraud-train` | Train encoders + autoencoder + both GBT models |
| `fraud-demo` | Score four illustrative scenarios end-to-end |
| `fraud-serve` | Launch the FastAPI service (uvicorn wrapper) |

📚 Documentation

The deep technical walkthrough lives in docs/guide.md — every stage, every algorithm choice, the math behind aggregation, the production gaps. ~700 lines, read top-to-bottom or jump in via its table of contents.

| What you want | Where to look |
|---|---|
| 60-second overview | this README |
| What each stage does | ✨ Features above |
| Run it locally / in Docker | 🚀 Quick Start / 🐳 Docker |
| Why noisy-OR? Why class-weight? Why an autoencoder? | 📚 Implementation Guide |
| API request / response shapes | 🌐 HTTP API |

🔍 Stage-by-stage deep dive

1️⃣ Transaction arrives

A Transaction is a plain dataclass (schema.py):

Transaction(
    txn_id="...", timestamp=..., customer_id="...", card_id="...",
    merchant_id="...", merchant_category="5812", merchant_country="US",
    customer_country="US", amount=42.10, currency="USD",
    device_id="...", ip_address="...", channel="online",
)

Adapt to whatever your message bus delivers — Kafka, Pub/Sub, an HTTP webhook from the acquirer, etc.
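
As one way to adapt an incoming message, the sketch below converts a JSON-style dictionary into a dataclass with the same field names as above. `TransactionRecord` is a hypothetical stand-in, not the class from schema.py, and the field types are assumptions inferred from the example values.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class TransactionRecord:
    """Stand-in with the field names shown above; types are assumed."""
    txn_id: str
    timestamp: datetime
    customer_id: str
    card_id: str
    merchant_id: str
    merchant_category: str
    merchant_country: str
    customer_country: str
    amount: float
    currency: str
    device_id: str
    ip_address: str
    channel: str


def transaction_from_message(msg):
    """Coerce the two non-string fields, then build the record."""
    fields = dict(msg)
    fields["timestamp"] = datetime.fromisoformat(msg["timestamp"])
    fields["amount"] = float(msg["amount"])
    return TransactionRecord(**fields)


message = {
    "txn_id": "t-1", "timestamp": "2025-06-01T10:00:00",
    "customer_id": "c1", "card_id": "card1", "merchant_id": "m1",
    "merchant_category": "5942", "merchant_country": "XX",
    "customer_country": "US", "amount": "9999", "currency": "USD",
    "device_id": "d1", "ip_address": "203.0.113.42", "channel": "online",
}
txn = transaction_from_message(message)
print(txn.amount, txn.timestamp.hour)  # 9999.0 10
```

A missing or unexpected key raises `TypeError` at construction time, which surfaces malformed messages before they reach the rules engine.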

2️⃣ Rules engine (rules.py)

Two roles in one layer:

  1. Hard blocks that must be auditable — sanctions, blocked merchants, regulatory floors. These bypass the rest of the pipeline.
  2. Cheap signals — velocity, geo jumps, amount thresholds — that flow into the aggregator alongside model scores.

The rules shipped here are intentionally illustrative — most production systems have dozens to hundreds, often expressed as a DSL or as Drools / OpenL Tablets.
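
The two roles can be sketched as one function that returns a hard-block flag alongside a dictionary of soft signals. This is an illustration under assumptions, not the code in rules.py: transactions are plain dictionaries, the sanctioned set holds only the placeholder "XX", velocity counts prior transactions only, and the geo-jump check compares merchant countries.

```python
from datetime import datetime, timedelta

SANCTIONED = {"XX"}  # placeholder country code, as in the demo output


def evaluate_rules(txn, history, window=timedelta(minutes=5), velocity_limit=5):
    """Return (hard_block, signals); signals maps rule name to probability."""
    # Role 1: hard blocks short-circuit everything else
    if txn["merchant_country"] in SANCTIONED:
        return True, {"COUNTRY_BLOCK": 1.0}

    # Role 2: cheap soft signals that flow on to the aggregator
    signals = {}
    cutoff = txn["timestamp"] - window
    recent = sum(1 for h in history if h["timestamp"] >= cutoff)
    if recent >= velocity_limit:
        signals["VELOCITY"] = 0.8
    if history:
        last = max(history, key=lambda h: h["timestamp"])
        moved = last["merchant_country"] != txn["merchant_country"]
        if moved and txn["timestamp"] - last["timestamp"] < timedelta(hours=1):
            signals["GEO_JUMP"] = 0.7
    return False, signals


now = datetime(2025, 6, 1, 10, 0, 0)
history = [
    {"timestamp": now - timedelta(seconds=30 * i), "merchant_country": "US"}
    for i in range(1, 7)
]
hard_block, signals = evaluate_rules(
    {"timestamp": now, "merchant_country": "US"}, history
)
print(hard_block, signals)  # False {'VELOCITY': 0.8}
```

Returning early on a hard block keeps that decision auditable: the only reason recorded is the rule that fired.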

3️⃣ Fast GBT (models/boosting.py)

The feature set in features.py:

  • Transaction-level: amount, log_amount, hour, dow, is_night, …
  • Velocity windows: txn counts at 5 m / 1 h / 24 h / 7 d
  • Behavioural deltas: secs since last txn, amount-vs-30-d-average, first-time-merchant flag, country mismatch
  • Categorical: merchant category / country / channel / currency (label-encoded with a stable vocab)
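
The velocity windows and the seconds-since-last-transaction delta from the list above could be computed along these lines. The feature names and the `-1.0` sentinel for customers with no history are assumptions for this sketch, not the names used in features.py.

```python
from datetime import datetime, timedelta

WINDOWS = {
    "5m": timedelta(minutes=5),
    "1h": timedelta(hours=1),
    "24h": timedelta(hours=24),
    "7d": timedelta(days=7),
}


def velocity_features(now, past_timestamps):
    """Count prior transactions per window and time since the latest one."""
    feats = {
        f"txn_count_{name}": sum(1 for ts in past_timestamps if now - span <= ts < now)
        for name, span in WINDOWS.items()
    }
    earlier = [ts for ts in past_timestamps if ts < now]
    feats["secs_since_last_txn"] = (
        (now - max(earlier)).total_seconds() if earlier else -1.0
    )
    return feats


now = datetime(2025, 6, 1, 10, 0, 0)
past = [
    now - timedelta(minutes=2),
    now - timedelta(minutes=30),
    now - timedelta(hours=3),
    now - timedelta(days=2),
    now - timedelta(days=10),
]
feats = velocity_features(now, past)
print(feats)
# {'txn_count_5m': 1, 'txn_count_1h': 2, 'txn_count_24h': 3,
#  'txn_count_7d': 4, 'secs_since_last_txn': 120.0}
```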

Why not just one big model? Most transactions are obvious — boring groceries, recurring subscriptions, a thousand identical coffee purchases. A shallow model nails those at < 1 ms. Only the grey-zone transactions deserve the heavier path.

4️⃣ Encoder + enriched GBT (models/encoder.py)

The autoencoder learns a smooth representation of transaction behaviour. The 8-d bottleneck captures "what kind of transaction is this, behaviourally?", which helps group transactions from similar fraud rings even when their individual features look unremarkable.

In a real system you might extend this with:

  • Sequence embeddings of the last N transactions of this card.
  • Merchant embeddings (a separate model trained on merchant → category).
  • Identity-graph node embeddings (see stage 5).

5️⃣ Graph analysis (graph.py)

The graph is read-only at scoring time — you only add the txn after the final decision, so the candidate doesn't bias its own score.

g = EntityGraph()
g.build_from(historical_transactions)
signal = g.score(candidate_txn)
# GraphSignal(cluster_size=5, fraud_neighbors=4, ...)

6️⃣ + 7️⃣ Aggregation and decision (aggregator.py)

Tune the thresholds per channel / amount band / customer segment in production. The defaults here are sensible starting points — real values come from a precision/recall sweep against a business-defined cost matrix.
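
A threshold sweep against a cost matrix can be sketched as below. It is simplified to a single block threshold with two error types; the cost values, the 0.01 step, and the toy scored data are made up for illustration and do not come from the project.

```python
def sweep_block_threshold(scored, cost_false_block=5.0, cost_missed_fraud=100.0):
    """Try thresholds 0.01..0.99 and return (threshold, total_cost) at the minimum.

    scored is a list of (risk, is_fraud) pairs from a labelled holdout set.
    """
    best = None
    for i in range(1, 100):
        threshold = i / 100
        cost = 0.0
        for risk, is_fraud in scored:
            blocked = risk >= threshold
            if blocked and not is_fraud:
                cost += cost_false_block   # legitimate customer declined
            elif not blocked and is_fraud:
                cost += cost_missed_fraud  # fraud let through
        if best is None or cost < best[1]:
            best = (threshold, cost)       # strict < keeps the lowest threshold
    return best


scored = [
    (0.05, False), (0.10, False), (0.40, False),
    (0.55, True), (0.90, True), (0.95, True),
]
best_threshold, best_cost = sweep_block_threshold(scored)
print(best_threshold, best_cost)  # 0.41 0.0
```

With real data the minimum is rarely zero, and the asymmetry between the two costs decides whether the chosen threshold leans toward blocking or approving.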


🧪 Programmatic Use

from fraud_detection import FraudPipeline, Transaction
# ... build pipeline as in fraud_detection/cli/demo.py ...

result = pipeline.score(txn, history=customer_history)

print(result.decision)     # Decision.APPROVE | CHALLENGE | BLOCK
print(result.risk_score)   # 0.0 .. 1.0
for s in result.stages:    # which stages actually ran
    print(s.name, s.score, s.confidence)
for r in result.reasons:   # top human-readable reasons
    print(r)

🧩 Extending it

| Extension | How |
|---|---|
| More rules | Drop additions into RulesEngine.evaluate. Keep them cheap; that's the whole point of stage 2. |
| Different stage-3 model | Anything with predict_proba works. XGBoost / LightGBM / a logistic regression / TabNet are all drop-in. |
| Better embeddings | Replace Autoencoder with a transformer over the customer's last 50 transactions, or a pretrained merchant model. |
| Real graph database | Swap EntityGraph for a Neo4j / AlloyDB Omni / TigerGraph client; the score() interface is the contract. |
| Online / streaming | The pipeline is pure-function given (txn, history). Plug it behind Kafka Streams, Beam, or any RPC framework. |

⚠️ Known Limitations

  • 🎲 Synthetic data isn't real card flows. Distributions, seasonality and the fraud:legit ratio are all toy values. Real fraud is closer to 0.1–0.5%, not the 4–5% here.
  • 🕰️ Random train/test split. Production should be temporal (out-of-time holdout) and out-of-customer (no leakage from a customer appearing in both splits).
  • 📉 No drift monitoring, calibration, or feedback loop from analyst review — all of which matter as much as the model architecture.
  • 🎚️ Thresholds are hand-tuned. In production they come from a precision/recall sweep against a business-defined cost matrix.
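
The temporal, out-of-customer holdout mentioned in the second bullet could look like the sketch below. The function name and the dictionary row shape are assumptions; note that holdout rows from customers already seen in training are dropped rather than reassigned.

```python
from datetime import datetime


def temporal_customer_split(rows, cutoff):
    """Train on rows before cutoff; test on later rows from unseen customers."""
    train = [r for r in rows if r["timestamp"] < cutoff]
    train_customers = {r["customer_id"] for r in train}
    test = [
        r for r in rows
        if r["timestamp"] >= cutoff and r["customer_id"] not in train_customers
    ]
    return train, test


rows = [
    {"customer_id": "c1", "timestamp": datetime(2025, 6, 1)},
    {"customer_id": "c2", "timestamp": datetime(2025, 6, 2)},
    {"customer_id": "c1", "timestamp": datetime(2025, 6, 10)},  # dropped: c1 is in train
    {"customer_id": "c3", "timestamp": datetime(2025, 6, 11)},
]
train, test = temporal_customer_split(rows, cutoff=datetime(2025, 6, 5))
print(len(train), [r["customer_id"] for r in test])  # 2 ['c3']
```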

🤝 Contributing

Contributions are welcome! See the Contributing Guide for setup, the development workflow, and the PR checklist.

This project follows the Contributor Covenant 3.0 Code of Conduct.


🔒 Security

If you discover a security vulnerability, please do not open a public issue — see the Security Policy for the private disclosure process and the project's security model.


📄 License

This project is licensed under the MIT License — feel free to use it for both personal and commercial purposes. See the LICENSE file for details.


Built with ❤️ by Sumit Sahoo
