A runnable, container-native reference implementation of the kind of 7-stage layered fraud-detection pipeline that runs at most card networks, neobanks, and payment processors.
Built to be read top-to-bottom as a learning project — every stage is one short Python module.
```mermaid
flowchart TD
    A(["① Transaction arrives"]) --> B{"② Rules engine"}
    B -- "hard block" --> Z(["⛔ BLOCK"])
    B -- "soft signals" --> C["③ Fast GBT score"]
    C --> D{"score in #91;0.2, 0.8#93;?"}
    D -- "no" --> F["⑥ Aggregate risk"]
    D -- "yes" --> E1["④a Encoder<br/>→ embedding"]
    E1 --> E2["④b Enriched GBT"]
    E2 --> G["⑤ Graph analysis<br/><i>optional</i>"]
    G --> F
    F --> H{"⑦ Decision"}
    H -- "risk < 0.35" --> Ap(["✅ APPROVE"])
    H -- "0.35 ≤ risk < 0.75" --> Ch(["⚠️ CHALLENGE"])
    H -- "risk ≥ 0.75" --> Bl(["⛔ BLOCK"])
    classDef stop fill:#fee,stroke:#c33,stroke-width:2px,color:#900
    classDef ok fill:#efe,stroke:#3a3,stroke-width:2px,color:#262
    classDef warn fill:#fef6e0,stroke:#c90,stroke-width:2px,color:#640
    class Z,Bl stop
    class Ap ok
    class Ch warn
```
Every stage maps to one Python module under src/fraud_detection/. The cascade short-circuits as soon as a decision is certain, so the cheap stages catch the easy cases and only the borderline ones pay for the heavy path.
Deterministic guardrails that run before any model — auditable, instant, and free.
| Rule | Severity | What it catches |
|---|---|---|
| MERCH_BLOCK | hard block | Merchant on blocklist |
| COUNTRY_BLOCK | hard block | Sanctioned country |
| COUNTRY_RISK | 0.6 | High-risk country (NG / RO / UA by default) |
| HIGH_AMOUNT | 0.4 → 1.0 | Amount above configurable threshold |
| VELOCITY | 0.8 | ≥ 5 transactions in 5 minutes |
| GEO_JUMP | 0.7 | Country change in < 1 hour |
| NEW_CUSTOMER_HIGH_VALUE | 0.5 | First-ever online txn with a large amount |
Individual rule probabilities are combined via noisy-OR: 1 − Π(1 − sᵢ) — so two independent moderate signals compound the way a human would expect.
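As a minimal sketch of the noisy-OR combination above (the function name `noisy_or` is illustrative and is not taken from rules.py), two moderate signals firing together compound like this:

```python
from math import prod


def noisy_or(scores: list[float]) -> float:
    """Combine independent rule probabilities: 1 - prod(1 - s_i)."""
    return 1.0 - prod(1.0 - s for s in scores)


# COUNTRY_RISK (0.6) and GEO_JUMP (0.7) firing on the same transaction
combined = noisy_or([0.6, 0.7])
print(round(combined, 2))  # 0.88
```

An empty list yields 0.0, so a transaction that triggers no rule contributes no rule-based risk in this sketch.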
Shallow, sub-millisecond model that scores every transaction.
- `HistGradientBoostingClassifier`, depth 4, 120 iterations
- ~25 engineered features: amount, time-of-day, velocity windows, behavioural deltas, label-encoded categoricals
- Class imbalance handled via `class_weight={0: 1, 1: neg/pos}`
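The `neg/pos` ratio in that class-weight dictionary can be derived directly from the training labels. The helper below is a hypothetical sketch (the name `fraud_class_weight` does not come from the project) and assumes labels are 0 for legitimate and 1 for fraud:

```python
def fraud_class_weight(labels: list[int]) -> dict[int, float]:
    """Return {0: 1, 1: neg/pos} so the rare fraud class is up-weighted."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0:
        raise ValueError("no positive (fraud) labels to weight")
    return {0: 1.0, 1: neg / pos}


# 95 legitimate rows vs 5 fraudulent rows: each fraud row weighs 19x
weights = fraud_class_weight([0] * 95 + [1] * 5)
print(weights)  # {0: 1.0, 1: 19.0}
```

With this weighting, the total weight of the fraud class equals the total weight of the legitimate class, which is the usual intent of a `neg/pos` ratio.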
💡 Why `HistGradientBoostingClassifier` and not XGBoost? Same algorithm family (histogram-based GBT), but it ships purely as a Python wheel with no `libomp` / OpenMP runtime required, so `uv sync` is genuinely the only install step on every OS. The `TrainedModel` interface is identical, so XGBoost / LightGBM are drop-in swaps.
The heavy path — only fires when stage 3 lands in the uncertain band (default [0.20, 0.80]).
| Sub-stage | What it does |
|---|---|
| 4a Autoencoder | Small PyTorch MLP (F → 32 → 8 → 32 → F) trained unsupervised on every transaction. The 8-d bottleneck activation is the embedding. |
| 4b Enriched GBT | Deeper booster (depth 6, 300 iterations) trained on [engineered features ⨁ embedding] — picks up interactions a single tree split can't. |
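The band check that gates the heavy path can be sketched in a few lines. This is an illustration only: the function name `needs_heavy_path` and its parameters are hypothetical, and treating both band edges as inclusive is an assumption based on the [0.20, 0.80] notation.

```python
def needs_heavy_path(fast_score: float, low: float = 0.20, high: float = 0.80) -> bool:
    """True when the stage-3 probability falls inside the uncertain band."""
    return low <= fast_score <= high


for p in (0.05, 0.20, 0.50, 0.95):
    route = "encoder + enriched GBT" if needs_heavy_path(p) else "skip heavy path"
    print(f"fast score {p:.2f}: {route}")
```

Widening the band sends more traffic through stage 4 and narrowing it saves compute, so the two bounds act as a cost dial.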
Heterogeneous entity graph (card ↔ device ↔ merchant ↔ ip) catches relational patterns that row-wise models miss.
| Signal | What it indicates |
|---|---|
| Cluster size | Shared infrastructure across many entities |
| Fraud-neighbour count | Past confirmed fraud in the same connected component |
| Cards per device / per IP | Classic mule signatures |
| Devices per card | Indicates a compromised card cycling through hardware |
In production you'd swap networkx for a real graph store (Neo4j, AlloyDB Omni, TigerGraph) or precomputed GNN embeddings — the EntityGraph.score() interface is the contract.
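To make the signals in the table concrete, here is a dependency-free sketch that uses plain dictionaries instead of networkx. It is not the project's `EntityGraph`; the function names, node-key prefixes, and toy history are all assumptions made for illustration.

```python
from collections import defaultdict, deque


def build_adjacency(txns: list[dict]) -> dict[str, set[str]]:
    """Link each card to the device, merchant and IP it was seen with."""
    adj: dict[str, set[str]] = defaultdict(set)
    for t in txns:
        card = f"card:{t['card_id']}"
        for kind in ("device_id", "merchant_id", "ip_address"):
            other = f"{kind}:{t[kind]}"
            adj[card].add(other)
            adj[other].add(card)
    return adj


def component(adj: dict[str, set[str]], start: str) -> set[str]:
    """Breadth-first search: every node connected to `start`."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def cards_per_device(adj: dict[str, set[str]], device_id: str) -> int:
    """Number of distinct cards observed on one device."""
    neighbours = adj.get(f"device_id:{device_id}", ())
    return sum(1 for n in neighbours if n.startswith("card:"))


history = [
    {"card_id": "c1", "device_id": "d1", "merchant_id": "m1", "ip_address": "ip1"},
    {"card_id": "c2", "device_id": "d1", "merchant_id": "m2", "ip_address": "ip2"},
    {"card_id": "c3", "device_id": "d9", "merchant_id": "m9", "ip_address": "ip9"},
]
known_fraud_cards = {"card:c2"}  # toy label set

adj = build_adjacency(history)
cluster = component(adj, "card:c1")
print(len(cluster))                      # cluster size: 7
print(len(cluster & known_fraud_cards))  # fraud-neighbour count: 1
print(cards_per_device(adj, "d1"))       # cards per device: 2
```

Cards c1 and c2 never share a merchant or IP, yet the shared device d1 puts them in the same component, which is the kind of link a row-wise model cannot see.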
Confidence-weighted blend over whichever stages ran, mapped to one of three actions.
| Risk score | Decision | Customer experience |
|---|---|---|
| < 0.35 | ✅ APPROVE | Transaction goes through |
| 0.35 – 0.75 | ⚠️ CHALLENGE | Step-up auth: OTP / 3DS / review |
| ≥ 0.75 | ⛔ BLOCK | Declined, customer notified |
| Hard-block rule | ⛔ BLOCK | Overrides everything, always declined |
Confidence for model stages is 1 − 2·min(p, 1−p) — a model that says 0.5 contributes almost nothing; one that says 0.95 carries weight.
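The confidence formula and the three-way decision can be sketched as follows. The function names are hypothetical, and the plain confidence-weighted mean is an assumption: the aggregator in aggregator.py may apply additional per-stage weights, so this sketch will not necessarily reproduce the risk values printed by the demo.

```python
def model_confidence(p: float) -> float:
    """1 - 2*min(p, 1-p): 0 at p=0.5, approaching 1 as p nears 0 or 1."""
    return 1.0 - 2.0 * min(p, 1.0 - p)


def blend(stages: list[tuple[float, float]]) -> float:
    """Confidence-weighted mean of (score, confidence) pairs."""
    total = sum(conf for _, conf in stages)
    if total == 0:
        return 0.0  # assumption: no confident stage means no evidence of risk
    return sum(score * conf for score, conf in stages) / total


def decide(risk: float, challenge_at: float = 0.35, block_at: float = 0.75) -> str:
    """Map a risk score to one of the three actions in the table above."""
    if risk >= block_at:
        return "BLOCK"
    if risk >= challenge_at:
        return "CHALLENGE"
    return "APPROVE"


fast_p = 0.95
stages = [
    (0.6, 0.3),                             # rules: score, fixed confidence (toy values)
    (fast_p, model_confidence(fast_p)),     # fast model
]
risk = blend(stages)
print(round(risk, 3), decide(risk))
```

A model output of exactly 0.5 gets zero confidence here, so it drops out of the weighted mean entirely.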
| Property | What this design gives you |
|---|---|
| ⚡ Speed-vs-accuracy cascade | Rules + fast GBT score every txn in < 1 ms. Only the ambiguous fraction pays for the encoder + larger model, which keeps typical latency and compute cost low. |
| 🛡️ Defence in depth | Rules catch known patterns. Models catch learned patterns. Graphs catch relational patterns (mule rings, shared devices) neither rules nor models see. |
| 🔍 Explainability | A hard-block by rule is auditable end-to-end. A model challenge is a probability plus the top contributing features. |
| 🔌 Modular | Each stage is one file with one public class. Swap any of them without touching the others. |
| Category | Technology |
|---|---|
| Language | Python 3.13 |
| Dependency manager | uv + hatchling (build backend) |
| Numerics | NumPy |
| Gradient boosting | scikit-learn HistGradientBoostingClassifier |
| Autoencoder | PyTorch (CPU-only wheels) |
| Entity graph | NetworkX |
| HTTP service | FastAPI + Uvicorn |
| Configuration | pydantic-settings |
| Container base image | ghcr.io/astral-sh/uv:python3.13-trixie-slim (Debian 13) |
| Model persistence | joblib |
| Testing & lint | pytest + ruff |
- uv: the only thing you need. uv fetches the exact Python version pinned in `.python-version` (3.13) on first run, fully isolated from any system Python.
```bash
# Install uv (one of)
pipx install uv
# or: brew install uv
# or: curl -LsSf https://astral.sh/uv/install.sh | sh
```

```bash
# Clone + enter
git clone https://github.com/sumitsahoo/fraud-detection.git
cd fraud-detection
# Fetch Python 3.13, create .venv, install the package + all deps
uv sync
# 1. Generate ~30k synthetic transactions (~4-5% fraud)
uv run fraud-generate --n 30000
# 2. Train encoders + fast GBT + autoencoder + enriched GBT
uv run fraud-train
# 3. Run four illustrative scenarios end-to-end
uv run fraud-demo
# 4. (optional) Launch the HTTP service
uv run fraud-serve
```

Expected output (decisions are deterministic; exact risk scores vary by seed):

```text
SCENARIO: normal coffee shop purchase
-> DECISION: APPROVE risk=0.250 hard_block=False
stages that ran:
- rules score=0.000 conf=0.30
- fast_model score=0.000 conf=1.00
- graph score=0.700 conf=1.00
SCENARIO: card-testing burst (7th small txn in <5 min)
-> DECISION: BLOCK risk=0.857 hard_block=False
reasons:
• fast model probability 1.000
• VELOCITY: 6 txns in last 300s
SCENARIO: transaction in sanctioned country
-> DECISION: BLOCK risk=1.000 hard_block=True
reasons:
• sanctioned country XX
SCENARIO: suspected account takeover
-> DECISION: BLOCK risk=0.800 hard_block=False
reasons:
• fast model probability 1.000
• COUNTRY_RISK: high-risk country NG
```
| Command | Description |
|---|---|
| `uv sync` | Fetch Python 3.13, create .venv, install all deps + the package |
| `uv run fraud-generate --n 30000` | Create ~30k synthetic transactions in artifacts/transactions.csv |
| `uv run fraud-train` | Train all four artifacts (encoders, fast GBT, autoencoder, enriched GBT) |
| `uv run fraud-demo` | Run the four illustrative scenarios through the pipeline |
| `uv run fraud-serve` | Launch the FastAPI service (default :8000) |
| `uv run pytest` | Run the test suite |
| `uv run ruff check src tests` | Lint |
| `uv lock --upgrade` | Bump every dep to the newest compatible version |
The bundled Dockerfile builds on top of ghcr.io/astral-sh/uv:python3.13-trixie-slim, generates synthetic data, trains all four artifacts, then launches the FastAPI service — so the resulting image boots up ready to serve.
```bash
docker build -t fraud-detection .
docker run --rm -p 8000:8000 fraud-detection
```

| Property | Value |
|---|---|
| Base image | ghcr.io/astral-sh/uv:python3.13-trixie-slim (Debian 13, Python 3.13, uv) |
| Image size | ~1.75 GB (CPU-only PyTorch — see [tool.uv.sources] in pyproject.toml) |
| Exposed port | 8000 |
| Healthcheck | Built-in via HEALTHCHECK directive hitting /health |
| Entrypoint | fraud-serve --host 0.0.0.0 --port 8000 |
- Training inside the image keeps the demo simple but isn't how you'd ship for real. In production you'd train offline, push artifacts to blob storage (S3 / GCS), and have the container fetch them on startup. The service already reads from `$FRAUD_ARTIFACTS_DIR` (default `artifacts/`); point that at a mounted volume.
- The bundled training CSV (`artifacts/transactions.csv`) is used at startup to populate the graph. Replace it with a snapshot from your actual transaction store, or refactor `service._build_pipeline` to load a precomputed graph pickle.
- The container runs as root and uvicorn binds `0.0.0.0:8000`. For a hardened deploy add a non-root `USER` line in the Dockerfile and front the service with a reverse proxy that handles TLS + auth.
The FastAPI service (api/app.py + api/routes.py) exposes two routes. Interactive docs at /docs (Swagger) and /redoc.
```bash
curl http://localhost:8000/health
# {"status":"ok","pipeline_loaded":true,"artifacts_dir":"artifacts"}
```

```bash
curl -X POST http://localhost:8000/score \
  -H 'Content-Type: application/json' \
  -d @- <<'JSON'
{
  "transaction": {
    "txn_id": "t-1", "timestamp": "2025-06-01T10:00:00",
    "customer_id": "c1", "card_id": "card1", "merchant_id": "m1",
    "merchant_category": "5942", "merchant_country": "XX",
    "customer_country": "US", "amount": 9999.0, "currency": "USD",
    "device_id": "d1", "ip_address": "203.0.113.42", "channel": "online"
  },
  "history": []
}
JSON
```

Response:

```json
{
  "decision": "block",
  "risk_score": 1.0,
  "hard_block": true,
  "stages": [
    { "name": "rules", "score": 1.0, "confidence": 1.0 }
  ],
  "reasons": ["sanctioned country XX"]
}
```

```bash
uv run fraud-serve --port 8000
# Requires that you've already run `uv run fraud-train` so artifacts/
# contains the trained models.
# Or, run uvicorn directly:
#   uv run uvicorn fraud_detection.api.app:app --port 8000
```

```text
fraud-detection/
├── README.md
├── LICENSE
├── CODE_OF_CONDUCT.md          # Contributor Covenant 3.0
├── CONTRIBUTING.md             # Dev setup, workflow, PR checklist
├── SECURITY.md                 # Disclosure process + threat model
├── pyproject.toml              # Hatchling-built package + uv-managed deps
├── uv.lock                     # Pinned, reproducible resolution
├── .python-version             # Pinned Python 3.13
├── .env.example                # All FRAUD_* env vars + defaults
├── Dockerfile                  # Containerises the service
├── .dockerignore
├── .gitignore
├── docs/
│   └── guide.md                # Detailed implementation walkthrough
├── src/fraud_detection/
│   ├── __init__.py             # Lazy-export public API
│   ├── config.py               # pydantic-settings, env-driven
│   ├── logging_config.py       # JSON / text structured logging
│   ├── schema.py               # Transaction & StageResult dataclasses
│   ├── rules.py                # Stage 2: deterministic rules engine
│   ├── features.py             # Numeric + categorical feature engineering
│   ├── graph.py                # Stage 5: NetworkX entity graph
│   ├── aggregator.py           # Stages 6 & 7: fusion + decision
│   ├── pipeline.py             # Orchestrator that wires it all up
│   ├── models/                 # ML models
│   │   ├── __init__.py
│   │   ├── boosting.py         # Stages 3 & 4b: fast + enriched GBTs
│   │   └── encoder.py          # Stage 4a: PyTorch autoencoder + EmbeddingService
│   ├── api/                    # HTTP service
│   │   ├── __init__.py
│   │   ├── app.py              # create_app() FastAPI factory + lifespan
│   │   ├── routes.py           # GET /health, POST /score
│   │   ├── schemas.py          # Pydantic request/response models
│   │   └── dependencies.py     # build_pipeline() + DI provider
│   └── cli/                    # Console entry points
│       ├── __init__.py
│       ├── generate_data.py    # `fraud-generate`
│       ├── train.py            # `fraud-train`
│       ├── demo.py             # `fraud-demo`
│       └── serve.py            # `fraud-serve` (uvicorn launcher)
├── tests/                      # pytest: fixtures + unit + e2e (pipeline-stubbed)
│   ├── __init__.py
│   ├── conftest.py
│   ├── test_rules.py
│   ├── test_features.py
│   ├── test_aggregator.py
│   ├── test_graph.py
│   └── test_pipeline.py
└── artifacts/                  # gitignored: generated data + trained models
```
After uv sync the project is installed in editable mode and four commands appear on PATH (inside uv run):
| Command | What it does |
|---|---|
| `fraud-generate` | Create synthetic transactions with injected fraud patterns |
| `fraud-train` | Train encoders + autoencoder + both GBT models |
| `fraud-demo` | Score four illustrative scenarios end-to-end |
| `fraud-serve` | Launch the FastAPI service (uvicorn wrapper) |
The deep technical walkthrough lives in docs/guide.md — every stage, every algorithm choice, the math behind aggregation, the production gaps. ~700 lines, read top-to-bottom or jump in via its table of contents.
| What you want | Where to look |
|---|---|
| 60-second overview | this README |
| What each stage does | ✨ Features above |
| Run it locally / in Docker | 🚀 Quick Start / 🐳 Docker |
| Why noisy-OR? Why class-weight? Why an autoencoder? | 📚 Implementation Guide |
| API request / response shapes | 🌐 HTTP API |
A Transaction is a plain dataclass (schema.py):
```python
Transaction(
    txn_id="...", timestamp=..., customer_id="...", card_id="...",
    merchant_id="...", merchant_category="5812", merchant_country="US",
    customer_country="US", amount=42.10, currency="USD",
    device_id="...", ip_address="...", channel="online",
)
```

Adapt to whatever your message bus delivers: Kafka, Pub/Sub, an HTTP webhook from the acquirer, etc.
2️⃣ Rules engine (rules.py)
Two roles in one layer:
- Hard blocks that must be auditable — sanctions, blocked merchants, regulatory floors. These bypass the rest of the pipeline.
- Cheap signals — velocity, geo jumps, amount thresholds — that flow into the aggregator alongside model scores.
The rules shipped here are intentionally illustrative — most production systems have dozens to hundreds, often expressed as a DSL or as Drools / OpenL Tablets.
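The two roles can be illustrated with a pair of country rules. This is a hedged sketch, not the code in rules.py: the `RuleHit` dataclass, the `country_rules` function, and the country sets are hypothetical, with "XX" standing in for a sanctioned country as in the demo output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuleHit:
    code: str         # rule identifier, e.g. COUNTRY_RISK
    score: float      # severity in [0, 1]
    hard_block: bool  # True bypasses the rest of the pipeline
    reason: str       # human-readable explanation for audit logs


SANCTIONED = frozenset({"XX"})            # placeholder country code
HIGH_RISK = frozenset({"NG", "RO", "UA"})  # defaults listed in the rules table


def country_rules(merchant_country: str) -> list[RuleHit]:
    """Return a hard block for sanctioned countries, a soft signal for risky ones."""
    if merchant_country in SANCTIONED:
        return [RuleHit("COUNTRY_BLOCK", 1.0, True, f"sanctioned country {merchant_country}")]
    if merchant_country in HIGH_RISK:
        return [RuleHit("COUNTRY_RISK", 0.6, False, f"high-risk country {merchant_country}")]
    return []


for country in ("XX", "NG", "US"):
    print(country, country_rules(country))
```

Carrying the reason string with each hit is what keeps a hard block auditable: the decision can be traced back to a named rule without consulting any model.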
3️⃣ Fast GBT (models/boosting.py)
The feature set in features.py:
- Transaction-level: `amount`, `log_amount`, `hour`, `dow`, `is_night`, …
- Velocity windows: txn counts at 5 m / 1 h / 24 h / 7 d
- Behavioural deltas: secs since last txn, amount-vs-30-d-average, first-time-merchant flag, country mismatch
- Categorical: merchant category / country / channel / currency (label-encoded with a stable vocab)
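The velocity windows and the seconds-since-last-transaction delta from the list above could be computed roughly as follows. The feature names, the `velocity_features` helper, and the -1.0 sentinel for customers with no history are assumptions for illustration, not the names used in features.py.

```python
from datetime import datetime, timedelta

WINDOWS = {
    "5m": timedelta(minutes=5),
    "1h": timedelta(hours=1),
    "24h": timedelta(hours=24),
    "7d": timedelta(days=7),
}


def velocity_features(now: datetime, history: list[datetime]) -> dict[str, float]:
    """Count prior transactions per look-back window, plus seconds since the last one."""
    past = [ts for ts in history if ts <= now]  # ignore anything after the candidate
    feats: dict[str, float] = {}
    for name, span in WINDOWS.items():
        feats[f"txn_count_{name}"] = float(sum(1 for ts in past if now - ts <= span))
    # Sentinel value (assumption) when the customer has no earlier transactions
    feats["secs_since_last"] = (now - max(past)).total_seconds() if past else -1.0
    return feats


now = datetime(2025, 6, 1, 12, 0)
history = [
    now - timedelta(minutes=2),
    now - timedelta(minutes=30),
    now - timedelta(hours=5),
    now - timedelta(days=3),
]
print(velocity_features(now, history))
```

For this toy history the counts are 1, 2, 3 and 4 for the four windows, and the last transaction was 120 seconds ago.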
Why not just one big model? Most transactions are obvious — boring groceries, recurring subscriptions, a thousand identical coffee purchases. A shallow model nails those at < 1 ms. Only the grey-zone transactions deserve the heavier path.
4️⃣ Encoder + enriched GBT (models/encoder.py)
The autoencoder learns a smooth representation of transaction behaviour. The 8-d bottleneck captures "what kind of transaction is this, behaviourally?", so transactions from the same fraud ring can land close together in embedding space even when their individual features look unremarkable.
In a real system you might extend this with:
- Sequence embeddings of the last N transactions of this card.
- Merchant embeddings (a separate model trained on merchant → category).
- Identity-graph node embeddings (see stage 5).
5️⃣ Graph analysis (graph.py)
The graph is read-only at scoring time — you only add the txn after the final decision, so the candidate doesn't bias its own score.
```python
g = EntityGraph()
g.build_from(historical_transactions)
signal = g.score(candidate_txn)
# GraphSignal(cluster_size=5, fraud_neighbors=4, ...)
```

6️⃣ + 7️⃣ Aggregation and decision (aggregator.py)
Tune the thresholds per channel / amount band / customer segment in production. The defaults here are sensible starting points — real values come from a precision/recall sweep against a business-defined cost matrix.
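A threshold sweep against a cost matrix might look like the sketch below. It simplifies the three-way decision to a single block threshold, and the function name, the cost values, and the toy scores are all hypothetical.

```python
def sweep_block_threshold(
    scores: list[float],
    labels: list[int],
    cost_fn: float,
    cost_fp: float,
    steps: int = 101,
) -> tuple[float, float]:
    """Return the (threshold, total cost) pair with the lowest cost on a uniform grid."""
    best = (0.0, float("inf"))
    for i in range(steps):
        t = i / (steps - 1)
        # False positive: legitimate txn blocked. False negative: fraud let through.
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        cost = fp * cost_fp + fn * cost_fn
        if cost < best[1]:
            best = (t, cost)
    return best


# Toy validation set: risk scores with confirmed labels (1 = fraud)
scores = [0.1, 0.2, 0.4, 0.6, 0.7, 0.9]
labels = [0, 0, 0, 1, 0, 1]

# Assumed costs: a missed fraud is 20x as expensive as a wrongly blocked customer
threshold, cost = sweep_block_threshold(scores, labels, cost_fn=100, cost_fp=5)
print(threshold, cost)
```

Because missed fraud is far more expensive in this example, the sweep settles on a threshold low enough to catch both fraud cases at the price of one false positive.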
```python
from fraud_detection import FraudPipeline, Transaction

# ... build pipeline as in fraud_detection/cli/demo.py ...
result = pipeline.score(txn, history=customer_history)
print(result.decision)      # Decision.APPROVE | CHALLENGE | BLOCK
print(result.risk_score)    # 0.0 .. 1.0
for s in result.stages:     # which stages actually ran
    print(s.name, s.score, s.confidence)
for r in result.reasons:    # top human-readable reasons
    print(r)
```

| Extension | How |
|---|---|
| More rules | Drop additions into RulesEngine.evaluate. Keep them cheap — that's the whole point of stage 2. |
| Different stage-3 model | Anything with predict_proba works. XGBoost / LightGBM / a logistic regression / TabNet are all drop-in. |
| Better embeddings | Replace Autoencoder with a transformer over the customer's last 50 transactions, or a pretrained merchant model. |
| Real graph database | Swap EntityGraph for a Neo4j / AlloyDB Omni / TigerGraph client — the score() interface is the contract. |
| Online / streaming | The pipeline is pure-function given (txn, history). Plug it behind Kafka Streams, Beam, or any RPC framework. |
- 🎲 Synthetic data isn't real card flows. Distributions, seasonality and the fraud:legit ratio are all toy values. Real fraud is closer to 0.1–0.5%, not the 4–5% here.
- 🕰️ Random train/test split. Production should be temporal (out-of-time holdout) and out-of-customer (no leakage from a customer appearing in both splits).
- 📉 No drift monitoring, calibration, or feedback loop from analyst review — all of which matter as much as the model architecture.
- 🎚️ Thresholds are hand-tuned. In production they come from a precision/recall sweep against a business-defined cost matrix.
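The temporal, out-of-customer holdout mentioned in the list above can be sketched as a single filter. The function name and row layout are hypothetical; the only assumption about the data is that each row carries a `customer_id` and a `timestamp`.

```python
from datetime import datetime


def out_of_time_split(rows: list[dict], cutoff: datetime) -> tuple[list[dict], list[dict]]:
    """Train on rows before `cutoff`; test on later rows from customers unseen in training."""
    train = [r for r in rows if r["timestamp"] < cutoff]
    seen_customers = {r["customer_id"] for r in train}
    test = [
        r for r in rows
        if r["timestamp"] >= cutoff and r["customer_id"] not in seen_customers
    ]
    return train, test


rows = [
    {"customer_id": "c1", "timestamp": datetime(2025, 1, 5)},
    {"customer_id": "c2", "timestamp": datetime(2025, 1, 20)},
    {"customer_id": "c1", "timestamp": datetime(2025, 2, 3)},   # dropped: c1 is in train
    {"customer_id": "c3", "timestamp": datetime(2025, 2, 10)},
]
train, test = out_of_time_split(rows, cutoff=datetime(2025, 2, 1))
print(len(train), [r["customer_id"] for r in test])  # 2 ['c3']
```

Dropping post-cutoff rows from known customers shrinks the test set, which is the price of a leakage-free estimate.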
Contributions are welcome! See the Contributing Guide for setup, the development workflow, and the PR checklist.
This project follows the Contributor Covenant 3.0 Code of Conduct.
If you discover a security vulnerability, please do not open a public issue — see the Security Policy for the private disclosure process and the project's security model.
This project is licensed under the MIT License — feel free to use it for both personal and commercial purposes. See the LICENSE file for details.
Built with ❤️ by Sumit Sahoo