🛡️ Wallet Risk Scorer ML

An advanced Machine Learning pipeline for detecting malicious cryptocurrency wallets.

📌 Overview

The Wallet Risk Scorer is a data-driven security tool designed to classify blockchain addresses as "Safe" or "Scam" based on behavioral analysis. By ingesting transaction history and derived metrics, the system trains a high-performance XGBoost Classifier to predict the likelihood of malicious activity.

This project moves beyond simple blocklists by analyzing behavioral features—such as transaction frequency, activity duration, and token transfer patterns—to flag suspicious wallets that may not yet be reported.

🚀 Key Features

🤖 Advanced Gradient Boosting: Uses XGBoost (Extreme Gradient Boosting), a gradient-boosted tree ensemble well suited to tabular risk data.
🎯 Automated Optimization: Implements RandomizedSearchCV to tune hyperparameters (n_estimators, max_depth, learning_rate, etc.) by sampling candidate combinations and keeping the one with the best cross-validated score.
📉 Robust Validation: Uses Stratified K-Fold Cross-Validation (K=5) to estimate how well the model generalizes to unseen data and to detect overfitting.
📊 Detailed Analytics: Generates comprehensive Classification Reports and Confusion Matrices to evaluate precision, recall, and F1-scores.
🧠 Feature Engineering: Derives key behavioral signals like transaction_frequency (transactions per active day) to help the model separate safe wallets from scam wallets.
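
As an illustration of the transaction_frequency feature, here is a minimal sketch. The column names (tx_count, first_tx, last_tx) are hypothetical and may not match the real dataset schema, and "active days" is assumed here to mean the inclusive span between a wallet's first and last transaction.

```python
import pandas as pd

# Hypothetical column names; the real dataset schema may differ.
wallets = pd.DataFrame(
    {
        "address": ["0xaaa", "0xbbb", "0xccc"],
        "tx_count": [120, 3, 40],
        "first_tx": pd.to_datetime(["2023-01-01", "2023-06-01", "2023-03-10"]),
        "last_tx": pd.to_datetime(["2023-01-31", "2023-06-01", "2023-03-19"]),
    }
)

# Active days, counted inclusively so a wallet seen on a single day gets 1
# rather than 0 (this also avoids division by zero).
active_days = (wallets["last_tx"] - wallets["first_tx"]).dt.days + 1

# Assumed definition: transactions per active day.
wallets["transaction_frequency"] = wallets["tx_count"] / active_days
print(wallets[["address", "transaction_frequency"]])
```

If the pipeline instead counts only days with at least one transaction, the denominator would come from the raw transaction timestamps rather than from the first/last span.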

🛠️ Tech Stack

Language: Python 3.12+
Machine Learning: XGBoost, Scikit-Learn
Data Manipulation: Pandas, NumPy
Visualization: Matplotlib
Data Source: Etherscan / Moralis (stored as CSV datasets)

📂 Project Structure

wallet-risk-scorer-ML/
├── data/                  # Source CSV datasets (Safe vs Scam wallets)
├── models/                # Serialized trained models (.joblib)
├── src/
│   ├── risk_scorer/
│   │   ├── data_collection/ # Scripts to fetch transactions & labels
│   │   ├── main.py          # 🚀 MASTER PIPELINE: Preprocessing -> Tuning -> Training -> Evaluation
│   │   └── config.py        # Configuration & Path definitions
│   └── utils/               # Helper functions for data fetching
├── pyproject.toml         # Project dependencies & configuration
└── README.md              # Documentation

⚡ Getting Started

1. Prerequisites

Ensure you have Python 3.12+ installed on your machine (the version listed in the Tech Stack).

2. Installation

Clone the repository and install the dependencies.

git clone https://github.com/yourusername/wallet-risk-scorer-ML.git
cd wallet-risk-scorer-ML

# It is recommended to use a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install pandas scikit-learn xgboost matplotlib joblib python-dotenv requests

3. Usage

To run the full Training & Evaluation Pipeline:

python -m src.risk_scorer.main

What happens when you run this?

Data Ingestion: Loads scam and safe wallet datasets.
Preprocessing: Cleans data, handles missing values, and calculates transaction_frequency.
Hyperparameter Tuning: Runs a Randomized Search to find the best XGBoost parameters.
Training: Trains the model on the full dataset using the best parameters.
Evaluation: Performs Stratified 5-Fold Cross-Validation and prints the classification report (precision, recall, F1-score) and confusion matrix.
Serialization: Saves the optimized model to models/xgboost_optimized_v5.joblib.
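
The evaluation step can be sketched roughly as follows. This is not the code from main.py: the data is synthetic, the model parameters are placeholders, and scikit-learn's GradientBoostingClassifier is used as a stand-in if xgboost is not installed.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import StratifiedKFold, cross_val_predict

try:
    from xgboost import XGBClassifier as Booster
except ImportError:  # stand-in so the sketch still runs without xgboost
    from sklearn.ensemble import GradientBoostingClassifier as Booster

# Synthetic, imbalanced stand-in for the wallet feature table (label 1 = scam).
X, y = make_classification(
    n_samples=400, n_features=8, weights=[0.8, 0.2], random_state=42
)

# Placeholder parameters, not the tuned values from the project.
model = Booster(n_estimators=100, max_depth=4, learning_rate=0.1, random_state=42)

# Stratification keeps the safe/scam ratio roughly constant in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Each prediction comes from a model that never saw that row during training.
y_pred = cross_val_predict(model, X, y, cv=cv)

cm = confusion_matrix(y, y_pred)
print(cm)
print(classification_report(y, y_pred, target_names=["Safe", "Scam"]))
```

Because scam wallets are usually the minority class, the per-class recall in the report tends to be more informative than overall accuracy.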

📊 Methodology

The core logic resides in src/risk_scorer/main.py. The pipeline follows these steps:

Labeling: Assigns 1 to scam datasets and 0 to safe datasets.
Merging: Combines base wallet data with token transfer data.
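
A minimal sketch of the labeling and merging steps, assuming the wallets are keyed by an address column. The frames and column names below are hypothetical stand-ins for the CSV files under data/.

```python
import pandas as pd

# Hypothetical frames standing in for the CSV files under data/.
scam = pd.DataFrame({"address": ["0xs1", "0xs2"], "tx_count": [50, 7]})
safe = pd.DataFrame({"address": ["0xa1", "0xa2", "0xa3"], "tx_count": [12, 300, 4]})
token_transfers = pd.DataFrame(
    {"address": ["0xs1", "0xa2"], "token_transfer_count": [9, 21]}
)

# Labeling: 1 for scam wallets, 0 for safe wallets.
scam["label"] = 1
safe["label"] = 0
base = pd.concat([scam, safe], ignore_index=True)

# Merging: a left join keeps every wallet, including those with no token data.
# validate raises if an address appears more than once on either side.
merged = base.merge(token_transfers, on="address", how="left", validate="one_to_one")

# Wallets without token transfers get a count of 0 instead of NaN.
merged["token_transfer_count"] = merged["token_transfer_count"].fillna(0).astype(int)
print(merged)
```

A left join is used here so that wallets without token activity stay in the training set; whether the project fills the gaps with 0 or another value is an assumption.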

Hyperparameter Search:

param_dist = {
    'n_estimators': [100, 300, 500, 700],
    'learning_rate': [0.01, 0.05, 0.1, 0.2],
    'max_depth': [3, 4, 5, 6, 8],
    # ... and more
}
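
For context, a search space like the one above could be passed to RandomizedSearchCV roughly as shown below. This is a sketch under assumptions: the data is synthetic, n_iter and the random seeds are arbitrary, only the three parameters listed above are searched, and GradientBoostingClassifier stands in if xgboost is not installed.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold

try:
    from xgboost import XGBClassifier as Booster
except ImportError:  # stand-in so the sketch still runs without xgboost
    from sklearn.ensemble import GradientBoostingClassifier as Booster

# Synthetic stand-in for the wallet feature table.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

param_dist = {
    "n_estimators": [100, 300, 500, 700],
    "learning_rate": [0.01, 0.05, 0.1, 0.2],
    "max_depth": [3, 4, 5, 6, 8],
}

search = RandomizedSearchCV(
    estimator=Booster(random_state=0),
    param_distributions=param_dist,
    n_iter=3,  # arbitrary small budget to keep the sketch fast
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    random_state=0,
)
# With no scoring argument, candidates are ranked by the estimator's
# default score, which is accuracy for classifiers.
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Randomized search evaluates only n_iter sampled combinations out of the 80 possible here, so the result is the best of the sampled candidates rather than a guaranteed optimum.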

Model Serialization: The final model is saved with joblib so it can be loaded later for inference.
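
A minimal save-and-reload round trip with joblib might look like this. The model is a small scikit-learn stand-in and the file name is hypothetical; a temporary directory is used so the sketch does not touch the models/ folder.

```python
import tempfile
from pathlib import Path

import joblib
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Small stand-in model; the project itself saves a tuned XGBoost classifier.
X, y = make_classification(n_samples=100, n_features=6, random_state=1)
model = LogisticRegression(max_iter=1000).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "example_model.joblib"  # hypothetical file name
    joblib.dump(model, model_path)
    restored = joblib.load(model_path)

# The restored model should reproduce the original predictions exactly.
same = bool((restored.predict(X) == model.predict(X)).all())
print(same)
```

joblib files are pickle-based, so they should only be loaded from trusted sources, ideally with the same library versions that were used to save them.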

🔮 Future Roadmap

Real-time API: Expose the model via a FastAPI/Flask endpoint.
Live Inference: Script to fetch data for a new address and predict immediately.
Deep Learning: Explore LSTM/RNNs for sequential transaction analysis.
Explainability: Integrate SHAP (SHapley Additive exPlanations) to explain individual risk scores.

Disclaimer: This tool is for educational and research purposes. Cryptocurrency markets are volatile and high-risk. Always do your own research.
