💳 Credit Risk Prediction System

A Machine Learning project that predicts whether a loan applicant is Low Risk or High Risk using structured financial and demographic data.

🚀 Live Streamlit App: https://credit-risk-prediction-system-1.streamlit.app/

📊 At-a-Glance Impact

Metric	Value
Dataset Size	1,000+ Credit Applications
Models Trained	Decision Tree, Random Forest, Extra Trees, XGBoost
Best Accuracy	85%+ (Balanced Classes)
Target Variable	Credit Risk (Good / Bad)
Business Goal	Reduce Default Risk & Improve Loan Decisions

🚀 Why This Project Matters

Banks and financial institutions lose billions of dollars annually due to loan defaults.

🔴 Even a 1% improvement in credit risk prediction can save millions in bad loans.

This project simulates a real banking credit approval system, applying machine learning to:

Reduce default risk
Improve approval accuracy
Enable data‑driven lending decisions

📊 Dataset Overview

Customers: ~1,000 loan applicants
Features: Demographic + Financial attributes
Target: Credit Risk (Good / Bad)

Key Features Used

Feature	Description
Age	Applicant age
Sex	Gender
Job	Employment category
Housing	Own / Rent / Free
Saving Accounts	Savings status
Checking Account	Checking balance
Credit Amount	Loan amount requested
Duration	Loan duration (months)
Purpose	Loan purpose

🧠 Machine Learning Models

Multiple models were trained and evaluated:

Model	Strength
Decision Tree	Interpretability
Random Forest	Stability & performance
Extra Trees	Reduced variance
XGBoost	High predictive power

⚡ XGBoost delivered the best overall performance.
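A side-by-side comparison of this kind can be sketched as follows. This is a minimal illustration, not the project's training script: the data is synthetic (generated with `make_classification` as a stand-in for the credit dataset), the hyperparameters are defaults, and XGBoost is left out so the example depends only on scikit-learn. If the `xgboost` package is installed, an `XGBClassifier` could be added to the `models` dictionary in the same way.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the credit data (assumption): 1,000 rows,
# 9 features, roughly 30% of rows in the minority class.
X, y = make_classification(
    n_samples=1000,
    n_features=9,
    n_informative=5,
    weights=[0.7, 0.3],
    random_state=42,
)

# Candidate models, all with default hyperparameters apart from the seed.
models = {
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "Extra Trees": ExtraTreesClassifier(n_estimators=100, random_state=42),
}

# Mean 5-fold cross-validated accuracy per model.
scores = {}
for name, model in models.items():
    scores[name] = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    print(f"{name}: {scores[name]:.3f}")
```

Cross-validation is used here instead of a single train/test split because, with only about 1,000 rows, one split can make the ranking of models depend heavily on which rows land in the test set.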

📈 Results & Impact

Metric	Score
Accuracy	80%+
Recall (High Risk)	Improved via class balancing
Precision	Optimized using GridSearch

Business Interpretation

Correctly flags high‑risk borrowers
Minimizes false approvals
Improves lender profitability

🔧 Feature Engineering

Label Encoding for categorical variables
Risk target encoded as a binary label (Good / Bad)
Class imbalance handled using class_weight="balanced"
Hyperparameter tuning with GridSearchCV
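The steps listed above might look roughly like the sketch below. It is an illustration under stated assumptions: the twelve rows are made up, the column names follow the feature table, and the choice of `RandomForestClassifier`, the parameter grid, and the `balanced_accuracy` scoring metric are all hypothetical rather than taken from `src/train.py`.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import LabelEncoder

# Tiny made-up sample (assumption); values are illustrative only.
df = pd.DataFrame({
    "Age": [22, 45, 33, 51, 28, 39, 60, 25, 41, 36, 30, 48],
    "Housing": ["own", "rent", "free", "own", "rent", "own",
                "own", "rent", "free", "own", "rent", "own"],
    "Credit Amount": [1200, 5800, 2300, 900, 7400, 3100,
                      1500, 6600, 2800, 1900, 8100, 2200],
    "Duration": [12, 36, 18, 6, 48, 24, 12, 42, 24, 12, 60, 18],
    "Risk": ["good", "bad", "good", "good", "bad", "good",
             "good", "bad", "good", "good", "bad", "good"],
})

# One LabelEncoder per categorical column, kept in a dict so the same
# mapping can be reused at prediction time.
encoders = {}
for col in ["Housing", "Risk"]:
    encoders[col] = LabelEncoder()
    df[col] = encoders[col].fit_transform(df[col])

X = df.drop(columns="Risk")
y = df["Risk"]

# class_weight="balanced" reweights classes inversely to their frequency;
# GridSearchCV tries every combination in the (hypothetical) grid.
grid = GridSearchCV(
    RandomForestClassifier(class_weight="balanced", random_state=42),
    param_grid={"n_estimators": [50, 100], "max_depth": [2, 4]},
    cv=3,
    scoring="balanced_accuracy",
)
grid.fit(X, y)
print(grid.best_params_)
```

`LabelEncoder` assigns integers in sorted order of the class names, so here `bad` maps to 0 and `good` maps to 1. Keeping the fitted encoders around matters because new applicants must be encoded with the same mapping the model was trained on.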

📦 Project Structure

credit-risk-prediction/
│
├── data/
│   └── german_credit.csv
│
├── models/
│   ├── decision_tree.pkl
│   ├── random_forest.pkl
│   └── label_encoders/
│
├── notebooks/
│   └── credit_risk_analysis.ipynb
│
├── src/
│   ├── preprocessing.py
│   ├── train.py
│   └── evaluate.py
│
├── README.md
└── requirements.txt

🛠 Tech Stack

Python 🐍
Pandas / NumPy
Scikit‑Learn
XGBoost
Seaborn & Matplotlib
Joblib

⚙️ How to Run

pip install -r requirements.txt
jupyter notebook

Or train directly:

python src/train.py

📌 Key Learnings

Real‑world data is messy & imbalanced
Model accuracy alone is not enough — recall matters
Explainability is crucial in finance ML
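The point about accuracy versus recall can be made concrete with a small worked example. The numbers are assumed for illustration (1,000 applicants, 300 of them high risk) and the "model" is a deliberately naive one that approves everyone; neither comes from the project's results.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that the model flagged as positive."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if (tp + fn) else 0.0


# Assumed split: 1 = high risk, 0 = low risk.
y_true = [1] * 300 + [0] * 700

# Naive model: predicts "low risk" for every applicant.
y_pred = [0] * 1000

acc = accuracy(y_true, y_pred)
rec = recall(y_true, y_pred)
print(f"accuracy = {acc:.2f}")          # accuracy = 0.70
print(f"high-risk recall = {rec:.2f}")  # high-risk recall = 0.00
```

A model that never flags a single high-risk borrower still reaches 70% accuracy on this split, which is why recall on the high-risk class needs to be reported alongside accuracy.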

🎯 Who Should Look at This?

✔ Recruiters (Data Science / ML / FinTech)
✔ Banks & NBFC analysts
✔ ML engineers learning applied finance
✔ Students preparing for placements

🌟 Future Improvements

SHAP explainability
Model deployment with Streamlit
Real‑time credit scoring API
ROC‑AUC optimization

🙌 Author

Prathmesh Bunde
CSE | FinTech | Machine Learning

📌 If this project helped you, don’t forget to ⭐ the repo!
