A data engineering and machine learning project that predicts loan approval decisions based on applicant financial and personal information — built with Python, Scikit-learn, and Flask/Streamlit.
This project covers the complete data pipeline — from raw JSON ingestion and EDA to model training, serialization, and a deployed prediction app. It demonstrates real-world ETL practices, feature engineering, and ML model deployment.
Data-Engineering-Project-2/
│
├── EDA_Train_pickle.py # Exploratory data analysis + model training + pickle export
├── app.py # Web app for loan approval prediction
├── loan_approval_model.pkl # Trained ML model (serialized)
│
├── applicant_info.json # Applicant personal data
├── financial_info.json # Applicant financial data
├── loan_info.json # Loan request details
│
├── test_load_data.py # Unit tests for data loading
├── requirements.txt # Project dependencies
└── README.md
JSON Data Files → EDA & Cleaning → Feature Engineering → Model Training → Pickle Export → Web App
- Ingest: Load applicant, financial, and loan data from JSON files
- EDA: Explore distributions, handle missing values, encode categories
- Train: Train a classification model (Scikit-learn) to predict approval
- Serialize: Export the trained model as `loan_approval_model.pkl`
- Serve: Load the model in `app.py` and predict on new applicant data
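The pipeline stages above can be sketched end to end in a few lines. This is a minimal illustration, not the code in `EDA_Train_pickle.py`: the join key `applicant_id`, the column names, the sample values, and the choice of a scaled logistic regression are all assumptions made for the example, and small in-memory frames stand in for the three JSON files.

```python
import pickle
import tempfile
from pathlib import Path

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Tiny in-memory stand-ins for the three JSON files (hypothetical fields).
applicants = pd.DataFrame({
    "applicant_id": [1, 2, 3, 4, 5, 6],
    "employment": ["salaried", "self_employed", "salaried",
                   "unemployed", "salaried", "unemployed"],
})
financials = pd.DataFrame({
    "applicant_id": [1, 2, 3, 4, 5, 6],
    "income": [5200.0, 3100.0, None, 900.0, 6100.0, 1200.0],
})
loans = pd.DataFrame({
    "applicant_id": [1, 2, 3, 4, 5, 6],
    "loan_amount": [10000, 15000, 8000, 20000, 12000, 18000],
    "approved": [1, 1, 1, 0, 1, 0],
})

# Ingest: join the three sources on the shared (assumed) key.
df = applicants.merge(financials, on="applicant_id").merge(loans, on="applicant_id")

# EDA/cleaning: fill the missing income with the median, one-hot encode the category.
df["income"] = df["income"].fillna(df["income"].median())
features = pd.get_dummies(
    df[["employment", "income", "loan_amount"]],
    columns=["employment"],
    dtype=float,
)
target = df["approved"]

# Train: scale the numeric inputs, then fit a simple classifier.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(features, target)

# Serialize: write the model to a pickle file, then load it back as the app would.
with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "loan_approval_model.pkl"
    with open(model_path, "wb") as fh:
        pickle.dump(model, fh)
    with open(model_path, "rb") as fh:
        reloaded = pickle.load(fh)

same_predictions = bool((reloaded.predict(features) == model.predict(features)).all())
print("Reloaded model matches original:", same_predictions)
```

Pickling the whole pipeline, rather than only the classifier, keeps the scaling step attached to the model, so the serving code does not have to repeat the preprocessing. Only unpickle files you trust, since loading a pickle can execute arbitrary code.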
| Layer | Tools |
|---|---|
| Language | Python |
| Data handling | Pandas, NumPy |
| ML Model | Scikit-learn |
| Serialization | Pickle |
| App | Flask / Streamlit |
| Testing | Pytest |
| Data format | JSON |
1. Clone the repo
    git clone https://github.com/SahilUjgare/Data-Engineering-Project-2.git
    cd Data-Engineering-Project-2
2. Install dependencies
    pip install -r requirements.txt
3. Train the model
    python EDA_Train_pickle.py
4. Run the app
    python app.py
5. Run tests
    python test_load_data.py

- Task: Binary classification (Loan Approved / Rejected)
- Input features: Applicant info, financial history, loan amount, tenure
- Output: Approval prediction with confidence score
- Model file: `loan_approval_model.pkl`
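One common way to produce a prediction together with a confidence score is to take the highest class probability from Scikit-learn's `predict_proba`. The sketch below assumes the model is a classifier that exposes `predict_proba` and that class `1` means approved. The helper `predict_with_confidence`, the two features, and the training rows are hypothetical, and the model is trained inline only so the example is self-contained; in `app.py` it would instead be loaded from the pickle file.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic rows: [income_in_thousands, loan_amount_in_thousands] (made-up values).
X = np.array(
    [[60, 10], [55, 12], [70, 8], [15, 30], [12, 25], [10, 40]],
    dtype=float,
)
y = np.array([1, 1, 1, 0, 0, 0])  # 1 = approved, 0 = rejected (assumed encoding)

model = LogisticRegression().fit(X, y)


def predict_with_confidence(model, row):
    """Return (label, confidence), where confidence is the probability
    the model assigns to the class it predicts."""
    proba = model.predict_proba(np.asarray([row], dtype=float))[0]
    idx = int(np.argmax(proba))
    label = "Approved" if model.classes_[idx] == 1 else "Rejected"
    return label, float(proba[idx])


label, confidence = predict_with_confidence(model, [65, 9])
print(f"{label} (confidence {confidence:.2f})")
```

For a binary classifier the score defined this way always lies between 0.5 and 1.0, so a value near 0.5 signals a borderline application rather than a confident decision.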
| File | Description |
|---|---|
| `applicant_info.json` | Name, age, employment, credit history |
| `financial_info.json` | Income, expenses, existing loans |
| `loan_info.json` | Requested amount, tenure, purpose |
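Loading and combining the three files could look like the sketch below, which uses only the standard library. It assumes each file holds a JSON array of records sharing a key named `applicant_id`. The key, the field names, and the helpers `load_records` and `join_on_key` are hypothetical and may not match the real files or the project's loader; the demo writes small sample files to a temporary directory instead of reading the real data.

```python
import json
import tempfile
from pathlib import Path


def load_records(path):
    """Load a JSON file that is expected to hold a list of record dicts."""
    with open(path, encoding="utf-8") as fh:
        data = json.load(fh)
    if not isinstance(data, list):
        raise ValueError(f"{path} should contain a JSON array of records")
    return data


def join_on_key(key, *tables):
    """Inner-join lists of dicts on `key`, keeping only keys present in every table."""
    merged = {}
    for i, table in enumerate(tables):
        indexed = {rec[key]: rec for rec in table}
        if i == 0:
            merged = {k: dict(v) for k, v in indexed.items()}
        else:
            merged = {k: {**merged[k], **indexed[k]} for k in merged if k in indexed}
    return list(merged.values())


# Sample contents for the three files (made-up values).
samples = {
    "applicant_info.json": [{"applicant_id": 1, "age": 34}, {"applicant_id": 2, "age": 51}],
    "financial_info.json": [{"applicant_id": 1, "income": 5200}, {"applicant_id": 2, "income": 3100}],
    "loan_info.json": [{"applicant_id": 1, "requested_amount": 10000}],
}

with tempfile.TemporaryDirectory() as tmp:
    for name, records in samples.items():
        (Path(tmp) / name).write_text(json.dumps(records), encoding="utf-8")
    tables = [load_records(Path(tmp) / name) for name in samples]
    rows = join_on_key("applicant_id", *tables)

print(rows)
```

An inner join drops applicants that are missing from any file (applicant 2 has no loan record in the sample), which is the kind of behaviour a data-loading unit test can pin down with a simple assertion on the joined rows.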
This project is open source and available under the MIT License.