Aim: Identify which customers will make a specific transaction in the future
This project is based on the 2nd most popular Kaggle competition,
where I ranked in the top 8% globally.
This is an imbalanced classification problem with anonymized data.
I performed an in-depth EDA and data manipulation, including preprocessing and feature engineering,
to understand the underlying data patterns, and built predictive models such as LightGBM and logistic regression.
To become more familiar with Shiny, I additionally created all visualizations as an R Shiny app.
- Dataset - Data description & source
- Data Processing - Cleaning and feature engineering
- Reproducibility - Reproducibility steps
- Requirements - Install dependencies
The dataset is anonymized and contains 200 numeric feature variables, the binary "target" column, and a string "ID_code" column.
The task is to predict the value of the "target" column in the test set.
- Download source: the data provided for this competition has the same structure as the real data Santander has available to solve this problem.
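To make the structure described above concrete, here is a minimal sketch that builds a simulated stand-in with the same shape: one string ID column, one binary target, and 200 numeric features. The feature names (`var_0` to `var_199`), the ID format, and the 10% positive rate are assumptions for illustration only. The values are random and are not the competition data.

```r
# Simulated stand-in for the training data (random values, NOT real data).
set.seed(1)
n <- 1000

# 200 numeric features; the var_* names are an assumption.
features <- as.data.frame(matrix(rnorm(n * 200), nrow = n))
names(features) <- paste0("var_", 0:199)

train <- data.frame(
  ID_code = paste0("train_", seq_len(n) - 1),  # assumed ID format
  target  = rbinom(n, size = 1, prob = 0.1),   # assumed imbalance, for illustration
  features,
  stringsAsFactors = FALSE
)

# Basic structure checks one might run on the real file as well.
print(dim(train))                                  # rows x columns
print(sapply(train[c("ID_code", "target")], class))
print(all(sapply(train[, -(1:2)], is.numeric)))    # are all features numeric?
print(prop.table(table(train$target)))             # class balance
```

Running the same checks on the downloaded training file is a quick way to confirm the column layout and see how imbalanced the target is before any modelling.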
Submissions are evaluated on the area under the ROC curve (AUC) between the predicted probability and the observed target.
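As an illustration of the metric, the sketch below computes AUC in base R using the rank-based (Mann-Whitney) formulation. The function name `auc_rank` and the toy values are hypothetical; this is not necessarily how the project's scripts compute it, and a dedicated package could be used instead.

```r
# Rank-based AUC, equivalent to the Mann-Whitney U statistic.
# `target` is a 0/1 vector; `score` is the predicted probability.
auc_rank <- function(target, score) {
  stopifnot(length(target) == length(score), all(target %in% c(0, 1)))
  n_pos <- sum(target == 1)
  n_neg <- sum(target == 0)
  stopifnot(n_pos > 0, n_neg > 0)
  r <- rank(score)  # average ranks, so tied scores count as half a win
  (sum(r[target == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Toy example with made-up values.
target <- c(0, 0, 1, 0, 1)
score  <- c(0.10, 0.40, 0.35, 0.20, 0.80)
auc_toy <- auc_rank(target, score)
print(auc_toy)
```

In the toy example, 5 of the 6 positive/negative pairs are ordered correctly, so the AUC is 5/6. Because AUC depends only on the ranking of the scores, it is not distorted by the class imbalance in the way plain accuracy would be.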
- Clone the repo!
Run the following commands in a terminal:
git clone 'https://github.com/Papagiannopoulos/Santander_Customer_Transaction_Prediction.git'
cd 'Santander_Customer_Transaction_Prediction'
- Download the data and add it to your cloned repo
- Run requirements.R
- Before running the Shiny app (app.R), run the requirements script to install all necessary libraries
- R version: 4.4.1 (2024-06-14 ucrt)
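For readers who want to check their setup before running the requirements script, here is a minimal sketch of the usual pattern for finding missing packages. The package list and the helper name `missing_packages` are hypothetical; the repository's requirements.R defines the actual dependencies.

```r
# Hypothetical package list, for illustration only.
# The repository's requirements.R defines the real one.
required_pkgs <- c("shiny", "lightgbm")

# Return the entries of `pkgs` that are not installed in the current library paths.
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

to_install <- missing_packages(required_pkgs)
if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # Uncomment to install from CRAN:
  # install.packages(to_install)
} else {
  message("All required packages are installed.")
}
```

The install call is left commented out so the sketch only reports what is missing and does not modify the local library.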