This project aims to predict whether a driver will churn (leave) Ola using historical, demographic, and performance-related data. The goal is to help the organization reduce churn by identifying at-risk drivers and enabling proactive retention strategies.
Recruiting and retaining drivers is a major challenge for ride-sharing companies like Ola. High driver churn affects:
- Operational stability
- Customer experience
- Driver acquisition cost (acquiring a new driver costs more than retaining an existing one)
By using machine learning, this project predicts driver churn probability based on:
- Demographics (e.g., age, gender, education)
- Performance metrics (e.g., ratings, business value)
- Tenure (e.g., date of joining, last working date)
- Behavioral patterns (e.g., income or rating changes)
- Data Cleaning & Imputation: Missing values are imputed with a KNN imputer.
- Feature Engineering:
  - Change in rating and income over time
  - Flags for improvement in grade/rating
  - Time served in days
- Models Built:
  - Decision Tree
  - Random Forest (GridSearch tuned)
  - XGBoost (best performing)
  - LightGBM (fastest, good baseline)
- Class Imbalance: Addressed using SMOTE
- Model Interpretation: SHAP values used to explain predictions at an individual level
- Interactive App: Streamlit frontend for business users to test custom driver profiles
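
The feature engineering step listed above can be sketched roughly as follows. This is a minimal, dependency-free illustration, not the code in `ola_churn.py`: the key names (`quarterly_rating`, `income`, `grade`, `date_of_joining`, `last_working_date`), the `snapshot_date` parameter, and the assumption that a larger grade number means an improvement are all hypothetical. In a pandas workflow the same logic would normally be a per-driver group-by.

```python
from datetime import date


def engineer_features(rows, snapshot_date):
    """Collapse one driver's records into a single feature row.

    `rows` is a list of dicts, one per reporting period, oldest first.
    All key names are hypothetical and may not match the columns in ola.csv.
    """
    first, last = rows[0], rows[-1]

    # Change over time: latest value minus earliest value.
    rating_change = last["quarterly_rating"] - first["quarterly_rating"]
    income_change = last["income"] - first["income"]

    # Drivers who are still active have no last working date,
    # so tenure is measured up to the snapshot date instead.
    end = last["last_working_date"] or snapshot_date

    return {
        "rating_change": rating_change,
        "income_change": income_change,
        "rating_improved": int(rating_change > 0),
        # Assumes a larger grade number is better.
        "grade_improved": int(last["grade"] > first["grade"]),
        "days_served": (end - first["date_of_joining"]).days,
    }


# Made-up history for a single driver, for illustration only.
history = [
    {"quarterly_rating": 2, "income": 50000, "grade": 1,
     "date_of_joining": date(2019, 1, 1), "last_working_date": None},
    {"quarterly_rating": 3, "income": 50000, "grade": 2,
     "date_of_joining": date(2019, 1, 1), "last_working_date": date(2019, 12, 31)},
]

features = engineer_features(history, snapshot_date=date(2020, 12, 31))
print(features)
```

Collapsing each driver's history into one row keeps the model input at one record per driver, which is the shape the target variable (churned or stayed) naturally has.
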
The dataset contains ~19,000 driver records with the following fields (subset):
| Column | Description |
|---|---|
| Age | Driver's age |
| Gender | 0 = Female, 1 = Male |
| Income | Total income earned |
| Joining Designation | Initial designation at the time of joining |
| Quarterly Rating | Ola's internal rating of driver |
| Total Business Value | Revenue generated by the driver |
| Last Grade | Last performance grade |
| Churn | Target variable (1 = Churn, 0 = Stay) |
The app highlights top features driving churn prediction with direction:
- ↑ Towards Churn: pushing the model to predict churn
- ↓ Away from Churn: reducing churn risk
Key drivers often include:
- Last Rating
- Change in Rating
- Total Business Value
- Income Increase
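
The direction labels described above can be derived from the sign of each feature's contribution. The sketch below assumes that a positive contribution raises the predicted churn score, as is the case for SHAP values computed against the churn class. The function name `top_drivers` and the numbers in `example` are invented for illustration and are not output from the trained model.

```python
def top_drivers(contributions, k=3):
    """Rank features by absolute contribution and label each with a direction.

    `contributions` maps feature name -> signed contribution for one driver.
    Positive values are labeled as pushing towards churn; zero or negative
    values are labeled as pushing away from churn.
    """
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return [
        (name, "↑ Towards Churn" if value > 0 else "↓ Away from Churn", value)
        for name, value in ranked[:k]
    ]


# Invented contributions for one driver profile.
example = {
    "Last Rating": -0.42,
    "Change in Rating": 0.31,
    "Total Business Value": -0.12,
    "Income Increase": 0.05,
}

summary = top_drivers(example)
for name, direction, value in summary:
    print(f"{name}: {direction} ({value:+.2f})")
```

Ranking by absolute value rather than raw value matters here: a strongly negative contribution is as informative to a business user as a strongly positive one.
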

To use the app, start Streamlit with the frontend script `app.py`.
📁 Project Structure (`Churn-Prediction/`):
- `ola_churn.py`: EDA + model building script
- `app.py`: Streamlit frontend
- `xgb_model.pkl`: Trained XGBoost model
- `scaler.pkl`: Scaler used in training
- `xgb_explainer.pkl`: SHAP explainer
- `ola.csv`: Dataset
- `README.md`: This file

| Metric | Score |
|---|---|
| Accuracy | ~87% |
| F1 Score | ~85% |
| Recall | ~86% |
| Precision | ~83% |
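
For readers less familiar with these metrics, the sketch below shows how all four follow from confusion-matrix counts, with churn (1) as the positive class. The counts are invented for illustration and do not reproduce the scores in the table; `classification_metrics` is a hypothetical helper, not a function from this project.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts.

    tp: churners correctly flagged      fp: stayers wrongly flagged
    fn: churners missed                 tn: stayers correctly left alone
    """
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Invented counts, chosen only to make the arithmetic easy to follow.
metrics = classification_metrics(tp=80, fp=20, fn=10, tn=90)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

For churn prediction, recall is usually the metric to watch most closely, since a missed churner (a false negative) means no retention action is taken for that driver.
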
