Car Insurance Claim Prediction

This project explores two car-insurance datasets and builds models to predict whether a policyholder will file a claim. The work is organized in two Jupyter notebooks:

Portfolio_Project.ipynb: vehicle/specs dataset (highly imbalanced target)
Car_Claims_Analysis.ipynb: driver-profile dataset (more balanced target)

Datasets

Vehicle/specs dataset (imbalanced)

Features: vehicle specs, equipment flags, regions, etc.
Target: claim_status (≈6.4% positives)

Driver-profile dataset (more balanced)

Features: demographics, driving history, credit and mileage
Target: OUTCOME (≈31% positives)

Approach

Keep EDA visuals on raw data.
Split train/test before any target-aware transforms.
Encoding
- High-cardinality location → target encoding with cross‑fitting
  - region_code in the first dataset
  - POSTAL_CODE in the second dataset
- Remaining categoricals → One‑Hot Encoding.
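
A minimal sketch of cross-fitted target encoding, to make the idea concrete. It is not taken from the notebooks: the helper names, the smoothing constant, the fold count and the synthetic data are all assumptions made for illustration. Each training row is encoded from folds that exclude it, so its own label never leaks into its feature; test rows are encoded from the full training set.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold


def _smoothed_means(frame, col, target, smoothing):
    """Per-category target mean, shrunk toward the frame's overall rate."""
    prior = frame[target].mean()
    stats = frame.groupby(col)[target].agg(["sum", "count"])
    means = (stats["sum"] + smoothing * prior) / (stats["count"] + smoothing)
    return means, prior


def cross_fit_target_encode(train, test, col, target,
                            n_splits=5, smoothing=10.0, seed=0):
    """Encode `col` out-of-fold on train; encode test from the full train set."""
    train_enc = pd.Series(np.nan, index=train.index, dtype=float)
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for fit_idx, enc_idx in folds.split(train):
        means, prior = _smoothed_means(train.iloc[fit_idx], col, target, smoothing)
        # Categories unseen in the fitting folds fall back to the fold prior.
        encoded = train[col].iloc[enc_idx].map(means).fillna(prior)
        train_enc.iloc[enc_idx] = encoded.to_numpy()
    means, prior = _smoothed_means(train, col, target, smoothing)
    test_enc = test[col].map(means).fillna(prior).astype(float)
    return train_enc, test_enc


# Synthetic stand-in data (hypothetical values, not the project's CSV).
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "region_code": rng.choice([f"C{i}" for i in range(20)], size=500),
    "claim_status": rng.binomial(1, 0.064, size=500),
})
train_df, test_df = demo.iloc[:400], demo.iloc[400:]
train_enc, test_enc = cross_fit_target_encode(
    train_df, test_df, "region_code", "claim_status"
)
```

The same function would apply to POSTAL_CODE with OUTCOME as the target. Recent scikit-learn releases also ship a `TargetEncoder` whose `fit_transform` cross-fits internally, which may be a simpler option where that version is available.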
Missing values
- In pipelines only (not pre-filled in data)
- Numeric: median by default; for bell‑shaped variables like CREDIT_SCORE we use mean imputation
- Categorical: most_frequent
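
The imputation rules above could be wired into a scikit-learn pipeline roughly as follows. This is a sketch under assumptions: only CREDIT_SCORE is a column name taken from this README, while ANNUAL_MILEAGE and VEHICLE_TYPE, the scaler, the classifier and the toy data are placeholders chosen for illustration. The point is that the raw frame keeps its missing values and the imputers learn their statistics only when the pipeline is fitted.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

mean_cols = ["CREDIT_SCORE"]        # bell-shaped: mean imputation
median_cols = ["ANNUAL_MILEAGE"]    # assumed column name: median imputation
cat_cols = ["VEHICLE_TYPE"]         # assumed column name: most_frequent


def numeric_branch(strategy):
    """Impute with the given strategy, then standardize."""
    return Pipeline([
        ("impute", SimpleImputer(strategy=strategy)),
        ("scale", StandardScaler()),
    ])


preprocess = ColumnTransformer([
    ("num_mean", numeric_branch("mean"), mean_cols),
    ("num_median", numeric_branch("median"), median_cols),
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]), cat_cols),
])
model = Pipeline([
    ("prep", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Toy data with injected gaps (hypothetical values, not the project's CSV).
rng = np.random.default_rng(0)
n = 200
X_demo = pd.DataFrame({
    "CREDIT_SCORE": rng.normal(0.5, 0.1, n),
    "ANNUAL_MILEAGE": rng.gamma(2.0, 5000.0, n),
    "VEHICLE_TYPE": pd.Series(rng.choice(["sedan", "sports car"], n), dtype=object),
})
for col in X_demo.columns:
    X_demo.loc[rng.choice(n, 20, replace=False), col] = np.nan
y_demo = rng.binomial(1, 0.31, n)

model.fit(X_demo, y_demo)
proba = model.predict_proba(X_demo)[:, 1]
learned_mean = (
    model.named_steps["prep"]
    .named_transformers_["num_mean"]
    .named_steps["impute"]
    .statistics_[0]
)
```

In practice the pipeline would be fitted on the training split only, so the test split is imputed with training statistics.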
Threshold selection via the Precision–Recall curve
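
One way to pick a threshold from the Precision-Recall curve is sketched below. The README does not state which criterion the notebooks use, so maximizing F1 here is an assumption, as are the function name and the synthetic scores; a recall floor or a cost-based rule would slot into the same structure.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve


def pick_threshold_max_f1(y_true, y_score):
    """Return (threshold, f1) for the PR-curve point with the highest F1."""
    precision, recall, thresholds = precision_recall_curve(y_true, y_score)
    # The final (precision=1, recall=0) point has no threshold; drop it.
    precision, recall = precision[:-1], recall[:-1]
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom,
                   out=np.zeros_like(denom), where=denom > 0)
    best = int(np.argmax(f1))
    return float(thresholds[best]), float(f1[best])


# Synthetic labels and scores (hypothetical, for illustration only).
rng = np.random.default_rng(0)
y_demo = rng.binomial(1, 0.3, size=1000)
noise = rng.normal(0.0, 1.0, size=1000)
score_demo = 1.0 / (1.0 + np.exp(-(1.5 * y_demo + noise - 1.0)))

threshold, best_f1 = pick_threshold_max_f1(y_demo, score_demo)
y_pred = (score_demo >= threshold).astype(int)
```

The threshold should be chosen on validation data (or out-of-fold predictions) and then applied unchanged to the test set.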

Models and key results

Vehicle/specs (Portfolio_Project.ipynb)

Baselines
- Logistic Regression (class_weight='balanced'): ROC‑AUC ≈ 0.62, PR‑AUC ≈ 0.09
- Random Forest (tuned, class_weight='balanced'): ROC‑AUC ≈ 0.66, PR‑AUC ≈ 0.11
Notes
- Strong class imbalance makes precision low at useful recall.
- After correlation analysis, near‑deterministic equipment flags were dropped; region_code was target‑encoded.
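
To put the PR‑AUC values above in context: a classifier with no ranking ability scores a PR‑AUC equal to the positive rate, so with ≈6.4% positives the reference point is about 0.064 rather than 0.5. The sketch below illustrates this on synthetic data with a similar class ratio. The generated features are not the vehicle dataset, and using `average_precision_score` as the PR‑AUC estimate is an assumption about how the notebooks compute it.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic data with roughly 6.4% positives (illustrative only).
X_syn, y_syn = make_classification(
    n_samples=5000, n_features=10, n_informative=5,
    n_clusters_per_class=1, weights=[0.936, 0.064], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X_syn, y_syn, test_size=0.25, stratify=y_syn, random_state=0
)

# class_weight='balanced' reweights the loss; it does not resample the data.
clf = LogisticRegression(class_weight="balanced", max_iter=1000)
clf.fit(X_train, y_train)
scores = clf.predict_proba(X_test)[:, 1]

roc_auc = roc_auc_score(y_test, scores)
pr_auc = average_precision_score(y_test, scores)

# A constant score ranks nothing, so its average precision is the prevalence.
no_skill = average_precision_score(y_test, np.zeros(len(y_test)))
prevalence = y_test.mean()

print(f"ROC-AUC={roc_auc:.3f}  PR-AUC={pr_auc:.3f}  no-skill PR-AUC={no_skill:.3f}")
```

Read this way, PR‑AUC ≈ 0.09 to 0.11 against a reference of ≈ 0.064 is a modest lift, which is consistent with the low precision noted above.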

Driver profile (Car_Claims_Analysis.ipynb)

Logistic Regression (balanced data): ROC‑AUC 0.906, PR‑AUC 0.812, Accuracy 0.842
- Confusion (0.5): TN=1224, FP=149, FN=168, TP=459
Random Forest: ROC‑AUC 0.897, PR‑AUC 0.787, Accuracy 0.829
- Confusion (0.5): TN=1193, FP=180, FN=162, TP=465
Takeaway
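- LR ranks better (higher ROC‑AUC and PR‑AUC) and, at the 0.5 threshold, has higher precision (≈0.755 vs ≈0.721) and fewer false positives (149 vs 180).
- RF finds slightly more true claims (465 vs 459), so its recall is marginally higher (≈0.742 vs ≈0.732), at the cost of those extra false alarms.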
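
The threshold-level comparison can be re-derived directly from the confusion counts reported above. The helper below is a small illustrative function (its name is not from the notebooks); the four counts per model are copied from this README.

```python
def metrics_from_confusion(tn, fp, fn, tp):
    """Derive standard metrics from the four confusion-matrix counts."""
    total = tn + fp + fn + tp
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "prevalence": (tp + fn) / total,
    }


# Counts at the 0.5 threshold, as reported in this README.
lr = metrics_from_confusion(tn=1224, fp=149, fn=168, tp=459)
rf = metrics_from_confusion(tn=1193, fp=180, fn=162, tp=465)

for name, m in (("LR", lr), ("RF", rf)):
    print(name, {k: round(v, 3) for k, v in m.items()})
```

Both matrices sum to 2,000 test rows with 627 positives, which matches the ≈31% positive rate stated for this dataset.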

Find the project poster and similarly descriptive materials for this project here!

