Spam Email Detection with Gaussian Naive Bayes

This project implements a custom Gaussian Naive Bayes classifier that detects spam emails in the UCI Spambase dataset. It combines preprocessing, feature selection, hyperparameter tuning, and ensemble learning to reach 91.24% accuracy on the test set.

Project Highlights

Custom Naive Bayes Implementation
- Built from scratch using only NumPy.
- Handles zero-variance features by flooring the standard deviation at a minimum value (min_std).
- Runs inference in log-probability space to prevent numerical underflow.
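
The three points above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the class name, the default values, and the additive form of `prior_smoothing` are assumptions.

```python
import numpy as np


class GaussianNaiveBayes:
    """Illustrative Gaussian Naive Bayes; names and defaults are assumptions."""

    def __init__(self, min_std=1e-3, prior_smoothing=1.0):
        self.min_std = min_std
        self.prior_smoothing = prior_smoothing

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        counts = np.array([(y == c).sum() for c in self.classes_], dtype=float)
        # Assumed form of prior smoothing: additive smoothing of class counts.
        priors = (counts + self.prior_smoothing) / (
            len(y) + self.prior_smoothing * len(self.classes_)
        )
        self.log_prior_ = np.log(priors)
        self.mean_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        std = np.array([X[y == c].std(axis=0) for c in self.classes_])
        # Floor the standard deviation so constant features never divide by zero.
        self.std_ = np.maximum(std, self.min_std)
        return self

    def joint_log_likelihood(self, X):
        X = np.asarray(X, dtype=float)
        # Broadcast to shape (n_samples, n_classes, n_features).
        z = (X[:, None, :] - self.mean_[None, :, :]) / self.std_[None, :, :]
        log_pdf = -0.5 * z**2 - np.log(self.std_)[None, :, :] - 0.5 * np.log(2 * np.pi)
        # Summing log densities replaces multiplying many small probabilities.
        return self.log_prior_[None, :] + log_pdf.sum(axis=2)

    def predict(self, X):
        scores = self.joint_log_likelihood(X)
        return self.classes_[np.argmax(scores, axis=1)]
```

Working with sums of log densities keeps the scores finite even with 57 features, where a direct product of probabilities could underflow to zero.
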
Advanced Preprocessing
- Log1p transformation to reduce skew.
- Robust IQR-based scaling.
- Outlier clipping to ±3 IQR.
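
A possible shape for this pipeline is sketched below. It assumes the scaler is centered on the median, fitted on the training split only, and that a zero IQR falls back to 1.0; none of these details are confirmed by the notebook, and the function names are hypothetical.

```python
import numpy as np


def fit_robust_scaler(X_train):
    """Learn per-feature median and IQR on log1p-transformed training data."""
    X_log = np.log1p(np.asarray(X_train, dtype=float))
    q1, median, q3 = np.percentile(X_log, [25, 50, 75], axis=0)
    iqr = q3 - q1
    # Assumption: use 1.0 when the IQR is zero to avoid dividing by zero.
    iqr = np.where(iqr > 0, iqr, 1.0)
    return median, iqr


def robust_transform(X, median, iqr, clip=3.0):
    """Apply log1p, scale by the IQR, then clip outliers to +/- clip IQRs."""
    X_log = np.log1p(np.asarray(X, dtype=float))
    X_scaled = (X_log - median) / iqr
    return np.clip(X_scaled, -clip, clip)
```

The Spambase features are non-negative frequencies and counts, so `log1p` is well defined for every value.
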
Feature Selection
- Uses a signal-to-noise ratio to rank features by discriminative power.
- Enables top-k feature selection for optimized performance.
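
One common definition of a per-feature signal-to-noise ratio is the absolute difference of the class means divided by the sum of the class standard deviations. The sketch below uses that definition as an assumption; the notebook's exact formula may differ, and the function names are hypothetical.

```python
import numpy as np


def snr_scores(X, y, eps=1e-8):
    """Score each feature by |mean_spam - mean_ham| / (std_spam + std_ham)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    spam, ham = X[y == 1], X[y == 0]
    signal = np.abs(spam.mean(axis=0) - ham.mean(axis=0))
    noise = spam.std(axis=0) + ham.std(axis=0) + eps  # eps guards against 0/0
    return signal / noise


def top_k_features(X, y, k):
    """Return the column indices of the k highest-scoring features."""
    scores = snr_scores(X, y)
    return np.argsort(scores)[::-1][:k]
```

The returned indices can then be used to slice both the training and test matrices, for example `X_train[:, idx]`.
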
Hyperparameter Optimization
- Grid search over combinations of min_std, prior_smoothing, and top_k features.
- Evaluation based on validation accuracy.
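
An exhaustive grid search of this kind can be sketched as below. The grid values are placeholders, not the ones used in the notebook, and `evaluate` is a hypothetical callable that would train a model with the given parameters and return its validation accuracy.

```python
from itertools import product

# Placeholder grid: the actual values searched in the notebook are not stated here.
param_grid = {
    "min_std": [1e-3, 1e-2, 1e-1],
    "prior_smoothing": [0.5, 1.0, 2.0],
    "top_k": [20, 30, 40, 57],
}


def grid_search(evaluate, grid):
    """Try every parameter combination and keep the best validation score.

    evaluate(**params) must return a number where larger is better.
    """
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

Scoring on a held-out validation split rather than the test set keeps the final test metrics an unbiased estimate.
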
Ensemble Learning
- Combines 7 Naive Bayes models trained on different feature subsets.
- Final predictions are made by majority voting.
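
The voting step can be illustrated as follows, assuming each model has already produced a vector of 0/1 predictions (1 = spam). The function name is hypothetical, and how the notebook chooses the feature subsets is not shown here.

```python
import numpy as np


def majority_vote(predictions):
    """Combine 0/1 predictions of shape (n_models, n_samples) by majority vote."""
    predictions = np.asarray(predictions)
    n_models = predictions.shape[0]
    votes_for_spam = predictions.sum(axis=0)
    # Spam wins only with a strict majority of the models.
    return (2 * votes_for_spam > n_models).astype(int)
```

With an odd number of models, such as 7, a tie is impossible, so every sample gets an unambiguous label.
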
Evaluation
- Achieves:
  - Accuracy: 0.9124
  - Precision: 0.8628
  - Recall: 0.9246
  - F1 Score: 0.8926
- Includes confusion matrix visualizations.
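
For reference, the four metrics follow directly from the confusion matrix counts. The helper below is an illustrative sketch with a hypothetical name, treating spam (label 1) as the positive class.

```python
import numpy as np


def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 with label 1 as positive."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "confusion_matrix": [[tn, fp], [fn, tp]],
    }
```

As a consistency check, plugging the reported precision (0.8628) and recall (0.9246) into the F1 formula gives approximately 0.8926, which matches the reported F1 score.
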

File Structure

- README.md
- SpamEmailDetection.ipynb
- Spam_Detection_Model.pdf

How to Run

1. Clone this repository.
2. Open notebook/SpamEmailDetection.ipynb in Google Colab or Jupyter.
3. Ensure the following libraries are installed:
   - numpy
   - pandas
   - scikit-learn
   - matplotlib
4. Run all cells to train and evaluate the models.

Dataset

UCI Spambase Dataset:
https://archive.ics.uci.edu/dataset/94/spambase
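
A minimal loading sketch is shown below. It assumes the archive has been downloaded and that its data file, `spambase.data`, sits in the working directory; the function name is hypothetical. The file has no header row and stores 57 feature columns followed by the class label (1 = spam, 0 = not spam).

```python
import pandas as pd


def load_spambase(source="spambase.data"):
    """Load Spambase from a path or file-like object into (X, y) arrays."""
    # The file has no header; the last column is the label.
    df = pd.read_csv(source, header=None)
    X = df.iloc[:, :-1].to_numpy(dtype=float)
    y = df.iloc[:, -1].to_numpy(dtype=int)
    return X, y
```
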

Course Information

This project was developed for CS 445/545: Machine Learning (Spring 2025) at Portland State University.

