A machine learning project that classifies emails as spam or ham (legitimate mail) using text preprocessing and two classification models: Naïve Bayes and a Support Vector Machine (SVM). The models are trained on a subset of the Enron email dataset.
- Dataset: 50,000 emails
- Email text preprocessing (cleaning, tokenization, lemmatization)
- TF-IDF vectorization (top 500 features)
- Two models: Naïve Bayes & Support Vector Machine
- Model accuracies:
  - Naïve Bayes: 95.4%
  - SVM: 97.3%
- Python
- Pandas, NLTK, Scikit-learn
- Matplotlib, Seaborn
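
The pipeline described above (text cleaning, TF-IDF limited to 500 features, then Naïve Bayes and SVM) might look roughly like the sketch below. This is an illustration under assumptions, not the project's actual code: the `clean_text` helper, the toy emails, and the choice of `MultinomialNB` and `LinearSVC` as the specific classifier variants are all hypothetical. Lemmatization is left out so the sketch runs without downloading NLTK corpora; a lemmatizer could be applied inside `clean_text`.

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC


def clean_text(text: str) -> str:
    """Hypothetical cleaning step: lowercase, drop URLs and non-letters."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)   # remove URLs
    text = re.sub(r"[^a-z\s]", " ", text)       # keep letters and whitespace only
    return re.sub(r"\s+", " ", text).strip()    # collapse repeated whitespace


# Toy stand-in data (assumption); the project itself trains on Enron emails.
emails = [
    "Win a FREE prize now!!! Click http://example.com",
    "Cheap meds, limited offer, buy now",
    "Meeting moved to 3pm, see agenda attached",
    "Can you review the quarterly report draft?",
]
labels = ["spam", "spam", "ham", "ham"]


def build_model(classifier):
    """TF-IDF (at most 500 features) followed by the given classifier."""
    return make_pipeline(
        TfidfVectorizer(preprocessor=clean_text, max_features=500),
        classifier,
    )


# Assumed variants: multinomial Naive Bayes and a linear-kernel SVM.
models = {
    "naive_bayes": build_model(MultinomialNB()),
    "svm": build_model(LinearSVC()),
}

for name, model in models.items():
    model.fit(emails, labels)
    print(name, model.predict(["Claim your free prize now"]))
```

On real data, the emails would be split into training and test sets (for example with `sklearn.model_selection.train_test_split`) before fitting, so that accuracy is measured on messages the models have not seen.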