This project is an SMS spam classifier built on a fine-tuned GPT-2 language model. It uses pydantic for data validation, loguru for logging, ruff for linting, and make for workflow automation, keeping the machine learning pipeline clean and maintainable.
- 🧠 Fine-tuned GPT-2 model for binary SMS classification (spam vs ham)
- 🔄 Balanced dataset preprocessing
- 🛡️ Pydantic for robust data validation
- 📊 Dataset metrics and confusion matrix visualization
- 📦 Tooling includes ruff, loguru, and make for linting, logging, and workflow automation
- Python 3.12+
- Transformers (HuggingFace)
- Datasets
- Pydantic
- Loguru
- Ruff
- Make
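
The feature list mentions balanced dataset preprocessing without saying how the balancing is done. One common approach is random undersampling, where the larger class (usually ham) is cut down to the size of the smaller one. The sketch below illustrates that idea using only the standard library. The function name `balance_by_undersampling`, the `(label, text)` row format, and the toy messages are assumptions made for illustration and are not taken from this project's code.

```python
import random
from collections import Counter


def balance_by_undersampling(rows, seed=0):
    """Downsample every class to the size of the smallest class.

    `rows` is a list of (label, text) tuples. This is a hypothetical helper,
    not the project's actual preprocessing function.
    """
    # Group rows by their label.
    by_label = {}
    for label, text in rows:
        by_label.setdefault(label, []).append((label, text))

    # The smallest class sets the target size for every class.
    smallest = min(len(group) for group in by_label.values())

    # A seeded generator keeps the sampling reproducible.
    rng = random.Random(seed)
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, smallest))

    # Shuffle so the classes are interleaved rather than grouped.
    rng.shuffle(balanced)
    return balanced


# Toy data, made up for illustration: 8 ham messages and 3 spam messages.
rows = [("ham", f"ham message {i}") for i in range(8)]
rows += [("spam", f"spam message {i}") for i in range(3)]

balanced = balance_by_undersampling(rows)
counts = Counter(label for label, _ in balanced)
print(counts)
```

Undersampling discards data from the majority class, so it suits cases where the minority class is still large enough to train on. Oversampling or class weighting are alternatives if that is not the case.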