Tax-Deductible Transaction Classifier

This project classifies bank transactions as either tax-deductible or non-deductible using a hybrid system that combines rule-based logic and a machine learning model.

Live Demo: https://taxclassifier.streamlit.app/

Project Structure

├── app.py                # Streamlit web app interface
├── classifier.py         # ML prediction logic
├── data_loader.py        # Preprocessing and CSV loading
├── rules.py              # Rule matching function
├── rules_config.py       # Regex-based tax rules
├── train_model.py        # Training pipeline for ML model
└── requirements.txt      # Python dependencies

How It Works

Hybrid Classification Logic

Rule-Based Classifier: Uses regex patterns (e.g., business travel, meals, equipment) to assign labels and explanations.
ML Classifier: Falls back to a trained logistic regression model when no rule matches, using TF-IDF on text and one-hot encoding on merchant name.

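Every prediction includes an explanation: the label of the matched rule (e.g., "Business travel") or, for the ML fallback, the model's probability (e.g., "ML (p=0.84)").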
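The rule-based step can be sketched as follows. This is a minimal illustration, not the contents of rules.py or rules_config.py: the `Rule` type, the `match_rule` name, and the three patterns are hypothetical and chosen only to mirror the examples above.

```python
import re
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    pattern: "re.Pattern[str]"
    deductible: bool
    reason: str


# Hypothetical rules for illustration; the real patterns live in rules_config.py.
RULES = [
    Rule(re.compile(r"\b(flight|airline|hotel)\b", re.IGNORECASE), True, "Business travel"),
    Rule(re.compile(r"\b(macbook|laptop|monitor)\b", re.IGNORECASE), True, "Business equipment purchase"),
    Rule(re.compile(r"\b(netflix|spotify)\b", re.IGNORECASE), False, "Personal entertainment"),
]


def match_rule(merchant: str, description: str) -> Optional[Rule]:
    """Return the first rule whose regex matches, or None if no rule applies."""
    text = f"{merchant} {description}"
    for rule in RULES:  # list order is the priority order
        if rule.pattern.search(text):
            return rule
    return None


hit = match_rule("Delta Airlines", "Flight to NYC for business conference")
print(hit.reason if hit else "no rule matched")
```

Returning `None` when nothing matches gives the caller a clear signal to hand the transaction to the ML classifier.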

Input Format

The input CSV must contain:

date – e.g., 2024-05-10
amount – e.g., 125.75
merchant – e.g., Delta Airlines
description – e.g., Flight to NYC for business conference
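A quick way to check a file against this format before uploading it is sketched below. It uses only the standard library and is not the code in data_loader.py; the function name `load_transactions` and the ISO date assumption (taken from the example value above) are illustrative.

```python
import csv
import io
from datetime import date

REQUIRED_COLUMNS = ("date", "amount", "merchant", "description")


def load_transactions(csv_text: str) -> list:
    """Parse CSV text and fail early if a required column is missing."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    rows = []
    for row in reader:
        rows.append({
            "date": date.fromisoformat(row["date"]),  # assumes YYYY-MM-DD
            "amount": float(row["amount"]),
            "merchant": row["merchant"].strip(),
            "description": row["description"].strip(),
        })
    return rows


sample = (
    "date,amount,merchant,description\n"
    "2024-05-10,125.75,Delta Airlines,Flight to NYC for business conference\n"
)
transactions = load_transactions(sample)
print(transactions[0]["merchant"], transactions[0]["amount"])
```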

Output Format

The output is available in both CSV and JSON formats. Each record contains:

date
merchant
description
deductible – true or false
reason – explanation (e.g., "Business travel", or "ML (p=0.84)")
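For illustration, records with these fields could be serialized to both formats as shown below. This is a standard-library sketch, not the app's export code; the helper names `to_json` and `to_csv` are hypothetical, and writing booleans as lowercase `true`/`false` in the CSV is an assumption based on the field description above.

```python
import csv
import io
import json

FIELDS = ["date", "merchant", "description", "deductible", "reason"]


def to_json(records: list) -> str:
    """Serialize records as a JSON array."""
    return json.dumps(records, indent=2)


def to_csv(records: list) -> str:
    """Serialize records as CSV with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for rec in records:
        # Assumption: booleans are written as lowercase true/false.
        writer.writerow({**rec, "deductible": str(rec["deductible"]).lower()})
    return buf.getvalue()


records = [{
    "date": "2025-07-25",
    "merchant": "Apple",
    "description": "MacBook purchase for work",
    "deductible": True,
    "reason": "Business equipment purchase",
}]
print(to_csv(records))
print(to_json(records))
```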

How to Run

1. Install dependencies

pip install -r requirements.txt

2. (Optional) Retrain model

python train_model.py --input data/sample_transactions-2.csv --output models/tax_deductible_clf.joblib
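The contents of train_model.py are not shown here, but a pipeline matching the description above (TF-IDF on text, one-hot encoding on merchant name, logistic regression) might look roughly like this with scikit-learn and pandas. The tiny training set is made up, and applying TF-IDF to the `description` column specifically is an assumption.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Made-up training rows, for illustration only.
train = pd.DataFrame({
    "merchant": ["Delta Airlines", "Apple", "Netflix", "Safeway"],
    "description": [
        "Flight to NYC for business conference",
        "MacBook purchase for work",
        "Monthly streaming subscription",
        "Weekly groceries",
    ],
    "deductible": [1, 1, 0, 0],
})

features = ColumnTransformer([
    # A scalar column name passes a 1-D series of strings to TfidfVectorizer.
    ("text", TfidfVectorizer(), "description"),
    # handle_unknown="ignore" keeps unseen merchants from raising at predict time.
    ("merchant", OneHotEncoder(handle_unknown="ignore"), ["merchant"]),
])
model = Pipeline([("features", features), ("clf", LogisticRegression())])
model.fit(train[["merchant", "description"]], train["deductible"])

new = pd.DataFrame({
    "merchant": ["Staples"],
    "description": ["Printer paper for office"],
})
proba = model.predict_proba(new)[:, 1]  # probability of the deductible class
print(proba)
```

A fitted pipeline like this can be saved with `joblib.dump`, which would be consistent with the `.joblib` output path in the command above.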

3. Launch web app

streamlit run app.py

You can upload your own transaction CSV or load the sample.

Features

✅ Explainable rule-based deductions
✅ ML fallback with threshold tuning
✅ Streamlit UI with file upload and downloads
✅ JSON + CSV export support
✅ Modular and extensible codebase

Assumptions

Transactions that don’t match any rule default to ML classification.
The ML decision threshold defaults to 0.5 and can be adjusted by the user.
Rules are evaluated in order, and the first match wins.
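Taken together, these assumptions amount to the control flow sketched below. The `classify` function and both stub callables are hypothetical stand-ins rather than the project's code, and treating a probability equal to the threshold as deductible (`>=`) is an assumption.

```python
from typing import Callable, Optional, Tuple

RuleResult = Optional[Tuple[bool, str]]


def classify(
    transaction: dict,
    match_rule: Callable[[dict], RuleResult],
    predict_proba: Callable[[dict], float],
    threshold: float = 0.5,
) -> dict:
    """Apply rules first; fall back to the ML probability if no rule matches."""
    rule_hit = match_rule(transaction)
    if rule_hit is not None:
        deductible, reason = rule_hit
        return {"deductible": deductible, "reason": reason}
    p = predict_proba(transaction)
    # Assumption: a probability at or above the threshold counts as deductible.
    return {"deductible": p >= threshold, "reason": f"ML (p={p:.2f})"}


# Stubs standing in for the real rule matcher and trained model.
def stub_rules(tx: dict) -> RuleResult:
    if "flight" in tx["description"].lower():
        return (True, "Business travel")
    return None


def stub_model(tx: dict) -> float:
    return 0.84


print(classify({"description": "Flight to NYC"}, stub_rules, stub_model))
print(classify({"description": "Office chair"}, stub_rules, stub_model))
print(classify({"description": "Office chair"}, stub_rules, stub_model, threshold=0.9))
```

Raising the threshold makes the ML fallback more conservative: the same 0.84 probability is deductible at 0.5 but not at 0.9.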

Sample JSON Output

[
  {
    "date": "2025-07-25",
    "merchant": "Apple",
    "description": "MacBook purchase for work",
    "deductible": true,
    "reason": "Business equipment purchase"
  }
]

Author

Arnav Gupta
AI/ML Internship Candidate
arnavgupta.info

source .venv/bin/activate
