Decision Tree Algorithm Project

Overview

This repository contains the implementation of a Decision Tree algorithm for classification tasks, applied to two datasets:

- Fraud Detection: Classifies individuals as "Risky" or "Good" based on taxable income and other attributes.
- Sales Analysis: Identifies attributes contributing to high sales for a cloth manufacturing company.

Datasets

Fraud Detection Dataset:
- Attributes: Undergrad, Marital.Status, Taxable.Income, Work Experience, Urban.
- Target: "Risky" (Taxable.Income ≤ 30,000) or "Good".
Sales Dataset:
- Attributes: Sales, CompPrice, Income, Advertising, Population, Price, ShelveLoc, Age, Education, Urban, US.
- Target: Categorical variable derived from Sales.
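
The fraud target described above can be derived with a few lines of pandas. The following is a minimal sketch on made-up rows, not the repository's preprocessing code: the helper name `add_risk_label` and the `Risk` column are hypothetical, and only the column names and the 30,000 threshold come from the dataset description. The sketch also drops `Taxable.Income` from the features because the label is computed directly from it; whether the repository does the same is not stated.

```python
import pandas as pd

RISK_THRESHOLD = 30_000  # from the dataset description: income <= 30,000 is "Risky"


def add_risk_label(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with a 'Risk' column derived from Taxable.Income."""
    out = df.copy()
    out["Risk"] = out["Taxable.Income"].apply(
        lambda income: "Risky" if income <= RISK_THRESHOLD else "Good"
    )
    return out


# Made-up rows for illustration only; not taken from fraud_data.csv.
sample = pd.DataFrame(
    {
        "Undergrad": ["YES", "NO", "NO"],
        "Marital.Status": ["Single", "Married", "Divorced"],
        "Taxable.Income": [25_000, 30_000, 68_000],
        "Urban": ["YES", "NO", "YES"],
    }
)

labeled = add_risk_label(sample)

# One-hot encode the categorical attributes so a tree can consume them.
# Taxable.Income is dropped here because the label is computed from it.
X = pd.get_dummies(labeled.drop(columns=["Risk", "Taxable.Income"]))
y = labeled["Risk"]

print(y.tolist())  # ['Risky', 'Risky', 'Good']
```

The Sales target would follow the same pattern, but the cutoff used to bin `Sales` is not given here, so it is left out of the sketch.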

Project Structure

📂 Decision-Tree-Algorithm
├── 📁 data                # Dataset files
│   ├── fraud_data.csv
│   ├── sales_data.csv
├── 📁 notebooks           # Jupyter notebooks for data exploration and model building
│   ├── fraud_detection.ipynb
│   ├── sales_analysis.ipynb
├── 📁 scripts             # Python scripts for modular code
│   ├── preprocess.py      # Data preprocessing functions
│   ├── train_model.py     # Model training and evaluation
│   ├── visualize_tree.py  # Decision tree visualization
├── 📁 results             # Outputs such as visualized trees and metrics
│   ├── fraud_tree.png
│   ├── sales_tree.png
├── README.md              # Project overview
├── requirements.txt       # Python dependencies
├── LICENSE                # License details
└── .gitignore             # Files to ignore in Git

Key Features

Fraud Detection Model:
- Accuracy on Training Set: 1.0
- Cross-Validation Score: 0.998
- Visualized decision tree for interpretability.
Sales Analysis Model:
- Identifies critical factors driving high sales.
- Feature Importance:
  - Advertising: 16.01%
  - Population: 16.72%
  - ShelveLoc: 12.25%
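
Metrics like the ones listed above can be computed with scikit-learn as sketched below. This is an illustration on synthetic data, so the printed numbers will not match the figures reported here; the feature names are placeholders, and the 5-fold setting is an assumption rather than the repository's configuration. A fully grown tree typically scores 1.0 on its own training data, which is why the cross-validation score is the more informative number.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data; the repository's CSV files are not used here.
X, y = make_classification(
    n_samples=300, n_features=5, n_informative=3, random_state=0
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # placeholder names

# Fit an unpruned tree on all rows and score it on the same rows.
model = DecisionTreeClassifier(random_state=0)
model.fit(X, y)
train_accuracy = model.score(X, y)

# 5-fold cross-validation refits a fresh tree on each training split.
cv_scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5)

print(f"Training accuracy: {train_accuracy:.3f}")
print(f"Mean 5-fold CV accuracy: {cv_scores.mean():.3f}")

# Importances are normalized to sum to 1, so they can be shown as percentages.
ranked = sorted(
    zip(feature_names, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranked:
    print(f"{name}: {importance:.2%}")
```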

Setup

Clone this repository:

git clone https://github.com/R-Mahesh45/Decision-Tree-Algorithm.git
cd Decision-Tree-Algorithm

Install dependencies:
```
pip install -r requirements.txt
```

Run the scripts:

For fraud detection:

python scripts/train_model.py --dataset data/fraud_data.csv

For sales analysis:

python scripts/train_model.py --dataset data/sales_data.csv
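
For readers wondering how a `--dataset` flag like the one above is usually wired up, here is a minimal sketch using the standard-library `argparse` module. The contents of `scripts/train_model.py` are not shown in this README, so the function name `build_parser` and the help strings are assumptions for illustration only.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing the --dataset flag used in the commands above."""
    parser = argparse.ArgumentParser(
        description="Train a decision tree on a CSV dataset."
    )
    parser.add_argument(
        "--dataset", required=True, help="Path to the input CSV file."
    )
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(["--dataset", "data/fraud_data.csv"])
print(args.dataset)  # data/fraud_data.csv
```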

Results

Fraud Detection Tree: https://github.com/user-attachments/assets/86612b54-c1dc-463f-8a1a-d993e31db6df
Sales Analysis Tree: https://github.com/user-attachments/assets/deaa34a1-cb4a-4cfd-a54e-d041a26abec7

Visualization

Visualize decision trees:

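from sklearn.tree import plot_tree
import matplotlib.pyplot as plt

# `model` is a fitted DecisionTreeClassifier; `X` is its training feature DataFrame.
# class_names must follow the order of model.classes_, so read them from the model.
plt.figure(figsize=(15, 10))
plot_tree(model, filled=True, feature_names=list(X.columns),
          class_names=[str(c) for c in model.classes_])
plt.show()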
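
To write a tree image to disk instead of opening a window, the figure can be saved with Matplotlib. The sketch below is self-contained and makes several assumptions: it uses the Iris dataset as a stand-in, limits the depth to 3 for readability, and writes to a temporary directory rather than to the `results` folder.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, plot_tree

# Iris is a stand-in dataset; it is not one of the project's datasets.
iris = load_iris()
model = DecisionTreeClassifier(max_depth=3, random_state=0)
model.fit(iris.data, iris.target)

fig, ax = plt.subplots(figsize=(15, 10))
plot_tree(
    model,
    filled=True,
    feature_names=list(iris.feature_names),
    class_names=list(iris.target_names),  # already in the order of model.classes_
    ax=ax,
)

# Hypothetical output path; a temporary directory keeps the example side-effect free.
output_path = os.path.join(tempfile.mkdtemp(), "tree.png")
fig.savefig(output_path, dpi=150, bbox_inches="tight")
plt.close(fig)
print(output_path)
```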
