KNN Classification on Breast Cancer Dataset

🎯 Problem Statement and Goal of Project

This project applies the K-Nearest Neighbors (KNN) algorithm to the Breast Cancer Wisconsin (Diagnostic) dataset bundled with scikit-learn. The goal is to understand how KNN behaves when distinguishing malignant from benign tumors, using numerical features computed from digitized images of fine needle aspirates (FNA) of breast masses.

🛠 Solution Approach

The notebook follows a clean ML workflow:

Data Loading: Load breast cancer data via sklearn.datasets.load_breast_cancer().
Exploratory Data Analysis:
- View data shape, features, and class distributions.
- Use seaborn for basic visualization.
Train-Test Split:
- Split dataset into training and test sets (80/20).
Modeling with KNN:
- Apply KNeighborsClassifier from sklearn.neighbors.
- Test different k values (number of neighbors).
Evaluation:
- Accuracy evaluation.
- Confusion matrix plotting.
- Error rate plot vs. k.
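
The steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's exact code: `random_state=42` and `n_neighbors=5` are assumptions chosen for reproducibility and as a starting value for k.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load features (569 x 30) and binary labels (0 = malignant, 1 = benign).
X, y = load_breast_cancer(return_X_y=True)

# 80/20 split; random_state=42 is an assumption, used only for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# n_neighbors=5 is scikit-learn's default, used here as a placeholder for k.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)

print(f"Test accuracy: {accuracy:.3f}")
print(cm)
```

To plot the confusion matrix, `cm` can be passed to `seaborn.heatmap(cm, annot=True, fmt="d")`.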

🧰 Technologies & Libraries

Python 3
numpy, pandas – numerical computing and data manipulation
seaborn, matplotlib – visualization
scikit-learn – datasets, model training, evaluation

📁 Dataset Description

Source: sklearn.datasets.load_breast_cancer()
Features: 30 real-valued features (e.g., mean radius, texture, smoothness)
Target Classes:
- 0: malignant
- 1: benign
Samples: 569
Goal: Predict whether a tumor is malignant or benign
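
The figures above can be checked directly against the loaded dataset. The short sketch below prints the shape, the label names, and the per-class counts; the variable names are illustrative only.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()

print(data.data.shape)               # (n_samples, n_features)
print(list(data.target_names))       # label names, indexed by class id
print(list(data.feature_names[:3]))  # first few feature names

# Count samples per class id (index 0 = malignant, index 1 = benign).
class_counts = np.bincount(data.target)
for name, count in zip(data.target_names, class_counts):
    print(f"{name}: {count}")
```

The counts show a moderate imbalance between the two classes (212 malignant, 357 benign), which is worth keeping in mind when reading the accuracy figures.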

⚙️ Installation & Execution Guide

Download or clone the repository.

Install required libraries:

pip install numpy pandas matplotlib seaborn scikit-learn

Run the notebook:
```
jupyter notebook knn.ipynb
```

📊 Key Results / Performance

Train/Test Split: 80/20
KNN Accuracy: Varies with k; the best k is chosen from the error rate vs. k plot
Visualization:
- Target distribution via seaborn.countplot
- Confusion matrix for classification performance
- Error rate vs. k plot to choose the number of neighbors
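
The error rate vs. k curve can be computed along these lines. This is a sketch under assumptions: the search range `1..30` and `random_state=42` are not taken from the notebook.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42  # assumed seed
)

k_values = list(range(1, 31))  # assumed search range for k
error_rates = []
for k in k_values:
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    # Error rate = fraction of test samples that are misclassified.
    error_rates.append(float(np.mean(knn.predict(X_test) != y_test)))

# Pick the smallest k that reaches the minimum error rate.
best_k = k_values[int(np.argmin(error_rates))]
print(f"Lowest test error {min(error_rates):.3f} at k={best_k}")
```

Plotting `k_values` against `error_rates` with `matplotlib.pyplot.plot` gives the curve. Because k is selected on the same test set used to report accuracy, the result is likely optimistic; cross-validation on the training set is a common alternative.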

📸 Sample Outputs

📈 Error rate vs. k visualization
✅ Confusion matrix
📊 Class distribution plots

📚 Additional Learnings / Reflections

This project illustrates:
- The importance of choosing the right k
- Effects of class imbalance
- Train-test split strategy
Overall, it is a practical, step-by-step implementation of KNN for binary classification.
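
One way the split strategy and class imbalance interact is through stratification. The sketch below assumes a stratified split (`stratify=y`), which the notebook may or may not use, and compares the benign fraction in each subset.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)

# stratify=y keeps the class proportions roughly equal in both subsets.
# Both stratify and random_state are assumptions made for this illustration.
_, _, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Labels are 0/1, so the mean is the fraction of benign (class 1) samples.
overall_benign = float(np.mean(y))
train_benign = float(np.mean(y_train))
test_benign = float(np.mean(y_test))

print(f"Benign fraction overall: {overall_benign:.3f}")
print(f"Benign fraction train:   {train_benign:.3f}")
print(f"Benign fraction test:    {test_benign:.3f}")
```

Without `stratify`, the test set's class mix depends on the random seed, which can shift accuracy slightly on a dataset of this size.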

👤 Author

Mehran Asgari

Email: imehranasgari@gmail.com
GitHub: https://github.com/imehranasgari

📄 License

This project is licensed under the Apache 2.0 License – see the LICENSE file for details.

💡 Some interactive outputs (e.g., plots, widgets) may not display correctly on GitHub. If so, please view this notebook via nbviewer.org for full rendering.
