This repository contains the implementation for a Machine Learning homework assignment focusing on advanced techniques: kernel methods (Support Vector Classifier, Kernel K-Means) and regularization (Ridge and Lasso regression). The algorithms are implemented from scratch in Python, with standard libraries used for data handling and evaluation.
This project explores the power of kernel methods for non-linearly separable data and the impact of regularization on linear regression models. The implementation is divided into three main parts:
- Kernel Support Vector Classifier (SVC): Implementation of Linear, RBF, and Polynomial kernels on the XOR dataset.
- Kernel K-Means Clustering: Implementation of standard and kernelized K-Means clustering on the XOR dataset.
- Linear Regression with Regularization: Implementation of OLS, Ridge (L2), and Lasso (L1) regression on UCI datasets (Diabetes).
.
├── code.ipynb
├── dataset
└── README.md
Dataset: XOR (dataset_3)
- Features: 2-dimensional
- Labels: Binary {-1, 1}
- Split: 70% Train / 30% Test
- Runs: 10 (mean reported)
- RBF Kernel achieved the best overall performance due to its ability to map XOR data into a higher-dimensional space where it becomes linearly separable.
- Linear Kernel performed poorly (~50% accuracy), as expected for the non-linearly separable XOR data.
- Polynomial Kernel (d=3) was competitive with RBF, demonstrating that higher-degree polynomials can also capture non-linear boundaries.
- Computational Cost: RBF kernel required significantly more time (~380 s) than the linear/polynomial kernels (~7 s) due to the cost of Gram matrix computation.
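The three kernels above can be sketched as Gram-matrix functions in NumPy. This is a minimal illustration, not the homework's actual code; the `gamma`, `degree`, and `coef0` values are placeholder assumptions, not the settings used in the experiments.

```python
import numpy as np

def linear_kernel(X, Y):
    """K[i, j] = x_i . y_j"""
    return X @ Y.T

def poly_kernel(X, Y, degree=3, coef0=1.0):
    """K[i, j] = (x_i . y_j + coef0)^degree"""
    return (X @ Y.T + coef0) ** degree

def rbf_kernel(X, Y, gamma=1.0):
    """K[i, j] = exp(-gamma * ||x_i - y_j||^2)"""
    # Expand the squared distance: ||x||^2 - 2 x.y + ||y||^2
    sq = (X ** 2).sum(1)[:, None] - 2 * X @ Y.T + (Y ** 2).sum(1)[None, :]
    return np.exp(-gamma * sq)

# XOR toy points: the two classes are not linearly separable in 2-D
X = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], dtype=float)
K = rbf_kernel(X, X, gamma=0.5)
```

A kernel Gram matrix is n×n, which is where the quadratic memory and time cost noted above comes from.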
Configuration
- Clusters (k): 2 (matching binary labels)
- Initialization: K-means++ style
- Evaluation: Cluster labels mapped to ground truth via Hungarian algorithm
- Kernel K-means with RBF kernel effectively separated the XOR clusters by implicitly mapping data to a higher-dimensional space.
- Standard K-means failed, as expected, because its decision boundaries are linear in the original feature space.
- Cluster-label mapping via Hungarian algorithm ensured fair evaluation against ground truth.
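The two steps above can be sketched as follows: kernel k-means assigns each point to the cluster whose feature-space centroid is nearest, using only Gram-matrix entries, and SciPy's `linear_sum_assignment` (an implementation of the Hungarian algorithm) matches cluster ids to class ids for evaluation. This is a simplified sketch under assumed random initialization, not the homework's k-means++-style code; the function names are illustrative.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def kernel_kmeans(K, k=2, n_iter=100, seed=0):
    """Lloyd-style k-means using only a precomputed Gram matrix K (n x n)."""
    n = K.shape[0]
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, k, n)  # random init (k-means++-style seeding is a refinement)
    for _ in range(n_iter):
        dist = np.full((n, k), np.inf)
        for c in range(k):
            mask = labels == c
            m = mask.sum()
            if m == 0:
                continue  # empty cluster: leave its distances at +inf
            # ||phi(x_i) - mu_c||^2 expanded via the kernel trick:
            # K_ii - (2/|C|) sum_{j in C} K_ij + (1/|C|^2) sum_{j,l in C} K_jl
            dist[:, c] = (np.diag(K)
                          - 2.0 / m * K[:, mask].sum(axis=1)
                          + K[np.ix_(mask, mask)].sum() / m ** 2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels

def mapped_accuracy(y_true, y_pred, k=2):
    """Accuracy after optimally matching cluster ids to class ids (Hungarian algorithm)."""
    cont = np.zeros((k, k), dtype=int)  # rows: predicted cluster, cols: true class
    for p, t in zip(y_pred, y_true):
        cont[p, t] += 1
    rows, cols = linear_sum_assignment(-cont)  # maximise total agreement
    return cont[rows, cols].sum() / len(y_true)
```

With the linear kernel `K = X @ X.T` this reduces to standard k-means, which is why only the kernelized variant separates XOR.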
Dataset: Diabetes (UCI)
- Features: 10 clinical variables
- Target: Disease progression (continuous)
- Preprocessing: Standardization (zero mean, unit variance)
- Ridge Regression benefited significantly from standardization (9.3% MSE reduction) because L2 penalty is scale-sensitive.
- Lasso showed minimal change; coordinate descent is somewhat robust to feature scaling but still benefits from it.
- OLS was unaffected by scaling: the closed-form solution's predictions are invariant to linear rescaling of the features, since the coefficients simply absorb the scale.
- Sparsity: Lasso with α=100 produced sparse coefficients, effectively performing feature selection.