Kernels & Regularization

This repository contains the implementation for a machine learning homework assignment on advanced techniques: kernel methods (Support Vector Classifier, Kernel K-Means) and regularization (Ridge and Lasso regression). The algorithms are implemented from scratch in Python, with standard libraries used only for data handling and evaluation.

This project explores the power of kernel methods for non-linearly separable data and the impact of regularization on linear regression models. The implementation is divided into three main parts:

  1. Kernel Support Vector Classifier (SVC): Implementation of Linear, RBF, and Polynomial kernels on the XOR dataset.
  2. Kernel K-Means Clustering: Implementation of standard and kernelized K-Means clustering on the XOR dataset.
  3. Linear Regression with Regularization: Implementation of OLS, Ridge (L2), and Lasso (L1) regression on the UCI Diabetes dataset.

Project Structure

.
├── code.ipynb              
├── dataset
└── README.md 

Methodology

Part 1: Kernel Support Vector Classifier (SVC)

Dataset: XOR (dataset_3)

  • Features: 2-dimensional
  • Labels: Binary {-1, 1}
  • Split: 70% Train / 30% Test
  • Runs: 10 (mean accuracy reported)

Key Findings

  1. RBF Kernel achieved the best overall performance due to its ability to map the XOR data into a higher-dimensional space where it becomes linearly separable.
  2. Linear Kernel performed poorly (~50% accuracy), as expected for the non-linearly separable XOR data.
  3. Polynomial Kernel (d=3) was competitive with RBF, demonstrating that higher-degree polynomials can also capture non-linear boundaries.
  4. Computational Cost: The RBF kernel took significantly longer (~380 s) than the linear/polynomial kernels (~7 s), dominated by Gram matrix computation.
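The three kernels compared above can be sketched as plain Gram-matrix functions. This is a minimal NumPy illustration, not the notebook's exact implementation; the parameter defaults (`degree`, `c`, `gamma`) are hypothetical:

```python
import numpy as np

def linear_kernel(X, Y):
    # K[i, j] = <x_i, y_j>
    return X @ Y.T

def poly_kernel(X, Y, degree=3, c=1.0):
    # K[i, j] = (<x_i, y_j> + c)^degree
    return (X @ Y.T + c) ** degree

def rbf_kernel(X, Y, gamma=1.0):
    # K[i, j] = exp(-gamma * ||x_i - y_j||^2), via the expansion
    # ||x - y||^2 = ||x||^2 + ||y||^2 - 2 <x, y>
    sq = (X ** 2).sum(1)[:, None] + (Y ** 2).sum(1)[None, :] - 2 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

# Tiny XOR example: the RBF Gram matrix is what lets a kernel SVC
# draw a non-linear boundary here while a linear kernel cannot.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
K = rbf_kernel(X, X)
```

Any of these can be dropped into a dual-form SVC, since the optimizer only ever touches the data through the Gram matrix `K`.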

Part 2: Kernel K-means Clustering

Configuration

  • Clusters (k): 2 (matching binary labels)
  • Initialization: K-means++ style
  • Evaluation: Cluster labels mapped to ground truth via the Hungarian algorithm

Key Findings

  1. Kernel K-means with an RBF kernel separated the XOR clusters effectively by implicitly mapping the data to a higher-dimensional space.
  2. Standard K-means failed, as expected, because its decision boundaries are linear in the original feature space.
  3. Mapping cluster labels to class labels via the Hungarian algorithm ensured a fair accuracy comparison against the ground truth.
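The Hungarian-algorithm label mapping described above can be done with SciPy's `linear_sum_assignment`. A sketch, assuming integer cluster/class labels in `0..k-1` (the function name `mapped_accuracy` is illustrative, not from the repository):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def mapped_accuracy(y_true, y_pred, k=2):
    # Contingency matrix: counts[i, j] = number of points with true
    # class i that were assigned to cluster j.
    counts = np.zeros((k, k), dtype=int)
    for t, p in zip(y_true, y_pred):
        counts[t, p] += 1
    # The Hungarian algorithm finds the cluster->class permutation that
    # maximizes total agreement (minimize the negated counts).
    rows, cols = linear_sum_assignment(-counts)
    return counts[rows, cols].sum() / len(y_true)

# Clusters labelled oppositely to the ground truth still score 1.0,
# since label IDs are arbitrary.
acc = mapped_accuracy([0, 0, 1, 1], [1, 1, 0, 0])  # → 1.0
```

Without this mapping step, a perfect clustering could report 0% accuracy simply because the arbitrary cluster IDs were flipped.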

Part 3: Linear Regression with Regularization

Dataset: Diabetes (UCI)

  • Features: 10 clinical variables
  • Target: Disease progression (continuous)
  • Preprocessing: Standardization (zero mean, unit variance)

Key Findings

  1. Ridge Regression benefited significantly from standardization (9.3% MSE reduction) because the L2 penalty is scale-sensitive.
  2. Lasso showed minimal change; coordinate descent is somewhat robust to feature scaling but still benefits from it.
  3. OLS was unaffected by scaling: its closed-form solution is scale-equivariant, so predictions do not change.
  4. Sparsity: Lasso with α = 100 produced sparse coefficients, effectively performing feature selection.
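The closed-form solutions behind the OLS and Ridge comparisons, plus the soft-thresholding operator at the core of Lasso's coordinate descent, can be sketched as follows. This is a minimal illustration on synthetic data, not the notebook's exact code; `alpha` is the regularization strength:

```python
import numpy as np

def ols_fit(X, y):
    # w = (X^T X)^{-1} X^T y  (use solve rather than an explicit inverse)
    return np.linalg.solve(X.T @ X, X.T @ y)

def ridge_fit(X, y, alpha=1.0):
    # w = (X^T X + alpha * I)^{-1} X^T y -- the alpha * I term shrinks
    # weights toward zero and depends on feature scale, which is why
    # standardization matters for Ridge.
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

def soft_threshold(z, t):
    # Core of Lasso's coordinate-descent update: shrink and zero out
    # small coefficients, which is what produces sparsity.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

# Synthetic check: Ridge shrinks the OLS solution toward zero.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true + 0.01 * rng.normal(size=50)
w_ols = ols_fit(X, y)
w_ridge = ridge_fit(X, y, alpha=10.0)
```

With `alpha=0`, `ridge_fit` reduces exactly to `ols_fit`, which is a handy sanity check when implementing these from scratch.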
