This repository contains the implementation for a Machine Learning homework assignment focusing on advanced techniques: kernel methods (Support Vector Classifier, Kernel K-Means) and regularization (Ridge and Lasso regression). The algorithms are implemented from scratch in Python, with standard libraries used for data handling and evaluation.
This project explores the power of kernel methods for non-linearly separable data and the impact of regularization on linear regression models. The implementation is divided into three main parts:
- Kernel Support Vector Classifier (SVC): Implementation of Linear, RBF, and Polynomial kernels on the XOR dataset.
- Kernel K-Means Clustering: Implementation of standard and kernelized K-Means clustering on the XOR dataset.
- Linear Regression with Regularization: Implementation of OLS, Ridge (L2), and Lasso (L1) regression on UCI datasets (Diabetes).
.
├── code.ipynb
├── dataset
└── README.md
Dataset: XOR (dataset_3)
- Features: 2-dimensional
- Labels: Binary {-1, 1}
- Split: 70% Train / 30% Test
- Runs: 10 (mean reported)
- RBF Kernel achieved the best overall performance due to its ability to map XOR data into a higher-dimensional space where it becomes linearly separable.
- Linear Kernel performed poorly (~50% accuracy), as expected for the non-linearly separable XOR data.
- Polynomial Kernel (d=3) was competitive with RBF, demonstrating that higher-degree polynomials can also capture non-linear boundaries.
- Computational Cost: RBF kernel required significantly more time (~380 s) than the linear/polynomial kernels (~7 s) due to the cost of Gram matrix computation.
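The three kernels above can be sketched as Gram-matrix functions in NumPy. This is a minimal illustration, not the homework's actual code; the `gamma`, `degree`, and `coef0` values are placeholder assumptions, not the settings used in the experiments.

```python
import numpy as np

def linear_kernel(X, Y):
    """K[i, j] = x_i . y_j"""
    return X @ Y.T

def poly_kernel(X, Y, degree=3, coef0=1.0):
    """K[i, j] = (x_i . y_j + coef0)^degree"""
    return (X @ Y.T + coef0) ** degree

def rbf_kernel(X, Y, gamma=1.0):
    """K[i, j] = exp(-gamma * ||x_i - y_j||^2)"""
    # Expand the squared distance: ||x||^2 - 2 x.y + ||y||^2
    sq = (X ** 2).sum(1)[:, None] - 2 * X @ Y.T + (Y ** 2).sum(1)[None, :]
    return np.exp(-gamma * sq)

# XOR toy points: the two classes are not linearly separable in 2-D
X = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], dtype=float)
K = rbf_kernel(X, X, gamma=0.5)
```

A kernel Gram matrix is n×n, which is where the quadratic memory and time cost noted above comes from.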
Configuration
- Clusters (k): 2 (matching binary labels)
- Initialization: K-means++ style
- Evaluation: Cluster labels mapped to ground truth via Hungarian algorithm
- Kernel K-means with RBF kernel effectively separated the XOR clusters by implicitly mapping data to a higher-dimensional space.
- Standard K-means failed, as expected, because its decision boundaries are linear in the original feature space.
- Cluster-label mapping via Hungarian algorithm ensured fair evaluation against ground truth.
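The two steps above can be sketched as follows: kernel k-means assigns each point to the cluster whose feature-space centroid is nearest, using only Gram-matrix entries, and SciPy's `linear_sum_assignment` (an implementation of the Hungarian algorithm) matches cluster ids to class ids for evaluation. This is a simplified sketch under assumed random initialization, not the homework's k-means++-style code; the function names are illustrative.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def kernel_kmeans(K, k=2, n_iter=100, seed=0):
    """Lloyd-style k-means using only a precomputed Gram matrix K (n x n)."""
    n = K.shape[0]
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, k, n)  # random init (k-means++-style seeding is a refinement)
    for _ in range(n_iter):
        dist = np.full((n, k), np.inf)
        for c in range(k):
            mask = labels == c
            m = mask.sum()
            if m == 0:
                continue  # empty cluster: leave its distances at +inf
            # ||phi(x_i) - mu_c||^2 expanded via the kernel trick:
            # K_ii - (2/|C|) sum_{j in C} K_ij + (1/|C|^2) sum_{j,l in C} K_jl
            dist[:, c] = (np.diag(K)
                          - 2.0 / m * K[:, mask].sum(axis=1)
                          + K[np.ix_(mask, mask)].sum() / m ** 2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels

def mapped_accuracy(y_true, y_pred, k=2):
    """Accuracy after optimally matching cluster ids to class ids (Hungarian algorithm)."""
    cont = np.zeros((k, k), dtype=int)  # rows: predicted cluster, cols: true class
    for p, t in zip(y_pred, y_true):
        cont[p, t] += 1
    rows, cols = linear_sum_assignment(-cont)  # maximise total agreement
    return cont[rows, cols].sum() / len(y_true)
```

With the linear kernel `K = X @ X.T` this reduces to standard k-means, which is why only the kernelized variant separates XOR.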
Dataset: Diabetes (UCI)
- Features: 10 clinical variables
- Target: Disease progression (continuous)
- Preprocessing: Standardization (zero mean, unit variance)
- Ridge Regression benefited significantly from standardization (9.3% MSE reduction) because L2 penalty is scale-sensitive.
- Lasso showed minimal change; coordinate descent is somewhat robust to feature scaling but still benefits from it.
- OLS was unaffected by scaling: the closed-form solution's predictions are invariant to linear rescaling of the features, since the coefficients simply absorb the scale.
- Sparsity: Lasso with α=100 produced sparse coefficients, effectively performing feature selection.