Image source: https://www.cs.cornell.edu/courses/cs4780/2018fa/lectures/lecturenote01_MLsetup.html
Classical machine learning and optimization methods are truly fascinating. The motivation behind this project is to explore these foundational techniques in depth. The repository is currently under development, and given the vast breadth of machine learning algorithms and optimization methods, it is likely to remain a work in progress for a long time.
ML-Toolbox/
│
├── 📂 supervised-learning/            # Supervised Learning
│   ├── 📂 perceptron*
│   ├── 📂 knn*
│   ├── 📂 naive-bayes*
│   ├── 📂 logistic-regression*
│   ├── 📂 linear-regression*
│   ├── 📂 decision-trees*
│   ├── 📂 kd-ball-trees*
│   ├── 📂 svm*
│   └── 📂 gaussian-processes*
│
├── 📂 ensemble-learning/              # Ensemble Methods
│   ├── 📂 bagging*
│   ├── 📂 boosting*
│   └── 📂 random-forests*
│
├── 📂 kernel-methods/                 # Kernels
│   ├── 📂 perceptron*
│   ├── 📂 linear-regression*
│   └── 📂 svm*
│
├── 📂 unsupervised-learning/          # Unsupervised Learning
│   ├── 📂 k-means*
│   ├── 📂 gaussian-mixture-models*
│   ├── 📂 kernel-density-estimation*
│   ├── 📂 pca*
│   └── 📂 apriori-algorithm*
│
├── 📂 deep-learning/                  # Deep Learning
│   ├── 📂 neural-networks*
│   ├── 📂 cnn*
│   ├── 📂 rnn*
│   ├── 📂 autoencoders*
│   └── 📂 variational-autoencoders*
│
├── 📂 optimization-methods/           # Optimization Techniques
│   ├── 📂 unconstrained/
│   │   ├── gradient-descent*
│   │   ├── newtons-method*
│   │   ├── quasi-newton-method*
│   │   ├── coordinate-descent*
│   │   └── conjugate-gradient*
│   │
│   └── 📂 constrained/
│
├── 📂 assets/
│   ├── 📂 data/                       # Datasets
│   ├── 📂 img/                        # Images & visual assets
│   └── 📂 scripts/                    # Preprocessing scripts
│
└── 📄 README.md
Note: * indicates work in progress.
If we had complete knowledge about a problem, we could directly write down a formula or an algorithm to solve it; finding the shortest path between two points in a graph is one such example. On the other hand, if we had complete data, solving the problem would be as simple as looking up the answer: finding a place on a map is just a lookup task, and the main challenge is choosing the right way to organize the data. In such settings, problems are solved using data structures and algorithms (DSA-Toolbox).
Machine learning can be thought of as a hybrid approach that combines knowledge and data to solve problems; in this regime, we have neither complete knowledge nor complete data. Consider the task of prediction. If we know that the problem follows a linear trend, we can model it with a straight line y = mx + c, and then use data to find the unknown parameters m and c that make the line fit best. This is exactly linear regression.
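As a rough sketch of this idea (the dataset below is synthetic and purely illustrative), fitting m and c by least squares with NumPy might look like this:

```python
import numpy as np

# Synthetic 1-D dataset: noisy samples from a linear trend (unknown to the model).
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.5 * x + 1.0 + rng.normal(scale=1.0, size=x.shape)

# Knowledge: we assume the model y = m*x + c.
# Data: we estimate m and c by least squares (np.polyfit with degree 1).
m, c = np.polyfit(x, y, deg=1)
print(f"estimated m = {m:.2f}, c = {c:.2f}")  # should land close to 2.5 and 1.0
```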
Knowledge can be combined with data in many ways; broadly, there are three places where we can inject it.
Image source: https://gpss.cc/gpss24/slides/Ek2024.pdf
- Model & Design Choices
Given a problem, our goal is to find the optimal solution h*. Since we cannot search over all possible solutions, we restrict ourselves to a class of models through design choices. These choices include how we model the problem, what assumptions we make, how we formulate our loss function, etc. These choices reflect what we already believe about the problem or what we want from the solution.
For example, consider classification. If we only care about accuracy, a Support Vector Machine (SVM) can be a good choice. But if we want probability outputs instead of just labels, models like Naive Bayes or Logistic Regression are better. Between these two, if we have very little data, Naive Bayes is often preferred. It makes explicit assumptions about data distribution (e.g., Gaussian Naive Bayes), and if those assumptions are close to reality, it works well even with limited data. Logistic Regression makes fewer assumptions and is more flexible. However, because of this, it usually needs more data to perform well.
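To make this concrete, here is a small sketch (not code from this repository; the dataset and its size are made up) comparing Gaussian Naive Bayes and Logistic Regression with scikit-learn, both of which expose class probabilities via predict_proba:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Small synthetic dataset to mimic a low-data regime (sizes are arbitrary).
X, y = make_classification(n_samples=60, n_features=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

nb = GaussianNB().fit(X_tr, y_tr)                        # strong Gaussian assumption
lr = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)   # fewer assumptions

print("Naive Bayes accuracy:        ", nb.score(X_te, y_te))
print("Logistic Regression accuracy:", lr.score(X_te, y_te))
print("NB P(y|x), first test point: ", nb.predict_proba(X_te[:1]))
print("LR P(y|x), first test point: ", lr.predict_proba(X_te[:1]))
```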
Another example is regularization. L2 regularization is useful when we want smoother and simpler models. L1 regularization is helpful when we want the solution to be sparse.
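A quick illustration of this difference (again a sketch with made-up data, using scikit-learn's Ridge and Lasso as the L2- and L1-regularized linear models):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Synthetic regression problem where only 3 of 20 features actually matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
true_w = np.zeros(20)
true_w[:3] = [3.0, -2.0, 1.5]
y = X @ true_w + rng.normal(scale=0.1, size=100)

ridge = Ridge(alpha=1.0).fit(X, y)   # L2: shrinks all weights smoothly
lasso = Lasso(alpha=0.1).fit(X, y)   # L1: pushes irrelevant weights to exactly zero

print("non-zero weights (Ridge):", int(np.sum(np.abs(ridge.coef_) > 1e-6)))  # close to 20
print("non-zero weights (Lasso):", int(np.sum(np.abs(lasso.coef_) > 1e-6)))  # close to 3
```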
All these choices define the gap between h* and h_opt, where h_opt is the best model that can be produced based on our design choices.
- Data
Data is the second place where we can inject knowledge. h_opt is the best solution we would get if we had perfect or unlimited data. But with limited or biased data, we end up with ĥ_opt instead. One way to improve this is to collect more data. Another way is to use knowledge about the problem itself.
For example, if all training images have bright backgrounds, the model may not work well on images with darker backgrounds. In this case, we can use data augmentation to include different lighting conditions during training. This can help reduce the gap between h_opt and ĥ_opt.
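A minimal sketch of such an augmentation (the brightness factors and image shapes are invented for illustration; a real pipeline would typically use a library such as torchvision or albumentations):

```python
import numpy as np

def augment_brightness(image: np.ndarray, factors=(0.5, 0.75, 1.0, 1.25)):
    """Return copies of `image` under different simulated lighting conditions."""
    return [np.clip(image * f, 0.0, 1.0) for f in factors]

# Hypothetical batch of bright-background images with pixel values in [0, 1].
batch = np.random.default_rng(0).uniform(0.6, 1.0, size=(4, 28, 28))
augmented = [aug for img in batch for aug in augment_brightness(img)]
print(len(augmented))  # 4 images x 4 lighting conditions = 16 training samples
```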
- Optimization
The third place where we can inject knowledge is the way we optimize the model. For some problems, like total variation denoising, the ADMM algorithm may perform better than gradient descent. In other cases, Newton's method may reach the solution in a few steps, while gradient descent takes much longer. Another optimization-side choice is the hyperparameters: the learning rate, batch size, and other constants. ĥ_opt represents the best solution achievable with ideal optimization, and ĥ represents the solution we actually obtain.
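The gap is easiest to see on a toy problem. The sketch below (a made-up quadratic objective, not one of the repository's examples) contrasts gradient descent with Newton's method, which exploits curvature through the Hessian:

```python
import numpy as np

# Toy quadratic f(w) = 0.5 * w^T A w - b^T w; gradient A w - b, Hessian A.
A = np.array([[10.0, 0.0], [0.0, 1.0]])   # mildly ill-conditioned problem
b = np.array([1.0, 1.0])
w_star = np.linalg.solve(A, b)            # exact minimizer, for reference

# Gradient descent: step size is limited by the largest eigenvalue of A.
w = np.zeros(2)
for _ in range(200):
    w -= 0.05 * (A @ w - b)
print("gradient descent error after 200 steps:", np.linalg.norm(w - w_star))

# Newton's method: rescales the gradient by the inverse Hessian, so a single
# step lands on the minimizer of a quadratic.
w = np.zeros(2)
w -= np.linalg.solve(A, A @ w - b)
print("Newton error after 1 step:", np.linalg.norm(w - w_star))
```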
However, with great power comes great responsibility. In all three components, if our assumptions do not match reality, they will increase the error instead of reducing it.
The datasets are either public datasets from libraries (such as Fashion-MNIST) or datasets downloaded from Kaggle. Details about each dataset used can be found in assets/data/README.md. The datasets can be downloaded from their original sources or from this link. Once downloaded, the datasets must be placed in the /assets/data folder.
- Cornell CS4780 Machine Learning for Intelligent Systems by Prof. Kilian Weinberger.
- CS7.403 Statistical Methods in Artificial Intelligence course by IIIT Hyderabad.
- Gaussian Process Summer School 2024.
- MIT 6.036 Machine Learning by Prof. Tamara Broderick.
- Bias-Variance Tradeoff by MIT OpenCourseWare and the Stanford NLP Group.
- Kernel Methods in Computer Vision by Prof. Christoph Lampert, and notes on Lagrange multipliers and KKT conditions.
- Neural Networks and Deep Learning Online Book by Michael Nielsen.
- Talk on Association Rule Mining by Prof. Ami Gates.