| NOTE |
|---|
| There is a newer version of this toolkit, which addresses most of the shortcomings of this version. Please refer to the new tutorial & examples here |
Jump straight to the step-by-step tutorial
Pytorch, with its Pythonic interface and dynamic graph evaluation, makes it much easier to debug your deep learning models compared with Tensorflow 1.0. On the flip side, it is still a low-level library, which often requires us to write the same (or almost the same) boilerplate code to train, evaluate & test our models. Some ML authors love to code this way as they feel in control of what they create.
Me? I hate writing the same boilerplate code repeatedly. I'd rather have my library handle the mundane tasks while I
focus on the tasks that really matter viz. designing my model architecture, preparing data for my model & tuning the
hyper-parameters to get the best performance. Keras does a remarkable job of providing a clean & simple API to train,
evaluate & test models. With the Pytorch Toolkit I aim to bring the ease that Keras provides to Pytorch. Most of the
API is similar to the Keras API, so Keras users should find it very easy to understand.
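To make the "boilerplate" point concrete, here is a minimal sketch of the kind of training loop you typically end up rewriting in plain Pytorch. This is not the toolkit's code: the `train_one_epoch` and `fit` helpers below, the tiny model and the random data are all made up for illustration.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def train_one_epoch(model, loader, loss_fn, optimizer):
    """Run one pass over the data; return (mean loss, accuracy)."""
    model.train()
    total_loss, total_correct, total_seen = 0.0, 0, 0
    for inputs, targets in loader:
        optimizer.zero_grad()             # clear gradients from the previous step
        outputs = model(inputs)           # forward pass
        loss = loss_fn(outputs, targets)
        loss.backward()                   # back-propagate
        optimizer.step()                  # update the weights
        total_loss += loss.item() * inputs.size(0)
        total_correct += (outputs.argmax(dim=1) == targets).sum().item()
        total_seen += inputs.size(0)
    return total_loss / total_seen, total_correct / total_seen


def fit(model, loader, loss_fn, optimizer, epochs=5):
    """Hypothetical Keras-style helper that hides the loop above."""
    history = {"loss": [], "acc": []}
    for epoch in range(epochs):
        loss, acc = train_one_epoch(model, loader, loss_fn, optimizer)
        history["loss"].append(loss)
        history["acc"].append(acc)
        print(f"Epoch {epoch + 1}/{epochs} - loss: {loss:.4f} - acc: {acc:.4f}")
    return history


# Synthetic two-class data, just so the sketch runs end to end
torch.manual_seed(0)
X = torch.randn(64, 4)
y = (X.sum(dim=1) > 0).long()
loader = DataLoader(TensorDataset(X, y), batch_size=16, shuffle=True)

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
history = fit(model, loader, nn.CrossEntropyLoss(), optimizer, epochs=3)
```

Once the loop lives inside a helper like `fit`, the calling code shrinks to a single line, which is the experience Keras users are used to.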
- Keras-like API to train model and evaluate model performance (e.g. `fit()` and `evaluate()`) and make predictions (e.g. `predict()`)
- Support for torchvision datasets - with `fit_dataset()`, `evaluate_dataset()` and `predict_module()` calls
- Convenience class `PytkModule` from which to extend your model - this class provides the Keras-like API
- Full support for using the `nn.Sequential` API, via the `PytkModuleWrapper` class, which also provides the same functions listed above
- Support for saving and loading model states with `save()` and `load()` methods
- Support for a variety of metrics (like accuracy, f1-score, precision, recall), calculated for each training epoch
- Keras-like progress display while model trains
- Support for Early Stopping of training
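For readers who have not met the metrics mentioned above, here is a small pure-Python sketch of how accuracy, precision, recall and f1-score are computed for a binary problem. The function name `binary_classification_metrics` and the sample labels are my own illustration, not the toolkit's implementation.

```python
def binary_classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels and predictions for one epoch
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = binary_classification_metrics(y_true, y_pred)
print(metrics)
```

With these sample labels there are 3 true positives, 3 true negatives, 1 false positive and 1 false negative, so all four metrics come out to 0.75.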
I must confess, I'm not good at coming up with snazzy names for the libraries I create. I tried several acronyms and
finally settled on a rather unfancy name Pytorch Toolkit (or PyTk). If you can come up with a really cool name,
please let me know!
This GitHub repository includes the toolkit, along with several examples on how to use it. Also included is
a step-by-step tutorial which gradually introduces you to the complete API included in the toolkit. All
functions & classes are included in just 1 Python file (ingeniously named pytorch_toolkit.py - I did warn you that I
am not good at coming up with names, didn't I?). I have not yet created a module - maybe someday...
Since all functions & classes are included in just 1 file - pytorch_toolkit.py, there are strictly no special
installation steps required to use this toolkit.
- Clone this repository, so you get the entire Pytorch Toolkit and several example files.
- Copy the `pytorch_toolkit.py` file into your project's directory and you are done! (Alternatively, if you don't like several copies scattered across your disk drive, copy this file to any one folder on your Python module search path, `sys.path`.)
- At the top of your code file (or Jupyter Notebook), after all your other imports, enter `import pytorch_toolkit as pytk` to import the toolkit into your project. I use the `pytk` alias - you can use whatever you prefer.

This library depends on several other Python libraries, viz:
- Numpy
- pandas
- pathlib (part of the Python standard library, so nothing to install)
- matplotlib
- seaborn
- scikit-learn
- Pytorch (of course!)
- torchsummary - if you want to see a Keras-like summary of your model (optional!)
I am assuming that you have these installed already - if you are an aspiring Data Scientist or ML enthusiast, you would have these (except perhaps Pytorch & torchsummary). Please refer to the respective module documentation on how to install these libraries.
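If you are not sure which of these are already installed, a quick check along the following lines may help. It is only a sketch: the `missing_modules` helper is my own, and the import names (`sklearn` for scikit-learn, `torch` for Pytorch) are the names these packages are imported under.

```python
import importlib.util

# Import names of the libraries listed above
REQUIRED = ["numpy", "pandas", "pathlib", "matplotlib", "seaborn", "sklearn", "torch"]
OPTIONAL = ["torchsummary"]


def missing_modules(names):
    """Return the names that cannot be found on the current Python path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing_required = missing_modules(REQUIRED)
missing_optional = missing_modules(OPTIONAL)

if missing_required:
    print("Please install:", ", ".join(missing_required))
else:
    print("All required libraries found.")
if missing_optional:
    print("Optional, not installed:", ", ".join(missing_optional))
```

`find_spec` only looks the module up, it does not import it, so the check is quick even for heavy libraries like Pytorch.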
- I assume you have followed the instructions above & have installed the prerequisites, including Pytorch itself.
- Start with a new Python code file (or Jupyter notebook) and add the following line at the top (after all your other imports, including your Pytorch imports): `import pytorch_toolkit as pytk`
- Just run this file/code cell - if you don't get any import errors, you are done - celebrate your success!! :)
- Should you get any import errors, please correct them by installing the respective module/library (Pytorch Toolkit dependencies are mentioned above)
- Repeat the above step (running file) until you get no errors
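If you keep `pytorch_toolkit.py` in a shared folder rather than in your project directory, the import check above can be wrapped in a small helper like the one below. This is a sketch under my own assumptions: `import_toolkit` is a hypothetical helper, and the folder you pass in is whichever one you copied the file to.

```python
import importlib
import sys
from pathlib import Path


def import_toolkit(toolkit_dir=None, module_name="pytorch_toolkit"):
    """Import the toolkit, optionally adding its folder to sys.path first.

    Returns the imported module, or None if the import fails.
    """
    if toolkit_dir is not None:
        folder = str(Path(toolkit_dir).resolve())
        if folder not in sys.path:
            sys.path.insert(0, folder)
        importlib.invalidate_caches()  # make sure newly added files are seen
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        # err.name is the module that could not be found, e.g. a missing dependency
        print(f"Could not import {module_name}: missing module {err.name!r}")
        return None


pytk = import_toolkit()  # pass toolkit_dir=... if the file lives in a shared folder
if pytk is not None:
    print("Pytorch Toolkit imported successfully!")
```

The message tells you which module is missing, so you can install it and run the file again until the import succeeds.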
This toolkit was inspired by Keras' clean API to train, evaluate and test models. Many of the functions provided mimic those from the Keras API. If you are already using Keras, you should notice the similarities immediately.
The toolkit provides:
- A custom class - `PytkModule` - from which your custom models should derive
- Keras-like API to train your models - `fit(...)` and `fit_dataset(...)` functions, and a `show_plots(...)` function to plot the `loss` and `accuracy` metrics across epochs, so you can see if your model is overfitting or underfitting.
- Several pre-defined metrics, like Accuracy, F1-Score, MSE, MAE etc., which can be tracked during training.
- Keras-like API to evaluate model's performance post training - `evaluate(...)` and `evaluate_dataset(...)` functions
- Functions to save & load model's state - `pytk.load()` and `save(...)` functions.
- Keras-like API to run predictions - `predict(...)` and `predict_module(...)` calls.
- Early stopping of training based on several criteria (e.g. validation loss not improving)
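To illustrate the idea behind early stopping on validation loss, here is a minimal pure-Python sketch. The `EarlyStopper` class, its `patience` and `min_delta` parameters and the sample losses are hypothetical and only show the general technique, not how the toolkit implements it.

```python
class EarlyStopper:
    """Stop training when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience      # epochs to wait without improvement
        self.min_delta = min_delta    # minimum drop that counts as improvement
        self.best = float("inf")      # best validation loss seen so far
        self.bad_epochs = 0           # consecutive epochs without improvement

    def should_stop(self, val_loss):
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Made-up validation losses, one per epoch
val_losses = [0.90, 0.70, 0.65, 0.66, 0.67, 0.68, 0.60]
stopper = EarlyStopper(patience=3)
stopped_at = None
for epoch, val_loss in enumerate(val_losses, start=1):
    if stopper.should_stop(val_loss):
        stopped_at = epoch
        print(f"Stopping early at epoch {epoch}, best val_loss = {stopper.best}")
        break
```

Here the loss stops improving after epoch 3, so with a patience of 3 the loop stops at epoch 6 and never sees the later value of 0.60. That is the usual trade-off: a larger patience wastes more epochs but is less likely to stop too soon.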
I am also including several fully working examples (as Python files or as Jupyter notebooks), where I have applied this API to solve several ML problems. The step-by-step tutorial will refer to one or more of these examples. I'll be adding more examples to this GitHub repository, so please check back for changes.
If you are excited about starting with the Pytorch Toolkit, jump to the step-by-step tutorial right away!
- `pyt_breast_cancer.py` - binary classification on the Wisconsin Breast Cancer dataset using Pytorch ANN
- `pyt_iris.py` - multiclass classification of scikit-learn Iris dataset using Pytorch ANN
- `pyt_wine.py` - multiclass classification of scikit-learn Wine dataset
- `pyt_mnist_dnn.py` - MNIST digits multiclass classification with Pytorch ANN
- `pyt_cifar10_cnn.py` - multiclass classification with CNN on CIFAR-10 dataset
- `Pytorch-Fruits360(Kaggle)_CNN.ipynb` - iPython Notebook for the Kaggle Fruits360 multiclass classification problem
- `Pytorch-Malaria Cell Detection(Kaggle)_CNN.ipynb` - iPython Notebook for the Kaggle Malaria Cell Detection Dataset binary classification problem
- `pyt_regression.py` - univariate regression on synthesized data
- `pyt_salary_regression.py` - multivariate regression on salary data (@see csv_filed/salary_data.csv)
I will be adding more examples over the course of time. Keep watching this repository :).
Hope you enjoy using the Pytorch Toolkit - my small contribution to the Pytorch community. Feedback is welcome.