NebTorch is a minimal Autograd engine built from scratch using NumPy, inspired by PyTorch's automatic differentiation system.

In 11-785: Introduction to Deep Learning, a graduate-level course at CMU taught by Prof. Bhiksha Raj Ramakrishnan, I completed a sequence of assignments covering everything from foundational concepts to advanced topics in Deep Learning, including neural networks, optimization, and more. The course provided both a theoretical and practical understanding of neural networks, along with a brief introduction to Autograd.

After completing the course, I was inspired to dive deeper and build my own Autograd engine from scratch. Building NebTorch has been very rewarding: I've solidified my understanding of Deep Learning and Automatic Differentiation, and most of all, I've gained an appreciation for frameworks such as PyTorch and TensorFlow.
Here's a complete example demonstrating how to use NebTorch to train a simple Multi-Layer Perceptron (MLP) on the Iris dataset:
```python
import numpy as np

import nebtorch
from nebtorch import Module, Tensor
from nebtorch.nn import Linear, ReLU, CrossEntropyLoss, Softmax
from nebtorch.optim import SGD
from sklearn import datasets
from sklearn.model_selection import train_test_split


class MLP(Module):
    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.linear_1 = Linear(in_features=in_features, out_features=256)
        self.act = ReLU()
        self.linear_2 = Linear(in_features=256, out_features=out_features)

    def forward(self, input: Tensor):
        out = self.linear_1(input)
        out = self.act(out)
        logits = self.linear_2(out)
        return logits


# Load and prepare data
iris = datasets.load_iris()
X = iris.data
Y = iris.target
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=42
)

# Convert to NebTorch tensors
X_train = nebtorch.tensor(X_train)
Y_train = nebtorch.tensor(Y_train)
X_test = nebtorch.tensor(X_test)
Y_test = nebtorch.tensor(Y_test)

# Hyperparameters
INPUT_FEATURES = X_train.shape[1]
NUM_CLASSES = np.max(Y) + 1
EPOCHS = 100
BATCH_SIZE = 5

# Initialize model, loss, and optimizer
model = MLP(INPUT_FEATURES, NUM_CLASSES)
criterion = CrossEntropyLoss()
optimizer = SGD(model.parameters(), lr=0.01)

num_batches = X_train.shape[0] // BATCH_SIZE

for epoch in range(EPOCHS):
    model.train()
    for i in range(num_batches):
        optimizer.zero_grad()

        # Get batch
        start_idx = i * BATCH_SIZE
        end_idx = start_idx + BATCH_SIZE
        input = X_train[start_idx:end_idx]
        target = Y_train[start_idx:end_idx]

        # Forward pass
        out = model(input)
        loss = criterion(out, target)

        # Backward pass
        loss.backward()
        optimizer.step()

    # Print progress
    if epoch % 10 == 0:
        print(f"Epoch {epoch:3d} | Loss: {loss.data.item():.4f}")

# Evaluate on test set
model.eval()
out = model(X_test)
loss = criterion(out, Y_test)

# Calculate accuracy
softmax = Softmax(dim=1)
predictions = np.argmax(softmax(out).data, axis=1)
accuracy = np.sum(predictions == Y_test.data) / Y_test.shape[0] * 100
print(f"Test Accuracy: {accuracy:.2f}%")
```
### Core Components

| Component | Description |
|-----------|-------------|
| `Module` | Base class for all neural network modules |
| `Tensor` | Multi-dimensional data structure with automatic differentiation support |
| `Parameter` | Special tensor for trainable model parameters |
| `Optimizer` | Base class for all optimizers |
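Taken together, these pieces cover the whole autograd workflow. Below is a minimal sketch: `nebtorch.tensor`, `backward()`, and `.data` all appear in the Iris example above, while the `.grad` attribute on leaf tensors and the method-style `sum()` are assumptions modeled on PyTorch's API.

```python
import numpy as np
import nebtorch

# Leaf tensors that participate in the autograd graph
x = nebtorch.tensor(np.array([[1.0, 2.0], [3.0, 4.0]]))
w = nebtorch.tensor(np.array([[0.5], [0.5]]))

# Each operation records itself on the graph as it runs
y = x @ w          # matrix multiplication (see the ops table below)
loss = y.sum()     # scalar reduction (method form is an assumption)

# Reverse-mode pass from the scalar output back to the leaves
loss.backward()

# Gradients land on the inputs (assumed .grad attribute)
print(w.grad)      # d(sum(x @ w))/dw, i.e. the column sums of x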
### Tensor Operations

| Component | Description |
|-----------|-------------|
| `Add` | Element-wise addition with broadcasting |
| `Subtract` | Element-wise subtraction with broadcasting |
| `Negate` | Element-wise negation |
| `Multiply` | Element-wise multiplication with broadcasting |
| `Divide` | Element-wise division with broadcasting |
| Matrix Multiplication | Matrix multiplication (`@` operator) |
| `Transpose` | Matrix transposition |
| `Reshape` | Tensor reshaping |
| `Log` | Natural logarithm |
| `Exp` | Exponential function |
| `Power` | Element-wise power operation |
| `Mean` | Mean reduction with axis support |
| `Variance` | Variance reduction with axis support |
| `Sum` | Sum reduction with axis support |
| `Max` | Maximum reduction with axis support |
| `Slice` | Tensor indexing and slicing |
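Each operation implements both a forward computation and its local derivative, and the broadcasting ops additionally sum gradients back to the original input shape. A hedged sketch of how they compose (the `mean()` method form is assumed to mirror PyTorch's tensor methods):

```python
import numpy as np
import nebtorch

a = nebtorch.tensor(np.random.randn(4, 3))
b = nebtorch.tensor(np.ones((1, 3)))  # broadcasts along the first axis

# Broadcasting add, element-wise multiply, then a mean reduction
out = ((a + b) * a).mean()
out.backward()

# The backward pass un-broadcasts: b's gradient is reduced to shape (1, 3)
print(b.grad.shape)
```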
### Activation Functions

| Component | Description |
|-----------|-------------|
| `Sigmoid` | Sigmoid activation function |
| `Tanh` | Hyperbolic tangent activation |
| `ReLU` | Rectified Linear Unit |
| `GELU` | Gaussian Error Linear Unit |
| `Softmax` | Softmax with dimension support |
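Activations are modules and are applied by calling them, as `ReLU` and `Softmax` are in the MLP example above; the assumption here is that `GELU` (and the others) import from `nebtorch.nn` alongside them:

```python
import numpy as np
import nebtorch
from nebtorch.nn import ReLU, GELU, Softmax

x = nebtorch.tensor(np.random.randn(2, 5))

relu_out = ReLU()(x)       # zeroes out negative entries
gelu_out = GELU()(x)       # smooth ReLU variant
probs = Softmax(dim=1)(x)  # each row of probs sums to 1

print(probs.data.sum(axis=1))
```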
### Layers

| Component | Description |
|-----------|-------------|
| `Linear` | Fully connected layer |
| `Conv1d_stride1` | 1D convolution with stride 1 |
| `Conv2d_stride1` | 2D convolution with stride 1 |
| `Conv2d` | 2D convolution with configurable stride |
| `MaxPool2d_stride1` | 2D max pooling with stride 1 |
| `MeanPool2d_stride1` | 2D mean pooling with stride 1 |
| `MaxPool2d` | 2D max pooling with configurable stride |
| `MeanPool2d` | 2D mean pooling with configurable stride |
| `BatchNorm1d` | 1D batch normalization |
| `LayerNorm` | Layer normalization |
| `Dropout` | Dropout regularization |
| `Embedding` | Embedding layer for sparse inputs |
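The `_stride1` variants hold the core forward/backward logic, and the configurable-stride versions presumably build on them. A sketch of the convolutional layers in use; the constructor arguments and the NCHW input layout are assumptions modeled on PyTorch:

```python
import numpy as np
import nebtorch
from nebtorch.nn import Conv2d, MaxPool2d

# A batch of 4 single-channel 28x28 inputs (NCHW layout assumed)
x = nebtorch.tensor(np.random.randn(4, 1, 28, 28))

# Constructor arguments assumed to mirror PyTorch's Conv2d / MaxPool2d
conv = Conv2d(in_channels=1, out_channels=8, kernel_size=3, stride=1)
pool = MaxPool2d(kernel_size=2, stride=2)

out = pool(conv(x))
print(out.shape)  # expected (4, 8, 13, 13) under the assumed conventions
```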
### Recurrent Neural Networks

| Component | Description |
|-----------|-------------|
| `RNNCell` | Recurrent neural network cell |
| `GRUCell` | Gated Recurrent Unit cell |
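Both cells compute a single time step, so a sequence is processed by unrolling a loop over time. The `(input_size, hidden_size)` constructor and `(input, hidden)` call signature below are assumptions modeled on PyTorch's `nn.RNNCell`:

```python
import numpy as np
import nebtorch
from nebtorch.nn import RNNCell

cell = RNNCell(input_size=10, hidden_size=20)

x = nebtorch.tensor(np.random.randn(5, 3, 10))  # (seq_len, batch, input_size)
h = nebtorch.tensor(np.zeros((3, 20)))          # initial hidden state

# Unroll the recurrence: each step feeds the previous hidden state back in
for t in range(5):
    h = cell(x[t], h)
```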
### Resampling

| Component | Description |
|-----------|-------------|
| `Upsampling1d` | 1D upsampling |
| `Downsample1d` | 1D downsampling |
| `Upsample2d` | 2D upsampling |
| `Downsample2d` | 2D downsampling |
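A short sketch of the 2D pair; the factor argument names here are hypothetical:

```python
import numpy as np
import nebtorch
from nebtorch.nn import Upsample2d, Downsample2d

x = nebtorch.tensor(np.random.randn(2, 3, 8, 8))  # NCHW layout assumed

up = Upsample2d(upsampling_factor=2)        # hypothetical argument name
down = Downsample2d(downsampling_factor=2)  # hypothetical argument name

y = down(up(x))  # round-trips back to the original spatial size
print(y.shape)   # expected (2, 3, 8, 8)
```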
### Attention

| Component | Description |
|-----------|-------------|
| `MultiheadAttention` | Multi-head attention mechanism |
| Scaled Dot-Product Attention | Scaled dot-product attention |
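Scaled dot-product attention computes `softmax(Q K^T / sqrt(d_k)) V`, and `MultiheadAttention` runs several such attentions in parallel over learned projections. The constructor and `(query, key, value)` call signature below are assumptions modeled on PyTorch's `nn.MultiheadAttention`:

```python
import numpy as np
import nebtorch
from nebtorch.nn import MultiheadAttention

# embed_dim is assumed to split evenly across the heads
attn = MultiheadAttention(embed_dim=16, num_heads=4)

x = nebtorch.tensor(np.random.randn(2, 10, 16))  # (batch, seq_len, embed_dim)

# Self-attention: query, key, and value are all the same sequence
out = attn(x, x, x)
```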
### Loss Functions

| Component | Description |
|-----------|-------------|
| `Loss` | Base class for all loss functions |
| `MSELoss` | Mean Squared Error loss |
| `CrossEntropyLoss` | Cross-entropy loss with softmax |
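`CrossEntropyLoss` takes raw logits and integer class labels and applies the softmax itself, exactly as in the training loop above; `MSELoss` is assumed to take same-shape predictions and targets:

```python
import numpy as np
import nebtorch
from nebtorch.nn import MSELoss, CrossEntropyLoss

logits = nebtorch.tensor(np.random.randn(4, 3))
labels = nebtorch.tensor(np.array([0, 2, 1, 0]))  # integer class ids

ce = CrossEntropyLoss()(logits, labels)  # softmax applied internally
ce.backward()                            # gradients flow back to the logits

# MSELoss: real-valued targets with the same shape as the predictions
pred = nebtorch.tensor(np.random.randn(4, 1))
target = nebtorch.tensor(np.random.randn(4, 1))
mse = MSELoss()(pred, target)
```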
### Optimizers

| Component | Description |
|-----------|-------------|
| `SGD` | Stochastic Gradient Descent |
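`SGD` implements the update `p <- p - lr * grad`, driven by the same `zero_grad` / `backward` / `step` contract used in the Iris training loop. A self-contained sketch, assuming `Linear` inherits `parameters()` from `Module` and `MSELoss` is called like `CrossEntropyLoss`:

```python
import numpy as np
import nebtorch
from nebtorch.nn import Linear, MSELoss
from nebtorch.optim import SGD

# Tiny regression problem: fit a single Linear layer with plain SGD
layer = Linear(in_features=2, out_features=1)
criterion = MSELoss()
optimizer = SGD(layer.parameters(), lr=0.1)

x = nebtorch.tensor(np.random.randn(8, 2))
y = nebtorch.tensor(np.random.randn(8, 1))

for _ in range(50):
    optimizer.zero_grad()          # clear gradients from the previous step
    loss = criterion(layer(x), y)  # forward pass
    loss.backward()                # populate parameter gradients
    optimizer.step()               # p <- p - lr * grad
```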