Backpropagation From Scratch — Deep Learning

A ground-up implementation of backpropagation and three gradient-based optimizers on a custom computational graph, built as an educational deep learning project. Every gradient is derived analytically using the chain rule, verified against numerical approximation via gradient checking, and benchmarked across Vanilla SGD, Momentum SGD, and Adam.


Project Structure

BACKPROPOGATION-FROM-SCRATCH---Deep-Learning/
├── Backpropogation_deep_learning.ipynb   # Full implementation & results
├── requirements.txt
└── README.md

Problem Setup

Task: Regression on a dataset of 506 samples with 5 features (f1–f5). Goal: learn 9 weights (w0–w8, matching the computational graph below) by minimizing MSE loss using manual backpropagation.

Computational Graph

The network is split into 3 sub-graphs for clarity:

Part 1 (Exponential branch):
  z1      = w0*x0 + w1*x1
  part1   = exp(z1² + w5)

Part 2 (Tanh branch):
  part2   = tanh(part1 + w6)

Part 3 (Sigmoid branch):
  z2      = sin(w2*x2)
  z3      = w3*x3 + w4*x4
  part3   = sigmoid(z2 * z3 + w7)

Output:
  y_pred  = part3 * w8 + part2

Loss (MSE per sample):
  L       = (y - y_pred)²

Weight initialization: Normal distribution, mean=0, std=0.01


Implementation

Task 1 — Forward & Backward Propagation

Forward Pass

  • Computes all intermediate activations through the graph
  • Stores values needed for the backward pass
  • Calculates per-sample MSE loss and the seed gradient ∂L/∂y_pred = -2(y - y_pred)
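The forward pass described above can be sketched directly from the graph equations. This is a minimal NumPy version for a single sample; the function and variable names (forward, cache) are illustrative, not taken from the notebook:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, y, w):
    """Forward pass through the graph for one sample.

    x: 5 features, w: 9 weights w[0]..w[8].
    Returns the per-sample loss, the seed gradient dL/dy_pred,
    and a cache of intermediates needed by the backward pass.
    """
    z1 = w[0] * x[0] + w[1] * x[1]
    part1 = np.exp(z1 ** 2 + w[5])        # exponential branch
    part2 = np.tanh(part1 + w[6])         # tanh branch
    z2 = np.sin(w[2] * x[2])
    z3 = w[3] * x[3] + w[4] * x[4]
    part3 = sigmoid(z2 * z3 + w[7])       # sigmoid branch
    y_pred = part3 * w[8] + part2
    loss = (y - y_pred) ** 2
    dy_pred = -2.0 * (y - y_pred)         # seed gradient dL/dy_pred
    cache = dict(z1=z1, part1=part1, part2=part2,
                 z2=z2, z3=z3, part3=part3, y_pred=y_pred)
    return loss, dy_pred, cache
```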

Backward Pass (Manual Chain Rule)

Derives all 9 gradients analytically:

Gradient   Expression
dw0        ∂L/∂y_pred · 1 · ∂tanh · ∂exp · 2z1 · x0
dw1        ∂L/∂y_pred · 1 · ∂tanh · ∂exp · 2z1 · x1
dw2        ∂L/∂y_pred · w8 · ∂sigmoid · z3 · cos(w2·x2) · x2
dw3        ∂L/∂y_pred · w8 · ∂sigmoid · z2 · x3
dw4        ∂L/∂y_pred · w8 · ∂sigmoid · z2 · x4
dw5        ∂L/∂y_pred · 1 · ∂tanh · ∂exp
dw6        ∂L/∂y_pred · 1 · ∂tanh
dw7        ∂L/∂y_pred · w8 · ∂sigmoid
dw8        ∂L/∂y_pred · part3
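The table translates almost line-for-line into code. A sketch of the backward pass is below; it recomputes the forward intermediates for self-containment, and all names (backward, dtanh, dsig) are illustrative rather than taken from the notebook:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backward(x, y, w):
    """Analytical gradients of the per-sample loss w.r.t. all 9 weights."""
    # Forward intermediates
    z1 = w[0]*x[0] + w[1]*x[1]
    part1 = np.exp(z1**2 + w[5])
    part2 = np.tanh(part1 + w[6])
    z2 = np.sin(w[2]*x[2])
    z3 = w[3]*x[3] + w[4]*x[4]
    s = sigmoid(z2*z3 + w[7])             # part3
    y_pred = s*w[8] + part2

    # Local derivatives
    dy_pred = -2.0 * (y - y_pred)         # seed gradient
    dtanh = 1.0 - part2**2                # tanh'(u) = 1 - tanh(u)^2
    dexp = part1                          # d/du exp(u) = exp(u)
    dsig = s * (1.0 - s)                  # sigmoid'(u) = s(1-s)

    # Chain rule, one row of the table per weight
    g = np.zeros(9)
    g[0] = dy_pred * dtanh * dexp * 2*z1 * x[0]
    g[1] = dy_pred * dtanh * dexp * 2*z1 * x[1]
    g[2] = dy_pred * w[8] * dsig * z3 * np.cos(w[2]*x[2]) * x[2]
    g[3] = dy_pred * w[8] * dsig * z2 * x[3]
    g[4] = dy_pred * w[8] * dsig * z2 * x[4]
    g[5] = dy_pred * dtanh * dexp
    g[6] = dy_pred * dtanh
    g[7] = dy_pred * w[8] * dsig
    g[8] = dy_pred * s
    return g
```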

Gradient Checking

Validates analytical gradients against numerical approximation:

f'(x) ≈ (f(x+ε) − f(x−ε)) / (2ε),   ε = 1e-7

Relative error = |dW − dW_approx| / (|dW| + |dW_approx|)

All gradients passed with relative error < 1e-3 (max observed: 2.55e-04; the relative error is nonnegative by definition).
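The check can be sketched as a pair of small helpers (names hypothetical); any analytical gradient can then be compared element-wise against the central-difference estimate:

```python
import numpy as np

def relative_error(dw, dw_approx):
    """Symmetric relative error between analytical and numerical gradients."""
    denom = abs(dw) + abs(dw_approx)
    return 0.0 if denom == 0.0 else abs(dw - dw_approx) / denom

def numerical_grad(loss_fn, w, i, eps=1e-7):
    """Central-difference estimate of dL/dw[i]: (f(w+eps) - f(w-eps)) / (2*eps)."""
    wp, wm = w.copy(), w.copy()
    wp[i] += eps
    wm[i] -= eps
    return (loss_fn(wp) - loss_fn(wm)) / (2.0 * eps)
```

A relative error below roughly 1e-3 (as reported above) is the usual sign that the analytical derivation matches.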


Task 2 — Optimizer Comparison (20 Epochs, 506 Samples)

2.1 Vanilla SGD

w_{t+1} = w_t − η · ∇w
  • Learning rate η = 0.05

2.2 Momentum SGD

v_t   = γ · v_{t-1} + η · ∇w
w_{t+1} = w_t − v_t
  • Momentum γ = 0.9, Learning rate η = 0.05
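The two update rules above might look like this in NumPy (function names are illustrative; each returns the updated state rather than mutating in place):

```python
import numpy as np

def sgd_step(w, grad, lr=0.05):
    """Vanilla SGD: w <- w - lr * grad."""
    return w - lr * grad

def momentum_step(w, v, grad, lr=0.05, gamma=0.9):
    """Momentum SGD: v <- gamma*v + lr*grad; w <- w - v."""
    v = gamma * v + lr * grad
    return w - v, v
```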

2.3 Adam (Adaptive Moment Estimation)

m_t     = β1 · m_{t-1} + (1−β1) · ∇w
v_t     = β2 · v_{t-1} + (1−β2) · ∇w²
m̂_t    = m_t / (1−β1^t)
v̂_t    = v_t / (1−β2^t)
w_{t+1} = w_t − α · m̂_t / (√v̂_t + ε)
  • β1 = 0.9, β2 = 0.99, α = 0.001, ε = 1e-4
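A minimal sketch of one Adam step with the hyperparameters listed above (the bias-correction step counter t starts at 1; the function name and state-passing style are illustrative):

```python
import numpy as np

def adam_step(w, grad, m, v, t, alpha=0.001, beta1=0.9, beta2=0.99, eps=1e-4):
    """One Adam update; m, v are the running first/second moment estimates."""
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad**2
    m_hat = m / (1.0 - beta1**t)          # bias correction
    v_hat = v / (1.0 - beta2**t)
    w = w - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```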

Results

Optimizer      Loss @ Epoch 1    Loss @ Epoch 20
Vanilla SGD    0.0423            3.21e-04
Momentum SGD   0.0430            1.67e-05
Adam           0.4663            1.62e-06

Key observations:

  • Adam reaches the lowest final loss (1.62e-06) despite the highest initial loss, thanks to its adaptive per-weight learning rates.
  • Momentum SGD ends with a final loss roughly 19× lower than Vanilla SGD's (1.67e-05 vs 3.21e-04).
  • All three show a monotonic loss decrease; Adam's curve is nearly linear on a log scale, indicating exponential decay.

Grader Validation

All tasks passed built-in grader checks:

  • Sigmoid: output = 0.8807970779778823 for z = 2
  • Forward pass: loss = 0.9298, dy_pred = −1.9285
  • Backward pass: all 9 gradients verified to 6 decimal places
  • Gradient checking: max relative error < 1e-3

Running the Notebook

# Clone the repo
git clone https://github.com/<your-username>/BACKPROPOGATION-FROM-SCRATCH---Deep-Learning.git
cd BACKPROPOGATION-FROM-SCRATCH---Deep-Learning

# Install dependencies
pip install -r requirements.txt

# Launch notebook
jupyter notebook Backpropogation_deep_learning.ipynb

The notebook was originally developed on Google Colab. The google.colab import is only needed for file upload; remove it if running locally.


Concepts Covered

  • Computational graphs & directed acyclic graph (DAG) traversal
  • Chain rule for composite functions (exp, tanh, sigmoid, sin)
  • Numerical gradient checking as a debugging tool
  • SGD, Momentum, and Adam optimizers from first principles
  • Effect of adaptive learning rates on convergence speed

About

Manual implementation of backpropagation on a custom computational graph with gradient checking. Benchmarks Vanilla SGD, Momentum, and Adam optimizers from first principles using NumPy.
