A ground-up implementation of backpropagation and three gradient-based optimizers on a custom computational graph, built as an educational deep learning project. Every gradient is derived analytically using the chain rule, verified against numerical approximation via gradient checking, and benchmarked across Vanilla SGD, Momentum SGD, and Adam.
```
BACKPROPOGATION-FROM-SCRATCH---Deep-Learning/
├── Backpropogation_deep_learning.ipynb   # Full implementation & results
├── requirements.txt
└── README.md
```
Task: Regression on a dataset of 506 samples with 5 features (f1–f5, denoted x0–x4 in the formulas below). Goal: Learn 9 weights (w0–w8) by minimizing MSE loss using manual backpropagation.
The network is split into 3 sub-graphs for clarity:
Part 1 (Exponential branch):
z1 = w0*x0 + w1*x1
part1 = exp(z1² + w5)
Part 2 (Tanh branch):
part2 = tanh(part1 + w6)
Part 3 (Sigmoid branch):
z2 = sin(w2*x2)
z3 = w3*x3 + w4*x4
part3 = sigmoid(z2 * z3 + w7)
Output:
y_pred = part3 * w8 + part2
Loss (MSE per sample):
L = (y - y_pred)²
Weight initialization: Normal distribution, mean=0, std=0.01
- Computes all intermediate activations through the graph
- Stores values needed for the backward pass
- Calculates per-sample MSE loss and the seed gradient
∂L/∂y_pred = -2(y - y_pred)
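The forward pass described above can be sketched in NumPy as follows. This is a minimal illustration: the function and variable names are my own, not necessarily those used in the notebook.

```python
import numpy as np

def forward(w, x, y):
    """Forward pass for one sample through the three sub-graphs.

    w: 9 weights (w0..w8), x: 5 features (x0..x4), y: scalar target.
    Returns the per-sample MSE loss and a cache of intermediates
    needed by the backward pass.
    """
    z1 = w[0] * x[0] + w[1] * x[1]
    part1 = np.exp(z1 ** 2 + w[5])                    # exponential branch
    part2 = np.tanh(part1 + w[6])                     # tanh branch
    z2 = np.sin(w[2] * x[2])
    z3 = w[3] * x[3] + w[4] * x[4]
    part3 = 1.0 / (1.0 + np.exp(-(z2 * z3 + w[7])))   # sigmoid branch
    y_pred = part3 * w[8] + part2
    loss = (y - y_pred) ** 2                          # per-sample MSE
    dy_pred = -2.0 * (y - y_pred)                     # seed gradient ∂L/∂y_pred
    cache = dict(z1=z1, part1=part1, part2=part2, z2=z2, z3=z3,
                 part3=part3, y_pred=y_pred, dy_pred=dy_pred)
    return loss, cache
```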
Derives all 9 gradients analytically:
| Gradient | Expression summary |
|---|---|
| dw0 | ∂L/∂y_pred · 1 · ∂tanh · ∂exp · 2z1 · x0 |
| dw1 | ∂L/∂y_pred · 1 · ∂tanh · ∂exp · 2z1 · x1 |
| dw2 | ∂L/∂y_pred · w8 · ∂sigmoid · z3 · cos(w2·x2) · x2 |
| dw3 | ∂L/∂y_pred · w8 · ∂sigmoid · z2 · x3 |
| dw4 | ∂L/∂y_pred · w8 · ∂sigmoid · z2 · x4 |
| dw5 | ∂L/∂y_pred · 1 · ∂tanh · ∂exp |
| dw6 | ∂L/∂y_pred · 1 · ∂tanh |
| dw7 | ∂L/∂y_pred · w8 · ∂sigmoid |
| dw8 | ∂L/∂y_pred · part3 |
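The gradient table above translates directly into code. Here is a hedged sketch (names are illustrative; `cache` is assumed to hold the forward-pass intermediates z1, z2, z3, part1, part2, part3, and the seed gradient dy_pred):

```python
import numpy as np

def backward(w, x, cache):
    """Analytical gradients for all 9 weights via the chain rule."""
    d = cache["dy_pred"]                               # ∂L/∂y_pred
    dtanh = 1.0 - cache["part2"] ** 2                  # tanh'(part1 + w6)
    dsig = cache["part3"] * (1.0 - cache["part3"])     # sigmoid'(z2*z3 + w7)
    dexp = cache["part1"]                              # d/du exp(u) = exp(u)
    grads = np.empty(9)
    grads[0] = d * dtanh * dexp * 2 * cache["z1"] * x[0]
    grads[1] = d * dtanh * dexp * 2 * cache["z1"] * x[1]
    grads[2] = d * w[8] * dsig * cache["z3"] * np.cos(w[2] * x[2]) * x[2]
    grads[3] = d * w[8] * dsig * cache["z2"] * x[3]
    grads[4] = d * w[8] * dsig * cache["z2"] * x[4]
    grads[5] = d * dtanh * dexp
    grads[6] = d * dtanh
    grads[7] = d * w[8] * dsig
    grads[8] = d * cache["part3"]
    return grads
```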
Validates analytical gradients against numerical approximation:
f'(x) ≈ (f(x+ε) − f(x−ε)) / (2ε), ε = 1e-7
Relative error = |dW − dW_approx| / (|dW| + |dW_approx|)
All gradients passed with relative error < 1e-3 (max observed magnitude: 2.55e-04).
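The central-difference check can be written generically over any scalar loss function. A minimal sketch (the function name and signature are my own, not the notebook's):

```python
import numpy as np

def gradient_check(loss_fn, w, analytic, eps=1e-7):
    """Compare analytic gradients against central differences.

    loss_fn: scalar loss as a function of the weight vector.
    analytic: the analytically derived gradient at w.
    Returns the per-weight relative error.
    """
    numeric = np.empty_like(w)
    for i in range(w.size):
        wp, wm = w.copy(), w.copy()
        wp[i] += eps
        wm[i] -= eps
        numeric[i] = (loss_fn(wp) - loss_fn(wm)) / (2 * eps)
    # small constant in the denominator guards against division by zero
    return np.abs(analytic - numeric) / (np.abs(analytic) + np.abs(numeric) + 1e-12)
```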
w_{t+1} = w_t − η · ∇w
- Learning rate η = 0.05
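As a one-line sketch of the update rule above (function name is illustrative):

```python
import numpy as np

def sgd_step(w, grads, lr=0.05):
    """Vanilla SGD: move each weight against its gradient."""
    return w - lr * grads
```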
v_t = γ · v_{t-1} + η · ∇w
w_{t+1} = w_t − v_t
- Momentum γ = 0.9, Learning rate η = 0.05
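The velocity update can be sketched as follows (illustrative names; the caller keeps the velocity state between steps):

```python
import numpy as np

def momentum_step(w, v, grads, lr=0.05, gamma=0.9):
    """Momentum SGD: velocity is an exponentially decayed sum of past gradients."""
    v = gamma * v + lr * grads
    return w - v, v
```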
m_t = β1 · m_{t-1} + (1−β1) · ∇w
v_t = β2 · v_{t-1} + (1−β2) · ∇w²
m̂_t = m_t / (1−β1^t)
v̂_t = v_t / (1−β2^t)
w_{t+1} = w_t − α · m̂_t / (√v̂_t + ε)
- β1 = 0.9, β2 = 0.99, α = 0.001, ε = 1e-4
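The five Adam equations above map directly onto a step function; a minimal sketch using the hyperparameters listed (names are my own, and the caller maintains m, v, and the step counter t, which starts at 1):

```python
import numpy as np

def adam_step(w, m, v, grads, t, alpha=0.001, beta1=0.9, beta2=0.99, eps=1e-4):
    """Adam: bias-corrected first and second moment estimates."""
    m = beta1 * m + (1 - beta1) * grads            # first moment
    v = beta2 * v + (1 - beta2) * grads ** 2       # second moment
    m_hat = m / (1 - beta1 ** t)                   # bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```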
| Optimizer | Loss @ Epoch 1 | Loss @ Epoch 20 |
|---|---|---|
| Vanilla SGD | 0.0423 | 3.21e-04 |
| Momentum SGD | 0.0430 | 1.67e-05 |
| Adam | 0.4663 | 1.62e-06 |
Key observations:
- Adam achieves the lowest final loss (1.62e-06) despite a higher initial loss, due to adaptive learning rates.
- Momentum reaches a final loss roughly 19× lower than Vanilla SGD (1.67e-05 vs 3.21e-04).
- All three show a monotonic loss decrease; Adam's curve is nearly linear on a log scale, i.e., roughly exponential decay.
All tasks passed built-in grader checks:
- Sigmoid: output = 0.8807970779778823 for z = 2
- Forward pass: loss = 0.9298, dy_pred = −1.9285
- Backward pass: all 9 gradients verified to 6 decimal places
- Gradient checking: max relative error < 1e-3
```bash
# Clone the repo
git clone https://github.com/<your-username>/BACKPROPOGATION-FROM-SCRATCH---Deep-Learning.git
cd BACKPROPOGATION-FROM-SCRATCH---Deep-Learning

# Install dependencies
pip install -r requirements.txt

# Launch notebook
jupyter notebook Backpropogation_deep_learning.ipynb
```

The notebook was originally developed on Google Colab. The `google.colab` import is only needed for file upload; remove it if running locally.
- Computational graphs & directed acyclic graph (DAG) traversal
- Chain rule for composite functions (exp, tanh, sigmoid, sin)
- Numerical gradient checking as a debugging tool
- SGD, Momentum, and Adam optimizers from first principles
- Effect of adaptive learning rates on convergence speed