A tape-based reverse-mode automatic differentiation engine implemented in Lean 4, inspired by Andrej Karpathy's micrograd.
This project demonstrates:
- Software engineering in Lean — a complete autodiff engine with neural network layers, loss functions, and SGD training.
- Formal verification — Lean proofs that symbolic differentiation correctly implements the constant rule, sum rule, product rule, negation rule, and linearity.
- Machine learning — trains on XOR (100% accuracy) and MNIST handwritten digits (~87% test accuracy with a small network).
| XOR (2→8→8→2, tanh, 2000 epochs) | MNIST (784→32→10, relu, 10 epochs) |
|---|---|
| ![]() | ![]() |
The engine uses a Wengert tape — a flat array where each entry stores an operation and its result. The forward pass appends nodes in topological order; the backward pass walks the tape in reverse, accumulating gradients.
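The tape entry can be pictured as an operation tag plus the value computed in the forward pass. The following is a minimal sketch; the names `Op`, `Node`, and `Tape` are illustrative, not the actual definitions in `Engine.lean`:

```lean
-- A minimal sketch of a tape entry (illustrative names, not Engine.lean's API).
inductive Op where
  | const : Float → Op
  | add   : Nat → Nat → Op   -- operands are indices of earlier tape entries
  | mul   : Nat → Nat → Op
deriving Inhabited

structure Node where
  op  : Op
  val : Float                -- result computed during the forward pass
deriving Inhabited

abbrev Tape := Array Node
```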
For example, y = w·x + b with x=2, w=3, b=1:
```mermaid
graph BT
    n0["<b>#0 Const</b><br/>val = 2.0   (x)<br/>grad = 3.0"]
    n1["<b>#1 Const</b><br/>val = 3.0   (w)<br/>grad = 2.0"]
    n2["<b>#2 Const</b><br/>val = 1.0   (b)<br/>grad = 1.0"]
    n3["<b>#3 Mul</b>(1,0)<br/>val = 6.0<br/>grad = 1.0"]
    n4["<b>#4 Add</b>(3,2)<br/>val = 7.0<br/>grad = 1.0"]
    n0 --> n3
    n1 --> n3
    n3 --> n4
    n2 --> n4
```
The tape is just `#[node₀, node₁, node₂, node₃, node₄]`. Each node only
references earlier indices, so no sorting is needed — the array is the
topological order.
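Written out with the illustrative types from the sketch above, the example tape could be built literally as:

```lean
-- The example tape for y = w·x + b, written out by hand (sketch types above).
def exampleTape : Tape := #[
  { op := .const 2.0, val := 2.0 },  -- #0  x
  { op := .const 3.0, val := 3.0 },  -- #1  w
  { op := .const 1.0, val := 1.0 },  -- #2  b
  { op := .mul 1 0,   val := 6.0 },  -- #3  w·x
  { op := .add 3 2,   val := 7.0 }   -- #4  w·x + b
]
```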
The backward pass walks the tape in reverse, applying the chain rule at each node to accumulate gradients into a parallel array:
```mermaid
graph TD
    subgraph "i = 4   Add(3,2)   grad[4] = 1.0"
        s4["grad[3] += 1.0   grad[2] += 1.0"]
    end
    subgraph "i = 3   Mul(1,0)   grad[3] = 1.0"
        s3["grad[1] += val[0] · 1.0 = 2.0<br/>grad[0] += val[1] · 1.0 = 3.0"]
    end
    subgraph "i = 2   Const   grad[2] = 1.0"
        s2["(leaf — nothing to propagate)"]
    end
    subgraph "i = 1   Const   grad[1] = 2.0"
        s1["(leaf)"]
    end
    subgraph "i = 0   Const   grad[0] = 3.0"
        s0["(leaf)"]
    end
    s4 --> s3 --> s2 --> s1 --> s0
```
Result: grad[0] = 3.0 → ∂y/∂x = w = 3, and grad[1] = 2.0 → ∂y/∂w = x = 2.
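A minimal sketch of this reverse sweep over the illustrative `Tape` above (again, not the actual `Engine.lean` API):

```lean
-- grad[i] accumulates ∂y/∂(node i); the output node is seeded with 1.
def backward (tape : Tape) : Array Float := Id.run do
  let n := tape.size
  let mut grad := Array.replicate n 0.0
  if n > 0 then
    grad := grad.set! (n - 1) 1.0
  for k in [0:n] do
    let i := n - 1 - k                 -- walk the tape in reverse
    match tape[i]!.op with
    | .const _ => pure ()              -- leaf: nothing to propagate
    | .add a b =>
        grad := grad.set! a (grad[a]! + grad[i]!)
        grad := grad.set! b (grad[b]! + grad[i]!)
    | .mul a b =>
        grad := grad.set! a (grad[a]! + tape[b]!.val * grad[i]!)
        grad := grad.set! b (grad[b]! + tape[a]!.val * grad[i]!)
  return grad

#eval backward exampleTape   -- gradients match the diagram: 3.0, 2.0, 1.0, 1.0, 1.0
```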
```
Micrograd/
  Engine.lean       -- Tape-based reverse-mode autodiff
  NN.lean           -- Layers, MLP, losses, SGD, RNG
  Data/
    Synthetic.lean  -- XOR and linear datasets
    MNIST.lean      -- IDX binary format loader
  Verified/
    Axioms.lean     -- Axiomatized Float field properties
    Expr.lean       -- Pure expression tree, eval, symbolic diff
    GradRules.lean  -- Derivative correctness proofs
Main.lean           -- Unit tests, XOR training, MNIST training
```
Requires Lean 4 v4.28.0 (via elan).
```
lake build
lake exe micrograd
```

Download the MNIST data (requires `curl` and `gunzip`):

```
bash scripts/download_mnist.sh
```

Then re-run `lake exe micrograd`; MNIST training will run automatically.
The Micrograd/Verified/ directory contains Lean proofs of derivative
correctness for a pure expression tree (Expr). Proved theorems include:
| Theorem | Statement |
|---|---|
| `grad_const` | d/dx(c) = 0 |
| `grad_var_self` | d/dx(x) = 1 |
| `grad_var_other` | d/dx(y) = 0 when y ≠ x |
| `grad_add` | d/dx(f + g) = df/dx + dg/dx |
| `grad_mul` | d/dx(f·g) = (df/dx)·g + f·(dg/dx) |
| `grad_neg` | d/dx(-f) = -(df/dx) |
| `grad_square'` | d/dx(x²) = x + x |
| `grad_linear_combo` | d/dx(a·f + b·g) = a·(df/dx) + b·(dg/dx) |
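For a flavor of the style of statement involved, here is a compressed, self-contained sketch; the repo's `Expr` in `Verified/Expr.lean` uses named variables and more operations, so the definitions and theorem below are illustrative only:

```lean
inductive Expr where
  | const : Float → Expr
  | var   : Expr                      -- a single variable, for brevity
  | add   : Expr → Expr → Expr
  | mul   : Expr → Expr → Expr

def eval (x : Float) : Expr → Float
  | .const c => c
  | .var     => x
  | .add f g => eval x f + eval x g
  | .mul f g => eval x f * eval x g

def diff : Expr → Expr
  | .const _ => .const 0.0
  | .var     => .const 1.0
  | .add f g => .add (diff f) (diff g)
  | .mul f g => .add (.mul (diff f) g) (.mul f (diff g))

-- The sum rule closes by computation for this toy `diff`; the product rule is
-- where the axiomatized Float field properties come in.
theorem grad_add (f g : Expr) (x : Float) :
    eval x (diff (.add f g)) = eval x (diff f) + eval x (diff g) := by
  simp [diff, eval]
```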
These proofs rely on axiomatized Float field properties (see Axioms.lean).
The axioms are clearly documented as assumptions — IEEE 754 floats do not
perfectly satisfy field axioms due to NaN, infinity, and rounding.
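For a concrete flavor of what such an assumption looks like (a hypothetical name and statement, not necessarily one from Axioms.lean):

```lean
-- Hypothetical example of an axiomatized field property over Float.
-- Real IEEE 754 addition is not associative because of rounding, which is
-- why such properties are stated as explicit axioms rather than proved.
axiom float_add_assoc (a b c : Float) : (a + b) + c = a + (b + c)
```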
- R.E. Wengert, "A simple automatic derivative evaluation program," Communications of the ACM, 7(8):463–464, August 1964. The original formulation of tape-based automatic differentiation.
- A. Karpathy, micrograd. The Python autodiff engine that inspired this project.

