
Micrograd-Lean

A tape-based reverse-mode automatic differentiation engine implemented in Lean 4, inspired by Andrej Karpathy's micrograd.

This project demonstrates:

  1. Software engineering in Lean — a complete autodiff engine with neural network layers, loss functions, and SGD training.
  2. Formal verification — Lean proofs that symbolic differentiation correctly implements the constant rule, sum rule, product rule, negation rule, and linearity.
  3. Machine learning — trains on XOR (100% accuracy) and MNIST handwritten digits (~87% test accuracy with a small network).

Training curves

Training curve plots: XOR (2→8→8→2, tanh, 2000 epochs) and MNIST (784→32→10, relu, 10 epochs).

How it Works

The engine uses a Wengert tape — a flat array where each entry stores an operation and its result. The forward pass appends nodes in topological order; the backward pass walks the tape in reverse, accumulating gradients.
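
Concretely, such a tape can be modeled with a few small types. The sketch below uses hypothetical names (`Op`, `Node`, `Tape`); the actual definitions in `Engine.lean` may differ.

```lean
-- A minimal sketch of a Wengert tape; names are illustrative,
-- not the actual definitions from Engine.lean.
inductive Op where
  | const (v : Float)   -- leaf holding a constant/parameter value
  | add (a b : Nat)     -- indices of two earlier tape entries
  | mul (a b : Nat)

structure Node where
  op  : Op
  val : Float           -- value computed during the forward pass

abbrev Tape := Array Node
```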

For example, y = w·x + b with x=2, w=3, b=1:

graph BT
    n0["<b>#0 Const</b><br/>val = 2.0 &ensp; (x)<br/>grad = 3.0"]
    n1["<b>#1 Const</b><br/>val = 3.0 &ensp; (w)<br/>grad = 2.0"]
    n2["<b>#2 Const</b><br/>val = 1.0 &ensp; (b)<br/>grad = 1.0"]
    n3["<b>#3 Mul</b>(1,0)<br/>val = 6.0<br/>grad = 1.0"]
    n4["<b>#4 Add</b>(3,2)<br/>val = 7.0<br/>grad = 1.0"]

    n0 --> n3
    n1 --> n3
    n3 --> n4
    n2 --> n4

The tape is just #[node₀, node₁, node₂, node₃, node₄]. Each node only references earlier indices, so no sorting is needed — the array is the topological order.
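
Using the sketch types above, that five-entry tape can be written down literally (the indices in the comments match the node numbers in the diagram); this is a hypothetical construction, whereas the real engine builds the tape incrementally during the forward pass.

```lean
-- The example tape for y = w·x + b with x = 2, w = 3, b = 1 (sketch only).
def exampleTape : Tape := #[
  ⟨.const 2.0, 2.0⟩,   -- #0  x
  ⟨.const 3.0, 3.0⟩,   -- #1  w
  ⟨.const 1.0, 1.0⟩,   -- #2  b
  ⟨.mul 1 0, 6.0⟩,     -- #3  w·x = 6
  ⟨.add 3 2, 7.0⟩      -- #4  w·x + b = 7
]
```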

The backward pass walks the tape in reverse, applying the chain rule at each node to accumulate gradients into a parallel array:

graph TD
    subgraph "i = 4 &ensp; Add(3,2) &ensp; grad[4] = 1.0"
        s4["grad[3] += 1.0 &ensp; grad[2] += 1.0"]
    end
    subgraph "i = 3 &ensp; Mul(1,0) &ensp; grad[3] = 1.0"
        s3["grad[1] += val[0] · 1.0 = 2.0<br/>grad[0] += val[1] · 1.0 = 3.0"]
    end
    subgraph "i = 2 &ensp; Const &ensp; grad[2] = 1.0"
        s2["(leaf — nothing to propagate)"]
    end
    subgraph "i = 1 &ensp; Const &ensp; grad[1] = 2.0"
        s1["(leaf)"]
    end
    subgraph "i = 0 &ensp; Const &ensp; grad[0] = 3.0"
        s0["(leaf)"]
    end

    s4 --> s3 --> s2 --> s1 --> s0

Result: grad[0] = 3.0 → ∂y/∂x = w = 3, and grad[1] = 2.0 → ∂y/∂w = x = 2.
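
The reverse sweep itself is a short loop. Here is a minimal sketch in terms of the hypothetical types above; the actual implementation in `Engine.lean` may be structured differently and supports more operations.

```lean
-- Reverse sweep: seed the output gradient with 1, then walk the tape
-- backwards, applying the chain rule at each node. (Sketch only.)
def backward (tape : Tape) : Array Float := Id.run do
  let n := tape.size
  let mut grad : Array Float := Array.replicate n 0.0
  if n == 0 then return grad
  grad := grad.set! (n - 1) 1.0                      -- ∂y/∂y = 1
  for j in [0:n] do
    let i := n - 1 - j                               -- visit nodes in reverse
    match tape[i]!.op with
    | .const _ => pure ()                            -- leaf: nothing to propagate
    | .add a b =>                                    -- d(u+v)/du = d(u+v)/dv = 1
        grad := grad.set! a (grad[a]! + grad[i]!)
        grad := grad.set! b (grad[b]! + grad[i]!)
    | .mul a b =>                                    -- d(u·v)/du = v, d(u·v)/dv = u
        grad := grad.set! a (grad[a]! + tape[b]!.val * grad[i]!)
        grad := grad.set! b (grad[b]! + tape[a]!.val * grad[i]!)
  return grad

#eval backward exampleTape   -- #[3.0, 2.0, 1.0, 1.0, 1.0]
```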

Project Structure

Micrograd/
  Engine.lean              -- Tape-based reverse-mode autodiff
  NN.lean                  -- Layers, MLP, losses, SGD, RNG
  Data/
    Synthetic.lean         -- XOR and linear datasets
    MNIST.lean             -- IDX binary format loader
  Verified/
    Axioms.lean            -- Axiomatized Float field properties
    Expr.lean              -- Pure expression tree, eval, symbolic diff
    GradRules.lean         -- Derivative correctness proofs
Main.lean                  -- Unit tests, XOR training, MNIST training

Building and Running

Requires Lean 4 v4.28.0 (via elan).

lake build
lake exe micrograd

MNIST

Download MNIST data (requires curl and gunzip):

bash scripts/download_mnist.sh

Then re-run lake exe micrograd — MNIST training will run automatically.

Verified Derivative Rules

The Micrograd/Verified/ directory contains Lean proofs of derivative correctness for a pure expression tree (Expr). Proved theorems include:

| Theorem | Statement |
| --- | --- |
| grad_const | d/dx(c) = 0 |
| grad_var_self | d/dx(x) = 1 |
| grad_var_other | d/dx(y) = 0 when y ≠ x |
| grad_add | d/dx(f + g) = df/dx + dg/dx |
| grad_mul | d/dx(f·g) = (df/dx)·g + f·(dg/dx) |
| grad_neg | d/dx(-f) = -(df/dx) |
| grad_square' | d/dx(x²) = x + x |
| grad_linear_combo | d/dx(a·f + b·g) = a·(df/dx) + b·(dg/dx) |

These proofs rely on axiomatized Float field properties (see Axioms.lean). The axioms are clearly documented as assumptions — IEEE 754 floats do not perfectly satisfy field axioms due to NaN, infinity, and rounding.
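
To give a flavor of what these look like, here is a minimal self-contained sketch of a pure expression tree with evaluation, symbolic differentiation, and a sum-rule theorem. The names (`Expr`, `eval`, `grad`, `grad_add`) mirror the table above but are illustrative; the real `Expr.lean` and `GradRules.lean` may differ.

```lean
-- Illustrative sketch only; not the actual Verified/ code.

-- The kind of assumption documented in Axioms.lean: a field-style property
-- asserted for Float even though IEEE 754 does not truly satisfy it.
axiom Float.mul_comm' (a b : Float) : a * b = b * a

inductive Expr where
  | const (c : Float)
  | var   (i : Nat)
  | add   (f g : Expr)
  | mul   (f g : Expr)

def Expr.eval (env : Nat → Float) : Expr → Float
  | .const c => c
  | .var i   => env i
  | .add f g => f.eval env + g.eval env
  | .mul f g => f.eval env * g.eval env

-- Symbolic derivative with respect to variable x.
def Expr.grad (x : Nat) : Expr → Expr
  | .const _ => .const 0
  | .var i   => if i = x then .const 1 else .const 0
  | .add f g => .add (f.grad x) (g.grad x)
  | .mul f g => .add (.mul (f.grad x) g) (.mul f (g.grad x))

-- Sum rule: differentiating f + g and evaluating gives df/dx + dg/dx.
theorem grad_add (f g : Expr) (x : Nat) (env : Nat → Float) :
    ((Expr.add f g).grad x).eval env
      = (f.grad x).eval env + (g.grad x).eval env := by
  simp [Expr.grad, Expr.eval]
```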

References

  • R.E. Wengert, "A simple automatic derivative evaluation program," Communications of the ACM, 7(8):463–464, August 1964. The original formulation of tape-based automatic differentiation.
  • A. Karpathy, micrograd. The Python autodiff engine that inspired this project.
