Anthropic publishes research aimed at understanding the internals of LLMs in the Transformer Circuits Thread. This repository contains code to reproduce the experiments from Section 3, Superposition as a Phase Change, of Toy Models of Superposition.
This implementation is based on the official repository toy-models-of-superposition and the Section 3 repository superposition; we thank the Anthropic team and Martin Wattenberg for generously open-sourcing their work.
$ uv sync --locked

Run empirical_version.py to train separately for each n_features.
In the original paper, n_features was set to 2 or 3.
$ uv run empirical_version.py --n_features 2
$ uv run empirical_version.py --n_features 3

To train with multiple GPUs:

$ export CUDA_VISIBLE_DEVICES=0,1,2,3
$ uv run torchrun --nproc_per_node=4 empirical_version.py --n_features 2
$ uv run torchrun --nproc_per_node=4 empirical_version.py --n_features 3

CSV files are written under the output/ directory for each value of n_features.
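As background for what these runs train: in the paper's toy model, sparse features are reconstructed through a low-dimensional bottleneck, x' = ReLU(WᵀWx + b), under an importance-weighted loss. The NumPy sketch below (not the repository's actual code; weights, importances, and the uniform feature distribution are simplifying assumptions) compares two hand-written single-hidden-dimension solutions and shows the loss ordering flip as sparsity increases, which is the phase change studied in Section 3.

```python
import numpy as np

rng = np.random.default_rng(0)

def expected_loss(W, b, importance, sparsity, n_samples=50_000):
    """Monte Carlo estimate of the importance-weighted loss
    sum_i I_i * (x_i - ReLU(W^T W x + b)_i)^2 for the toy model."""
    n = len(importance)
    x = rng.uniform(0.0, 1.0, size=(n_samples, n))
    x[rng.uniform(size=(n_samples, n)) < sparsity] = 0.0  # features are sparse
    h = x @ W.T                          # hidden activations, shape (n_samples, m)
    x_hat = np.maximum(h @ W + b, 0.0)   # ReLU reconstruction
    return float(np.mean(np.sum(importance * (x_hat - x) ** 2, axis=1)))

importance = np.array([1.0, 0.7])      # second feature less important (assumption)
b = np.zeros(2)
W_dedicated = np.array([[1.0, 0.0]])   # m=1: represent only the important feature
W_superpose = np.array([[1.0, -1.0]])  # m=1: store both features antipodally

for S in (0.0, 0.9):
    print(f"sparsity {S}: dedicated {expected_loss(W_dedicated, b, importance, S):.4f}, "
          f"superposed {expected_loss(W_superpose, b, importance, S):.4f}")
```

At sparsity 0 the dedicated solution wins; at high sparsity the interference term shrinks quadratically and superposition wins.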
Because the above commands can take a long time, sample results are already stored in the output/ directory.
Open Superposition_as_a_Phase_Change.ipynb and run the cells.
Note that the current implementation does not perfectly reproduce the results from the original paper.
Below are results for n_features = 2 and n_features = 3.
| n_features = 2 | n_features = 3 |
|---|---|
| ![]() | ![]() |
This may be due to differences in training hyperparameters such as learning rate or number of training steps.
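For intuition about why the outcome is sensitive to training details, here is a back-of-the-envelope sketch (not the paper's exact analysis): assume two features uniform on [0, 1], each zero with probability $S$, importances $I_0 \ge I_1$, one hidden dimension, and no bias. The expected losses of the two candidate solutions above are then

$$L_{\text{dedicated}} = \frac{I_1\,(1-S)}{3}, \qquad L_{\text{superposition}} = \frac{(I_0 + I_1)\,(1-S)^2}{6},$$

which cross at $1-S = 2I_1/(I_0+I_1)$. Near this boundary the two minima are nearly degenerate, so learning rate, step count, and initialization can decide which one training lands in.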
However, empirical_version.py parallelizes training to make experimentation more efficient, and Superposition_as_a_Phase_Change.ipynb partially reproduces the original paper's results and provides a more detailed theoretical explanation, so this repository may still be useful.
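For reference, torchrun exposes each worker's index through the RANK and WORLD_SIZE environment variables, which is one common way to split independent runs across GPUs. The sketch below shows that pattern on a hypothetical sparsity grid; the actual partitioning in empirical_version.py may differ.

```python
import os

# torchrun sets RANK and WORLD_SIZE for every worker it launches;
# fall back to a single worker when run without torchrun.
rank = int(os.environ.get("RANK", "0"))
world_size = int(os.environ.get("WORLD_SIZE", "1"))

# Hypothetical sparsity grid; each worker takes a strided slice,
# so the union over all ranks covers every value exactly once.
sparsities = [i / 20 for i in range(21)]
my_sparsities = sparsities[rank::world_size]
print(f"rank {rank}/{world_size} handles {len(my_sparsities)} sparsity values")
```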
If you obtain better results or notice anything to fix, feel free to open an issue.

