Investigating Trajectory-Conditioned Semantics and Dynamic Computational Pathways in GPT-2-Small

This repository contains the experimental codebase, datasets, and logs for our mechanistic interpretability study on Transformer Semantics and Dynamic Computational Pathways.

The code evaluates two major hypotheses:

The Forward Proportion Hypothesis: Semantic behavior is conditioned on the activation trajectory rather than solely the current state.
The Dynamic Computational Pathway Hypothesis: Meaning emerges dynamically through context-sensitive interaction topologies between attention heads.

Our findings falsify the hypothesis that attention heads dynamically rewire and instead support the Combinatorial Recruitment Model, in which fixed-behavior attention heads communicate indirectly via a shared residual stream and contextual semantics emerge from the combinatorial selection of active components. Furthermore, we isolate an irreducible 27.3% of the variance in output divergence that cannot be explained by single-position state similarity, pointing to limits in state-based semantic prediction.

🔬 Key Findings

Co-Activation Communities: Attention heads form strong, stable communities (Modularity $Q=0.536$) that distinctly separate symbolic tasks (coding/math) from natural language.
Topology Predicts Semantics: Interaction graph topology carries independent predictive power, improving semantic category prediction to 91.7% (vs 88.3% using activation magnitude alone).
No Dynamic Rewiring: Individual heads do not dynamically rewire across contexts (similarity ratio $\approx 1.002$). They exhibit stereotyped, rigid interaction patterns.
Direct Pathways are Insignificant: Targeted disruption of direct head-to-head attention yields only 3.5% of the effect of full pair ablation.
The 27.3% Mystery: Even with unembedding-projected distance metrics, 27.3% of the semantic divergence between convergent prompt trajectories remains unpredictable from single-position similarity, which suggests that the remainder depends on distributed, cross-position nonlinear processing.
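
For readers unfamiliar with the modularity score $Q$ quoted above, the sketch below computes Newman's modularity for a given partition of a small undirected graph, using the standard form $Q = \sum_c [L_c/m - (d_c/2m)^2]$, where $L_c$ is the number of intra-community edges, $d_c$ the total degree of community $c$, and $m$ the total edge count. The toy graph and the `modularity` helper are illustrative assumptions; the repository's actual graph is built from head co-activations and partitioned with Louvain.

```python
from collections import defaultdict

def modularity(edges, community_of):
    """Newman modularity of a partition of an unweighted, undirected graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community_of: dict mapping node -> community label.
    """
    m = len(edges)
    intra_edges = defaultdict(int)   # L_c: edges with both ends in community c
    total_degree = defaultdict(int)  # d_c: sum of degrees of nodes in c
    for u, v in edges:
        total_degree[community_of[u]] += 1
        total_degree[community_of[v]] += 1
        if community_of[u] == community_of[v]:
            intra_edges[community_of[u]] += 1
    return sum(
        intra_edges[c] / m - (total_degree[c] / (2 * m)) ** 2
        for c in total_degree
    )

# Toy graph (hypothetical): two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
partition = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}

q_split = modularity(edges, partition)                     # one community per triangle
q_single = modularity(edges, {n: "A" for n in range(6)})   # everything in one community
```

Splitting the toy graph into its two triangles gives $Q = 5/14 \approx 0.357$, while placing every node in one community gives $Q = 0$. Higher values indicate that edges are concentrated within communities more than a degree-matched random graph would predict.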

📂 Codebase Structure

The research pipeline is divided into multi-phase iterations. The code is modular and built on top of TransformerLens.

Setup & Primitives

code/phase0_setup.py: Core utility functions, GPT-2-small loading, logger initialization, and metric definitions (Cosine, KL Divergence, Logit Lens).
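
As a rough illustration of two of the metrics listed above, the following sketch computes cosine similarity between two activation vectors and KL divergence between two next-token distributions derived from logits. It uses only the standard library; the function names (`cosine_similarity`, `softmax`, `kl_divergence`) are hypothetical and are not taken from `phase0_setup.py`, which may define these differently (for instance on PyTorch tensors).

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two discrete distributions of equal length."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )

# Toy values (made up, not model outputs).
state_a = [1.0, 0.0, 2.0]
state_b = [2.0, 0.0, 4.0]
logits_a = [2.0, 1.0, 0.1]
logits_b = [0.1, 1.0, 2.0]

sim = cosine_similarity(state_a, state_b)
kl = kl_divergence(softmax(logits_a), softmax(logits_b))
```

In this toy case the two states are parallel, so their cosine similarity is 1, while the two logit vectors yield clearly different distributions and a positive KL divergence. The states and logits here are independent made-up values and are not related through any model.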

Iteration One: Trajectories & Rings

code/iter1_...: Scripts handling the initial dataset creation, activation fingerprinting, community detection (Louvain), activation patching for the Forward Proportion test, and the falsification of Semantic Rings.

Iteration Two: Dynamic Pathways

code/iter2_phase1_interaction_graphs.py: Constructs $144 \times 144$ interaction matrices based on attention-weighted upstream activations.
code/iter2_phase2_pathway_prediction.py: Trains Logistic Regression classifiers (with 5-fold CV) comparing Activation Magnitude vs Interaction Topology.
code/iter2_phase3_context_sensitivity.py: Tests whether target heads alter their interaction topologies based on the semantic context.
code/iter2_phase4_pathway_perturbation.py: Performs causal interventions (Pair Ablation vs Pathway Disruption) to test the causal necessity of direct A $\rightarrow$ B circuits.
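
The $144 \times 144$ shape follows from GPT-2-small's 12 layers of 12 attention heads each. A minimal sketch of one way to index such a matrix is shown below; the flattening order (`layer * 12 + head`) and the helper names are assumptions for illustration, not necessarily the convention used in `iter2_phase1_interaction_graphs.py`. It also marks which entries can correspond to an upstream-to-downstream interaction, since a head can only read the outputs of heads in earlier layers.

```python
N_LAYERS = 12                  # GPT-2-small
N_HEADS = 12                   # attention heads per layer
N_TOTAL = N_LAYERS * N_HEADS   # 144 heads in total

def flat_index(layer, head):
    """Map (layer, head) to a row/column index in the 144 x 144 matrix."""
    return layer * N_HEADS + head

def layer_head(index):
    """Inverse of flat_index: recover (layer, head) from a flat index."""
    return divmod(index, N_HEADS)

def can_feed(src, dst):
    """True if head `src` sits in a strictly earlier layer than head `dst`."""
    return layer_head(src)[0] < layer_head(dst)[0]

# Boolean mask over the interaction matrix, indexed as mask[src][dst].
upstream_mask = [
    [can_feed(s, d) for d in range(N_TOTAL)]
    for s in range(N_TOTAL)
]
n_possible = sum(sum(row) for row in upstream_mask)
```

Under this convention 9,504 of the 20,736 ordered head pairs (66 layer pairs times 144 head pairs) can carry an upstream-to-downstream interaction; the remaining entries are structurally zero.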

The Final Showdown

code/the_final_answer.py (The 42% Experiment): The final test of the unexplained variance. Compares baseline cosine similarity, post-LayerNorm cosine similarity, and unembed-projected distance, and performs an SVD directional decomposition of $W_U$.
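
To make the comparison concrete, here is a hedged sketch of how the three similarity measures and an SVD directional decomposition might look. It uses a small random matrix as a stand-in for $W_U$ (the real GPT-2-small unembedding is $768 \times 50257$), applies LayerNorm without learned scale and bias, and measures Euclidean distance in logit space. These choices and all function names are assumptions, not the definitions used in `the_final_answer.py`.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_vocab = 16, 50                    # toy sizes, not GPT-2-small's
W_U = rng.normal(size=(d_model, d_vocab))    # random stand-in for the unembedding

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def layer_norm(h, eps=1e-5):
    """LayerNorm without learned scale/bias (a simplifying assumption)."""
    return (h - h.mean()) / np.sqrt(h.var() + eps)

def unembed_distance(h1, h2, W_U):
    """Euclidean distance between two states after projection into logit space."""
    return float(np.linalg.norm((h1 - h2) @ W_U))

# SVD of W_U: columns of U are directions in residual space, and the
# singular values S say how strongly each direction is amplified.
U, S, Vt = np.linalg.svd(W_U, full_matrices=False)

def directional_contributions(h1, h2):
    """Squared logit-space distance split across singular directions of W_U."""
    coeffs = U.T @ (h1 - h2)   # state difference expressed in the U basis
    return (S * coeffs) ** 2

h1 = rng.normal(size=d_model)
h2 = h1 + 0.1 * rng.normal(size=d_model)

baseline = cosine(h1, h2)
post_ln = cosine(layer_norm(h1), layer_norm(h2))
dist = unembed_distance(h1, h2, W_U)
contrib = directional_contributions(h1, h2)
```

Because the rows of `Vt` are orthonormal, the per-direction contributions sum to the squared unembed-projected distance, so the decomposition shows which residual-stream directions account for the divergence in logit space.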

⚙️ Installation & Setup

Clone the repository:

git clone https://github.com/yourusername/forward-proportion.git
cd forward-proportion

Install dependencies: This project requires Python 3.8+ and PyTorch.

pip install torch transformer-lens networkx scikit-learn scipy python-louvain matplotlib

Running the experiments: The codebase was originally designed for execution in Google Colab (T4 GPU recommended). You can run the scripts sequentially in a Jupyter Notebook environment or directly via Python:
```
python code/phase0_setup.py
python code/iter2_phase1_interaction_graphs.py
# ...
```

📜 Paper and Citations

The full theoretical framework, mathematical formulations, and detailed analysis can be found in the accompanying LaTeX manuscript located in Iteration One/research_paper.tex.

(Placeholder for Zenodo / arXiv link once published)

⚖️ License

This project is open-sourced under the MIT License. See the LICENSE file for details.
