
Pushp-Kharat1/Forward-Proportion

Investigating Trajectory-Conditioned Semantics and Dynamic Computational Pathways in GPT-2-Small

This repository contains the experimental codebase, datasets, and logs for our mechanistic interpretability study of transformer semantics and dynamic computational pathways in GPT-2-small.

The code evaluates two major hypotheses:

  1. The Forward Proportion Hypothesis: Semantic behavior is conditioned on the activation trajectory rather than on the current state alone.
  2. The Dynamic Computational Pathway Hypothesis: Meaning emerges dynamically through context-sensitive interaction topologies between attention heads.

Our findings falsify the hypothesis that attention heads dynamically rewire across contexts. They instead support the Combinatorial Recruitment Model, in which fixed-behavior attention heads communicate indirectly through the shared residual stream and contextual semantics emerge from which components are combinatorially selected to be active. We also isolate an irreducible 27.3% of the variance in output divergence that single-position state similarity cannot explain, which points to limits on state-based semantic prediction.


🔬 Key Findings

  • Co-Activation Communities: Attention heads form strong, stable communities (Modularity $Q=0.536$) that distinctly separate symbolic tasks (coding/math) from natural language.
  • Topology Predicts Semantics: Interaction graph topology carries independent predictive power, improving semantic category prediction to 91.7% (vs 88.3% using activation magnitude alone).
  • No Dynamic Rewiring: Individual heads do not dynamically rewire across contexts (similarity ratio $\approx 1.002$). They exhibit stereotyped, rigid interaction patterns.
  • Direct Pathways are Insignificant: Targeted disruption of direct head-to-head attention yields only 3.5% of the effect of full pair ablation.
  • The 27.3% Mystery: Even with unembedding-projected distance metrics, 27.3% of the semantic divergence between convergent prompt trajectories cannot be predicted from single-position similarity, consistent with distributed, nonlinear processing across positions.
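One way to read a figure like the 27.3% is as the fraction of variance unexplained (1 − R²) when output divergence is regressed on a single-position similarity score. The sketch below assumes a simple linear least-squares fit on hypothetical arrays `similarity` and `divergence` with synthetic data; the estimator actually used in this repository may differ.

```python
import numpy as np

def fraction_unexplained(similarity: np.ndarray, divergence: np.ndarray) -> float:
    """Return 1 - R^2 of the OLS fit divergence ~ a * similarity + b (hypothetical helper)."""
    # Design matrix with an intercept column, so 1 - R^2 equals SS_res / SS_tot.
    X = np.column_stack([similarity, np.ones_like(similarity)])
    coef, *_ = np.linalg.lstsq(X, divergence, rcond=None)
    residuals = divergence - X @ coef
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((divergence - divergence.mean()) ** 2))
    return ss_res / ss_tot

# Synthetic data only: divergence falls as similarity rises, plus noise
# the similarity score cannot account for.
rng = np.random.default_rng(0)
sim = rng.uniform(0.0, 1.0, size=500)
div = 2.0 * (1.0 - sim) + rng.normal(scale=0.2, size=500)
print(f"unexplained variance: {fraction_unexplained(sim, div):.1%}")
```

Any residual variance here comes from the injected noise; in the real experiment it would reflect whatever the similarity metric fails to capture.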

📂 Codebase Structure

The research pipeline is organized into iterations, each made up of several phases. The code is modular and built on top of TransformerLens.

Setup & Primitives

  • code/phase0_setup.py: Core utility functions, GPT-2-small loading, logger initialization, and metric definitions (cosine similarity, KL divergence, logit lens).
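The actual metric definitions live in code/phase0_setup.py. As a rough sketch of what the three metrics compute, the NumPy version below assumes 1-D residual and logit vectors, and passes the final LayerNorm scale and bias explicitly as `gamma` and `beta` (hypothetical names, not the repository's API).

```python
import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity between two residual-stream vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def log_softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def kl_divergence(logits_p: np.ndarray, logits_q: np.ndarray) -> float:
    """KL(P || Q) between the next-token distributions implied by two logit vectors."""
    log_p, log_q = log_softmax(logits_p), log_softmax(logits_q)
    return float(np.sum(np.exp(log_p) * (log_p - log_q)))

def logit_lens(resid: np.ndarray, W_U: np.ndarray, gamma: np.ndarray,
               beta: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Decode an intermediate residual vector: final LayerNorm, then unembedding W_U."""
    centered = resid - resid.mean(axis=-1, keepdims=True)
    normed = centered / np.sqrt((centered ** 2).mean(axis=-1, keepdims=True) + eps)
    return (normed * gamma + beta) @ W_U
```

For GPT-2-small, `W_U` would have shape (768, 50257); `logit_lens` also works on a (positions, d_model) array because the LayerNorm statistics are taken over the last axis.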

Iteration One: Trajectories & Rings

  • code/iter1_...: Scripts handling the initial dataset creation, activation fingerprinting, community detection (Louvain), activation patching for the Forward Proportion test, and the falsification of Semantic Rings.
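As an illustration of the community-detection step, the sketch below builds a head co-activation graph from positive head-to-head correlations and runs Louvain. It uses NetworkX's built-in `louvain_communities` and `modularity`, which may not be the Louvain implementation this repository uses. The threshold, the helper name, and the synthetic "two prompt types" data are all assumptions.

```python
import networkx as nx
import numpy as np

def coactivation_communities(acts: np.ndarray, threshold: float = 0.3, seed: int = 0):
    """acts: (n_prompts, n_heads) activation magnitudes (hypothetical helper).

    Edge weights are head-head Pearson correlations above `threshold`; returns
    the Louvain partition and its modularity Q.
    """
    corr = np.corrcoef(acts.T)
    n_heads = corr.shape[0]
    G = nx.Graph()
    G.add_nodes_from(range(n_heads))
    for i in range(n_heads):
        for j in range(i + 1, n_heads):
            if corr[i, j] > threshold:
                G.add_edge(i, j, weight=float(corr[i, j]))
    communities = nx.community.louvain_communities(G, weight="weight", seed=seed)
    q = nx.community.modularity(G, communities, weight="weight")
    return communities, q

# Synthetic data: heads 0-5 respond to prompt type 0, heads 6-11 to type 1.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=200)
acts = rng.normal(scale=0.3, size=(200, 12))
acts[:, :6] += (labels == 0)[:, None]
acts[:, 6:] += (labels == 1)[:, None]
comms, q = coactivation_communities(acts)
print(comms, round(q, 3))
```

Keeping only positive correlations is one design choice among several. Signed or absolute-value weighting would produce a different graph.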

Iteration Two: Dynamic Pathways

  • code/iter2_phase1_interaction_graphs.py: Constructs $144 \times 144$ interaction matrices (one row and one column per attention head, since GPT-2-small has 12 layers × 12 heads) based on attention-weighted upstream activations.
  • code/iter2_phase2_pathway_prediction.py: Trains logistic-regression classifiers with 5-fold cross-validation to compare activation-magnitude features against interaction-topology features.
  • code/iter2_phase3_context_sensitivity.py: Tests whether target heads alter their interaction topologies based on the semantic context.
  • code/iter2_phase4_pathway_perturbation.py: Performs causal interventions (Pair Ablation vs Pathway Disruption) to test the causal necessity of direct A $\rightarrow$ B circuits.
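The classifier comparison in phase 2 can be sketched with scikit-learn as below. This sketch makes several assumptions: the standardization step, stratified shuffled folds, and the idea of stacking topology features onto magnitude features. The feature arrays are synthetic placeholders, not the repository's data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def cv_accuracy(features: np.ndarray, labels: np.ndarray, seed: int = 0) -> float:
    """Mean 5-fold cross-validated accuracy of a standardized logistic regression."""
    clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    return float(cross_val_score(clf, features, labels, cv=folds, scoring="accuracy").mean())

# Synthetic placeholders: 3 semantic categories, per-head magnitudes that
# shift with the category, plus uninformative "topology" features.
rng = np.random.default_rng(0)
n = 120
labels = rng.integers(0, 3, size=n)
magnitude = rng.normal(size=(n, 12)) + labels[:, None] * 0.5
topology = rng.normal(size=(n, 30))
print("magnitude only:      ", cv_accuracy(magnitude, labels))
print("magnitude + topology:", cv_accuracy(np.hstack([magnitude, topology]), labels))
```

Placing the scaler inside the pipeline means it is refit on each training fold, so no statistics leak from the held-out fold.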

The Final Showdown

  • code/the_final_answer.py (The 42% Experiment): The final test targeting the unexplained variance. Compares baseline cosine similarity, post-LayerNorm cosine similarity, and unembedding-projected distance, and performs an SVD directional decomposition of the unembedding matrix $W_U$.
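A minimal sketch of the two $W_U$-based quantities follows, with assumed definitions: the unembedding-projected distance is taken as the L2 norm of $(x - y) W_U$. The directional decomposition splits that squared distance across the left singular vectors of $W_U$. Because $W_U = U S V^\top$ and $V^\top$ has orthonormal rows, the per-direction energies sum exactly to the squared projected distance. Function names and the random matrices are hypothetical.

```python
import numpy as np

def unembed_distance(x: np.ndarray, y: np.ndarray, W_U: np.ndarray) -> float:
    """L2 distance between two residual vectors after projection into logit space."""
    return float(np.linalg.norm((x - y) @ W_U))

def directional_energy(x: np.ndarray, y: np.ndarray, W_U: np.ndarray) -> np.ndarray:
    """Squared logit-space contribution of x - y along each left singular direction of W_U.

    Entries are ordered by decreasing singular value and sum to unembed_distance(x, y, W_U) ** 2.
    """
    U, S, _ = np.linalg.svd(W_U, full_matrices=False)
    coords = (x - y) @ U  # difference expressed in the left-singular basis
    return (coords * S) ** 2

# Random stand-ins for d_model = 16, d_vocab = 40 (GPT-2-small would be 768 x 50257).
rng = np.random.default_rng(1)
W = rng.normal(size=(16, 40))
x, y = rng.normal(size=16), rng.normal(size=16)
energy = directional_energy(x, y, W)
top_k_share = energy[:4].sum() / energy.sum()  # share carried by the 4 strongest directions
print(unembed_distance(x, y, W), top_k_share)
```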

⚙️ Installation & Setup

  1. Clone the repository:

    git clone https://github.com/Pushp-Kharat1/Forward-Proportion.git
    cd Forward-Proportion
  2. Install dependencies: This project requires Python 3.8+ and PyTorch.

    pip install torch transformer-lens networkx scikit-learn scipy python-louvain matplotlib
  3. Running the experiments: The codebase was originally designed for execution in Google Colab (T4 GPU recommended). You can run the scripts sequentially in a Jupyter Notebook environment or directly via Python:

    python code/phase0_setup.py
    python code/iter2_phase1_interaction_graphs.py
    # ...

📜 Paper and Citations

The full theoretical framework, mathematical formulations, and detailed analysis can be found in the accompanying LaTeX manuscript located in Iteration One/research_paper.tex.

(Placeholder for Zenodo / arXiv link once published)


⚖️ License

This project is open-sourced under the MIT License. See the LICENSE file for details.
