System Prompt Framing Induces Attention-Dependent Entropy Regime Switching in Transformer Token Generation

Relational presence and epistemic openness interact superadditively in transformers — but not in state-space models. Attention is the computational substrate for cross-factor contextual binding.



Key Finding

System prompt framing produces measurable, architecture-dependent shifts in token-level Shannon entropy during language model inference. Specifically:

  1. Superadditive R x E interaction (+0.19 to +0.21) in 2 of 4 transformer architectures — the combined effect of relational presence (R) and epistemic openness (E) exceeds the sum of individual effects
  2. Absent in pure SSM — Falcon3-Mamba-7B (zero attention layers) shows no superadditive interaction (-0.04), treating all prompt factors as interchangeable
  3. Safety language suppresses the effect in transformers (d = 0.85-1.22) but not in the SSM (d = 0.22, NS)
  4. Architecture-dependent pathways — Gemma/Qwen are E-driven, Llama is R-driven, Mistral is flat, Mamba is undifferentiated

The superadditive R x E interaction requires attention.


Dataset

3,830 inference runs across 3 experimental phases:

| Phase | Design | Models | Runs | Key Finding |
|---|---|---|---|---|
| 1. Cross-Architecture | 3 conditions x 6 models | Qwen 0.5B/1.5B/7B, Gemma 2B, Mistral 7B, Llama 8B | 900 | Effect emerges above 0.5B; capacity floor established |
| 2. Factorial Ablation | 8 conditions x 5 models | Gemma 2B, Llama 8B, Qwen 7B, Mistral 7B, Falcon Mamba 7B | 2,000 | R x E interaction is attention-dependent; architecture taxonomy |
| 3. Response Surface | 31 conditions (CCD) x 1 model | Gemma 2B | 930 | Factor dose-response curves; S (safety language) is an active antagonist |

Ablation Conditions (Phase 2)

| Condition | Label | R | E | Purpose |
|---|---|---|---|---|
| A | Baseline | - | - | "You are a helpful assistant." |
| B | Analytical | - | - | Structured, constrained |
| C | Co-Creative | + | + | Full relational + epistemic framing |
| D | Epistemic License | - | + | Exploration without relational framing |
| E | Relational Constraint | + | - | Relational framing without exploration |
| F | Polite Solipsism | - | - | Warmth without relationship or openness |
| G | Permission Flood | - | + | Unconstrained without relational framing |
| H | Caged Dyad | + | + | Co-Creative + safety constraint language |
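
A minimal, purely illustrative sketch of how one of these conditions might be paired with a test prompt at inference time. The actual schema of prompts/conditions.json and the real prompt texts are not reproduced in this README, so the keys and strings below are assumptions, not the repository's code:

```python
# Illustrative only: the real schema of prompts/conditions.json is not shown in
# this README, so the keys ("label", "system_prompt") and texts are assumptions.
import json

condition = {
    "label": "C_co_creative",
    "system_prompt": "You are a helpful assistant.",  # placeholder, not the real framing text
}

def build_messages(condition: dict, user_prompt: str) -> list[dict]:
    """Pair one system-prompt condition with one test prompt in chat format."""
    return [
        {"role": "system", "content": condition["system_prompt"]},
        {"role": "user", "content": user_prompt},
    ]

messages = build_messages(condition, "Describe a place you have never seen.")
print(json.dumps(messages, indent=2))
```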

Core Results

Table: 2x2 Factorial Decomposition

| Model | Architecture | Main R (d) | Main E (d) | R x E Interaction | Pattern |
|---|---|---|---|---|---|
| Gemma 2B | Transformer (SWA) | +0.25 NS | +0.69*** | +0.190 (superadditive) | E-driven |
| Llama 8B | Transformer (GQA) | +0.88*** | +0.16 NS | +0.043 (additive) | R-driven |
| Qwen 7B | Transformer (GQA) | -0.13 NS | +0.43** | +0.211 (superadditive) | E-driven |
| Mistral 7B | Transformer (SWA) | +0.04 NS | +0.03 NS | +0.007 (flat) | Weak |
| Falcon Mamba 7B | SSM (no attention) | +0.10 NS | +0.14 NS | -0.042 (subadditive) | Undifferentiated |
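
As a concrete companion to the table, here is a minimal sketch of how the main effects and the R x E interaction could be computed from per-run mean entropy in the four clean factorial cells (A, E, D, C). It assumes a pandas DataFrame with hypothetical columns condition and mean_H; the repository's analyze_ablation.py may define these statistics differently, and it adds the significance tests this sketch omits:

```python
# Sketch of a 2x2 factorial decomposition over the clean cells A, E, D, C.
# Column names ("condition", "mean_H") are assumptions for illustration.
import numpy as np
import pandas as pd

def cohens_d(x: np.ndarray, y: np.ndarray) -> float:
    """Cohen's d with pooled standard deviation."""
    nx, ny = len(x), len(y)
    pooled = np.sqrt(((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1)) / (nx + ny - 2))
    return (x.mean() - y.mean()) / pooled

def factorial_decomposition(df: pd.DataFrame) -> dict:
    # A = (R-, E-), E = (R+, E-), D = (R-, E+), C = (R+, E+)
    h = {c: df.loc[df.condition == c, "mean_H"].to_numpy() for c in ["A", "E", "D", "C"]}
    main_R = cohens_d(np.concatenate([h["E"], h["C"]]), np.concatenate([h["A"], h["D"]]))
    main_E = cohens_d(np.concatenate([h["D"], h["C"]]), np.concatenate([h["A"], h["E"]]))
    # Cell-mean contrast: > 0 is superadditive, < 0 is subadditive.
    interaction = (h["C"].mean() - h["E"].mean()) - (h["D"].mean() - h["A"].mean())
    return {"main_R_d": main_R, "main_E_d": main_E, "RxE_interaction": interaction}
```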

Reproduce

Requirements

pip install -r requirements.txt

Tested on Python 3.11, Apple MPS (Mac Studio M2 Ultra, 36GB unified memory).

Run experiments

# Phase 2: 8-condition ablation (400 runs per model)
cd experiments/ablation
python run_ablation.py --model gemma2b --phase seeded

# Analyze + generate figures
python analyze_ablation.py --model gemma2b --all
python analyze_ablation.py --model all --all  # all 5 models

# Phase 3: Response surface (930 runs per model)
cd experiments/response_surface
python run_surface.py --model gemma2b --phase seeded
python analyze_surface.py --model gemma2b --all

Models are loaded from the HuggingFace cache. Set HF_CACHE in the runner scripts to your cache path.

Sampling parameters (locked across all experiments)

| Parameter | Value |
|---|---|
| Temperature | 0.7 |
| top_p | 0.9 |
| max_new_tokens | 256 |
| Seeds (Phase 2) | 42, 137, 1001, 2026, 9999 |
| Seeds (Phase 3) | 42, 137, 1001 |
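
A self-contained sketch of what generation under these locked parameters can look like with Hugging Face transformers. This is not the repository's run_ablation.py: the model id, device handling, and seeding shown here are assumptions, and HF_CACHE simply stands in for the cache path the runner scripts use:

```python
# Sketch: load a model from a local HuggingFace cache and sample with the
# locked parameters (temperature 0.7, top_p 0.9, max_new_tokens 256).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

HF_CACHE = None  # e.g. "/path/to/huggingface/hub"; None falls back to the default cache
MODEL_ID = "google/gemma-2-2b-it"
device = "mps" if torch.backends.mps.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, cache_dir=HF_CACHE)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, cache_dir=HF_CACHE, torch_dtype=torch.float16
).to(device)

def generate_once(messages: list[dict], seed: int):
    torch.manual_seed(seed)  # one locked seed per run (42, 137, 1001, ...)
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(device)
    return model.generate(
        inputs,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
        max_new_tokens=256,
        return_dict_in_generate=True,
        output_scores=True,  # keep per-step logits for the entropy metrics below
    )
```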

Repository Structure

liminal-k-ssm/
├── experiments/
│   ├── ablation/                          # Phase 2: 8-condition factorial
│   │   ├── run_ablation.py                # Experiment runner
│   │   ├── analyze_ablation.py            # Analysis + figure generation
│   │   ├── measure_entropy.py             # Per-token entropy extraction
│   │   ├── prompts/
│   │   │   ├── conditions.json            # 8 system prompt conditions
│   │   │   └── test_prompts.json          # 10 test prompts (3 domains)
│   │   ├── data/
│   │   │   ├── raw/{model}/               # 400 JSON inference results per model
│   │   │   └── processed/                 # Aggregated CSVs
│   │   └── results/
│   │       ├── ablation_{model}.json      # Summary statistics
│   │       └── figures/{model}/           # 5 figures per model (PNG + SVG)
│   │
│   ├── response_surface/                  # Phase 3: CCD pharmacology
│   │   ├── run_surface.py                 # Dose-based prompt assembly
│   │   ├── analyze_surface.py             # Quadratic RSM + optimization
│   │   ├── prompts/
│   │   │   ├── conditions_surface.json    # 31 CCD conditions
│   │   │   ├── factor_sentences.json      # 3-dose factor sentences
│   │   │   └── test_prompts.json          # Same 10 prompts
│   │   ├── data/raw/{model}/              # 930 JSON results per model
│   │   └── results/figures/{model}/       # 7 response surface figures
│   │
│   ├── relational_coupling/               # Phase 1: Qwen 1.5B (750 runs)
│   ├── relational_coupling_05b/           # Phase 1: Qwen 0.5B (150 runs)
│   ├── relational_coupling_7b/            # Phase 1: Qwen 7B (150 runs)
│   ├── relational_coupling_gemma9b/       # Phase 1: Gemma 2B (150 runs)
│   ├── relational_coupling_llama8b/       # Phase 1: Llama 8B (150 runs)
│   └── relational_coupling_mistral7b/     # Phase 1: Mistral 7B (150 runs)
│
├── kssm/                                  # K-SSM oscillator experiments (earlier work)
├── requirements.txt
├── LICENSE                                # Apache 2.0
└── README.md

Models Tested

| Model | Architecture | Parameters | Attention | Phases |
|---|---|---|---|---|
| Qwen2.5-0.5B-Instruct | Transformer | 0.5B | Yes | 1 |
| Qwen2.5-1.5B-Instruct | Transformer | 1.5B | Yes | 1 |
| Gemma-2-2B-it | Transformer (SWA + Full) | 2.6B | Yes | 1, 2, 3 |
| Qwen2.5-7B-Instruct | Transformer (GQA) | 7.6B | Yes | 1, 2 |
| Mistral-7B-Instruct-v0.3 | Transformer (SWA) | 7.2B | Yes | 1, 2 |
| Falcon3-Mamba-7B-Instruct | Mamba SSM | 7.3B | No | 2 |
| Meta-Llama-3.1-8B-Instruct | Transformer (GQA) | 8.0B | Yes | 1, 2 |

All models run with frozen weights (no fine-tuning). The only independent variable is the system prompt text.


Metrics

For each generated response, per-token Shannon entropy is extracted from output logits:

H_t = -sum(p_t(v) * ln p_t(v)) over all v in the vocabulary (natural log, so entropy is in nats)

| Metric | Description |
|---|---|
| mean_H | Mean token entropy across response (primary DV) |
| var_H | Entropy variance |
| H_first10, H_last10 | Opening and closing entropy (trajectory shape) |
| cage_pct | Fraction of tokens in CAGE zone (1.5-3.0 nats) |
| mean_mass | Fisher information proxy (semantic mass) |
| response_len | Number of generated tokens |
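
A minimal sketch of how these summaries could be computed from the per-step scores captured during generation (see the generation sketch above). This is not the repository's measure_entropy.py; mean_mass is omitted because its definition is not given here:

```python
# Sketch: per-token Shannon entropy (in nats) and summary metrics from the
# tuple of per-step logits returned by model.generate(..., output_scores=True).
import torch

def entropy_metrics(scores: tuple) -> dict:
    per_token = []
    for step_logits in scores:            # one (batch, vocab) tensor per generated token
        logp = torch.log_softmax(step_logits[0].float(), dim=-1)
        per_token.append(-(logp.exp() * logp).sum().item())   # H_t = -sum p ln p
    H = torch.tensor(per_token)
    in_cage = (H >= 1.5) & (H <= 3.0)     # CAGE zone bounds from the table above
    return {
        "mean_H": H.mean().item(),
        "var_H": H.var().item(),
        "H_first10": H[:10].mean().item(),
        "H_last10": H[-10:].mean().item(),
        "cage_pct": in_cage.float().mean().item(),
        "response_len": len(per_token),
    }
```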

Pre-Registration

Phase 1 was pre-registered before data collection. The pre-registration document (experiments/relational_coupling/PREREGISTRATION.md) declares:

  • Model SHA hash
  • All conditions and prompt texts
  • Sampling parameters
  • Statistical analysis plan with minimum effect sizes (d > 0.3)
  • Acceptance test criteria

Project History

This repository began as Liminal K-SSM — an experiment coupling Kuramoto phase oscillators with state-space language models. That work produced a positive result on oscillator dynamics (R climbed to 0.99) but a negative result on text generation (incoherent output at 100K steps). The oscillators synchronized; the language model did not learn.

The insight from that failure led to the current work: rather than building oscillator-coupled architectures, we asked whether the context preceding generation — specifically, the relational frame of address — measurably changes what frozen models compute at the token level. It does.

Earlier K-SSM code and results are preserved in the kssm/ and legacy/ directories.


Citation

@article{vasquez2026system,
  title={System Prompt Framing Induces Attention-Dependent Entropy Regime
         Switching in Transformer Token Generation: A Cross-Architecture
         Ablation Study},
  author={Vasquez, Anthony J., Sr. and Claude (Anthropic)},
  year={2026},
  note={3,830 inference runs across 5 architectures. Pre-registered.},
  url={https://github.com/templetwo/liminal-k-ssm}
}

AI Collaboration Disclosure

This research was conducted through human-AI collaboration. Anthony J Vasquez Sr directed the research program, designed the experimental questions, and made all final decisions. Claude (Anthropic, Opus 4.6) designed the 8-condition factorial, wrote analysis scripts, interpreted statistics, and co-authored the paper. Full contribution details in AI_DISCLOSURE.md.

License: Apache 2.0
