linear-probes

Here are 2 public repositories matching this topic...

levashi / reprobe

Phase-aware LLM activation steering and linear probing. A memory-efficient, practical implementation of Representation Engineering (RepE) for safety research.

transformers pytorch ai-safety mechanistic-interpretability llm-safety representation-engineering activation-steering linear-probes

Updated Mar 26, 2026
Python

tesims / multiagent-emergent-deception

A research tool for studying how deception emerges in multi-agent LLM systems and detecting it through activation analysis.

alignment gemma sparse-autoencoders multi-agent-systems ai-safety emergent-behavior interpretability deception-detection activation-analysis mechanistic-interpretability llm-agents gemma-2b gemma-scope transformer-lens linear-probes

Updated Jan 11, 2026
Python
