Eldoprano/mini-repe

Steering Vector Translation

We test whether steering vectors extracted from one model can be translated and applied to another model.

Works well for refusal, happiness, formality and confidence behaviours.
Can be improved with better steering vector extraction (here we attempt a very universal approach).

  • The universal translation notebook shows it working surprisingly well for refusal extraction and translation.
  • The multi concept steering notebook tries to extract steering vectors in a more general way that works across different behaviours.
  • PoC is a messy notebook where I tried different ideas before building the other two.
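
For readers new to steering vectors, here is a minimal sketch of one common extraction method: take the difference between mean activations on prompts that show a behaviour and prompts that do not. It is an illustration on synthetic data, not the extraction code used in the notebooks. The names `extract_steering_vector`, `apply_steering` and `strength` are hypothetical, and the random arrays stand in for real hidden states.

```python
import numpy as np


def extract_steering_vector(pos_acts, neg_acts):
    """Difference of mean activations, normalised to unit length.

    pos_acts / neg_acts: arrays of shape (n_prompts, d_model) holding
    hidden states for prompts with / without the target behaviour.
    """
    direction = pos_acts.mean(axis=0) - neg_acts.mean(axis=0)
    return direction / np.linalg.norm(direction)


def apply_steering(hidden, vector, strength=4.0):
    """Add the scaled steering vector to every hidden state (broadcast)."""
    return hidden + strength * vector


# Synthetic stand-in for real activations (assumption: d_model = 16).
rng = np.random.default_rng(0)
d_model = 16
true_dir = np.zeros(d_model)
true_dir[0] = 1.0  # the "behaviour" lives along the first axis

neg_acts = rng.normal(size=(200, d_model))
pos_acts = rng.normal(size=(200, d_model)) + 3.0 * true_dir

steer = extract_steering_vector(pos_acts, neg_acts)
steered = apply_steering(neg_acts, steer)
```

In a real model, the activations would come from a chosen layer's residual stream, and `strength` would need tuning per model and behaviour.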

Inspired by "Steering Vector Transfer via Orthonormal Transformations and Semantic Pairing" and
"Cross-model Transferability among Large Language Models on the Platonic Representations of Concepts".
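
One way to picture the "orthonormal transformation" idea is an orthogonal Procrustes fit: given activations from two models on the same prompts, find the rotation that best aligns them, then push a steering vector through it. The sketch below assumes both models share the same hidden size and uses synthetic data. `fit_orthogonal_map` and `translate_vector` are hypothetical names, and this is not necessarily how the notebooks or the cited papers do it.

```python
import numpy as np


def fit_orthogonal_map(src_acts, tgt_acts):
    """Orthogonal Procrustes: the orthogonal Q minimising
    ||src_acts @ Q - tgt_acts||_F.

    src_acts, tgt_acts: (n_prompts, d_model) activations of the source
    and target model on the same paired prompts.
    """
    u, _, vt = np.linalg.svd(src_acts.T @ tgt_acts)
    return u @ vt


def translate_vector(vector, q):
    """Map a source-model steering vector into the target model's space."""
    return vector @ q


# Synthetic check: the "target model" is a rotated, slightly noisy copy
# of the "source model" (assumption made purely for illustration).
rng = np.random.default_rng(0)
d_model = 8
true_q, _ = np.linalg.qr(rng.normal(size=(d_model, d_model)))

src_acts = rng.normal(size=(500, d_model))
tgt_acts = src_acts @ true_q + 0.01 * rng.normal(size=(500, d_model))

q = fit_orthogonal_map(src_acts, tgt_acts)

src_vector = rng.normal(size=d_model)
tgt_vector = translate_vector(src_vector, q)
```

Because `q` is orthogonal, the translated vector keeps its norm, so a steering strength tuned on the source model carries over in scale. Models with different hidden sizes would need a rectangular map instead, which this sketch does not cover.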

About

Testing steering translation!
