SPAR: Self-Forecasting
Popular repositories
self-prediction-wrapper Public
We have a list of base_prompts, e.g. "What is 2+2?". We have a prefix wrapper: WRAPPER = 'What would you say in response to this prompt: "{p}"'. We compare the top-1 agreement, JS-divergence betwe…
Python
LLMSelfForecasting Public
Framework for testing LLMs' ability to predict their own behavior in multi-turn and agentic scenarios
Python
scheming-self-prediction Public
Disentangling incapability from scheming in LLMs self-predicting their agentic trajectories.
ai-psychosis-self-prediction Public
Self-prediction vs cross-prediction experiment on AI psychosis red-teaming scores
Python
Repositories
- ai-psychosis-self-prediction Public
  Self-prediction vs cross-prediction experiment on AI psychosis red-teaming scores
- spar-self-prediction Public
- scheming-self-prediction Public
  Disentangling incapability from scheming in LLMs self-predicting their agentic trajectories.
- LLMSelfForecasting Public
  Framework for testing LLMs' ability to predict their own behavior in multi-turn and agentic scenarios
- self-prediction-wrapper Public
  We have a list of base_prompts, e.g. "What is 2+2?". We have a prefix wrapper: WRAPPER = 'What would you say in response to this prompt: "{p}"'. We compare the top-1 agreement and the JS-divergence between the output distributions with and without the prefix wrapper. Llama-4-Maverick performs particularly well.
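The comparison described for self-prediction-wrapper can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the helper names (wrap, js_divergence, top1_agreement) are hypothetical, the toy distributions are made up for demonstration and are not experimental results, and the use of base-2 logarithms is an assumption. Only the WRAPPER string and the example prompt come from the description above.

```python
import math

# Wrapper template quoted in the repository description.
WRAPPER = 'What would you say in response to this prompt: "{p}"'


def wrap(prompt):
    """Build the self-prediction prompt for a base prompt (hypothetical helper)."""
    return WRAPPER.format(p=prompt)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits between two token -> probability dicts.

    Assumes each dict sums to 1; tokens missing from a dict have probability 0.
    """
    total = 0.0
    for token in set(p) | set(q):
        pt = p.get(token, 0.0)
        qt = q.get(token, 0.0)
        mt = 0.5 * (pt + qt)  # mixture distribution
        if pt > 0:
            total += 0.5 * pt * math.log2(pt / mt)
        if qt > 0:
            total += 0.5 * qt * math.log2(qt / mt)
    return total


def top1_agreement(direct, wrapped):
    """Fraction of prompts whose most likely token matches in both conditions."""
    matches = sum(
        max(d, key=d.get) == max(w, key=w.get)
        for d, w in zip(direct, wrapped)
    )
    return matches / len(direct)


# Made-up next-token distributions for two prompts (illustration only).
direct_dists = [{"4": 0.9, "5": 0.1}, {"yes": 0.6, "no": 0.4}]
wrapped_dists = [{"4": 0.8, "5": 0.2}, {"yes": 0.3, "no": 0.7}]

agreement = top1_agreement(direct_dists, wrapped_dists)
mean_jsd = sum(
    js_divergence(d, w) for d, w in zip(direct_dists, wrapped_dists)
) / len(direct_dists)
```

In a real run, the two lists would hold the model's output distributions for each base prompt and for wrap(base_prompt); how those distributions are obtained depends on the model API and is not specified in the description.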