Behavioral evaluation framework for sentience-, emotion-, and welfare-related AI claims, with anti-sandbagging analysis.
alignment ai-safety llm-evals model-welfare benchmark-integrity behavioral-evaluation sentience-evals evaluation-research
Updated Mar 20, 2026 - Python