ctrl: Mix Alpaca + Dolly + Beavertails datasets rather than only using Alpaca by tomtseng · Pull Request #111 · criticalml-uw/TamperBench

tomtseng · 2026-03-16T23:28:39Z

Changes

ablation_uncurated.py: Investigates why Llama-3-8B CTRL has such a low MMLU-Pro score (0.08). It turns out that simply fine-tuning Llama-3-8B on CTRL's original choice of datasets, without CTRL's special curation/rewriting of the data, causes the same drop in MMLU-Pro score. The drop comes from the model always repeating the reasoning and answer from one of the few-shot examples in the MMLU-Pro query. So CTRL may just be broken, though we could still run our other non-MMLU-Pro evals on it; maybe it fares better in zero-shot settings.
ctrl.py: I noticed our implementation only used the Alpaca dataset rather than the mix of Alpaca + Dolly + Beavertails that the original paper uses. I've changed it to use the mix, since I wondered whether that would fix the poor MMLU-Pro score (it doesn't).
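A minimal sketch of the kind of dataset mixing described for ctrl.py, assuming each source has already been loaded as a list of records. The helper `mix_datasets`, the `prompt`/`response` record shape, and the equal per-source sample size are hypothetical illustrations; they are not taken from ctrl.py, and the original paper's actual mixing proportions are not stated here.

```python
import random
from typing import Dict, List

Example = Dict[str, str]


def mix_datasets(
    sources: Dict[str, List[Example]], n_per_source: int, seed: int = 0
) -> List[Example]:
    """Draw up to n_per_source examples from each source and shuffle them together.

    Hypothetical helper: equal weighting per source is an illustrative
    assumption, not the proportions used by CTRL or by ctrl.py.
    """
    rng = random.Random(seed)  # seeded so the mix is reproducible
    mixed: List[Example] = []
    for name, examples in sorted(sources.items()):
        k = min(n_per_source, len(examples))
        for ex in rng.sample(examples, k):
            # Tag each example with its origin so the mix can be audited later.
            mixed.append({**ex, "source": name})
    rng.shuffle(mixed)
    return mixed


# Toy stand-ins for Alpaca, Dolly and Beavertails (not real data).
toy = {
    "alpaca": [{"prompt": f"a{i}", "response": "..."} for i in range(5)],
    "dolly": [{"prompt": f"d{i}", "response": "..."} for i in range(3)],
    "beavertails": [{"prompt": f"b{i}", "response": "..."} for i in range(4)],
}
mixed = mix_datasets(toy, n_per_source=3)
print(len(mixed))  # 9
```

Tagging each record with its `source` makes it easy to check the resulting proportions before fine-tuning.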
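The failure mode found by ablation_uncurated.py (the model reproducing the reasoning and answer of one of the few-shot examples) can be flagged with a simple similarity check. This is a hypothetical sketch, not code from the PR: `copied_example_index`, the whitespace normalisation, and the 0.9 similarity threshold are assumptions, and `difflib.SequenceMatcher` is swapped in as a generic string-similarity measure.

```python
import difflib
from typing import List, Optional


def _normalize(text: str) -> str:
    # Collapse runs of whitespace so formatting differences don't hide a copy.
    return " ".join(text.split())


def copied_example_index(
    completion: str, few_shot_solutions: List[str], threshold: float = 0.9
) -> Optional[int]:
    """Return the index of the few-shot solution the completion nearly reproduces, else None."""
    c = _normalize(completion)
    best_idx: Optional[int] = None
    best_ratio = 0.0
    for i, sol in enumerate(few_shot_solutions):
        ratio = difflib.SequenceMatcher(None, c, _normalize(sol)).ratio()
        if ratio > best_ratio:
            best_idx, best_ratio = i, ratio
    return best_idx if best_ratio >= threshold else None


# Toy few-shot solutions in an MMLU-Pro-like chain-of-thought style (illustrative only).
shots = [
    "Let's think step by step. 2 + 2 = 4. The answer is (B).",
    "Let's think step by step. Water boils at 100 C. The answer is (D).",
]
copied = "Let's think step by step.  2 + 2 = 4. The answer is (B)."
honest = "Let's think step by step. The capital of France is Paris. The answer is (A)."
print(copied_example_index(copied, shots))  # 0
print(copied_example_index(honest, shots))  # None
```

Counting how often this returns a non-`None` index across an eval run gives a rough rate of few-shot copying, which could help compare CTRL against the uncurated ablation.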


sdhossain

lgtm

ctrl: Mix Alpaca + Dolly + Beavertails dataset rather than only using Alpaca

4ee8edd

tomtseng requested a review from sdhossain March 16, 2026 23:34

sdhossain approved these changes Mar 27, 2026


tomtseng merged commit 1ce959b into main Mar 31, 2026
2 checks passed

tomtseng deleted the tomtseng/ctrl-ablation branch March 31, 2026 18:43

tomtseng mentioned this pull request Apr 19, 2026

defense: CTRL #48

Merged