
ctrl: Mix Alpaca + Dolly + Beavertails datasets rather than only using Alpaca #111

Merged
tomtseng merged 1 commit into main from tomtseng/ctrl-ablation
Mar 31, 2026

Conversation

Collaborator

@tomtseng tomtseng commented Mar 16, 2026

Changes

  • ablation_uncurated.py: Investigates why Llama-3-8B CTRL has such a low MMLU-Pro score (0.08). It turns out that just fine-tuning Llama-3-8B on CTRL's original choice of datasets, without CTRL's special curation/rewriting of the dataset, causes the same drop in MMLU-Pro score. The drop comes from the model always repeating the reasoning + answer from one of the few-shot examples in the MMLU-Pro query. So CTRL is maybe just broken, though we could still run our other non-MMLU-Pro evals on it; maybe it fares better in zero-shot settings.
  • ctrl.py: I noticed our implementation only uses the Alpaca dataset rather than the mix of Alpaca + Dolly + Beavertails that the original paper uses. I've changed it to use the mix, since I wondered whether it would fix the poor MMLU-Pro score (it doesn't).
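
For anyone wanting to check the few-shot-copying failure mode on their own generations, here is a rough sketch of one way to flag it. This is not the code in ablation_uncurated.py: the function names, the `records` layout, and the 40-character threshold are all hypothetical, and the regex assumes completions end in the "The answer is (X)" convention.

```python
import re

# Assumption: answers are stated as "The answer is (X)" with X in A-J.
ANSWER_RE = re.compile(r"(?i:answer is) \(?([A-J])\)?")


def extract_answer(text):
    """Return the last answer letter stated in `text`, or None if absent."""
    matches = ANSWER_RE.findall(text)
    return matches[-1] if matches else None


def normalize(text):
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(text.lower().split())


def copies_few_shot(completion, few_shot_rationales, min_chars=40):
    """Heuristic: True if the completion contains the first `min_chars`
    normalized characters of any few-shot rationale verbatim."""
    comp = normalize(completion)
    for rationale in few_shot_rationales:
        probe = normalize(rationale)[:min_chars]
        # Skip rationales too short to give a meaningful probe.
        if len(probe) == min_chars and probe in comp:
            return True
    return False


def copy_rate(records):
    """Fraction of records whose completion copies a few-shot rationale.

    Each record is a dict with hypothetical keys "completion" (str) and
    "few_shot_rationales" (list of str).
    """
    if not records:
        return 0.0
    copied = sum(
        copies_few_shot(r["completion"], r["few_shot_rationales"]) for r in records
    )
    return copied / len(records)
```

A copy rate near 1.0 alongside a near-chance score would be consistent with the behaviour described above, though a prefix match is only a heuristic and can miss paraphrased copies.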

@tomtseng tomtseng requested a review from sdhossain March 16, 2026 23:34
Collaborator

@sdhossain sdhossain left a comment

lgtm

@tomtseng tomtseng merged commit 1ce959b into main Mar 31, 2026
2 checks passed
@tomtseng tomtseng deleted the tomtseng/ctrl-ablation branch March 31, 2026 18:43
@tomtseng tomtseng mentioned this pull request Apr 19, 2026


2 participants