This repository contains the official implementation of Competency Gaps, a representation-grounded evaluation method that uses sparse autoencoders (SAEs) to automatically surface both model gaps and benchmark gaps. The approach extracts SAE concepts and computes saliency-weighted performance scores to reveal why models succeed or fail and which concepts benchmarks over- or under-represent. Applied to multiple open-source LLMs and benchmarks, the method recovers known weaknesses without supervision.
Website — Paper — Contact us
Pre-print. Under review.
Code coming soon.
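Until the release, the sketch below illustrates one plausible reading of the saliency-weighted scoring described above, not the paper's actual implementation: per-example SAE concept activations are normalized into per-concept saliency weights, and each concept's score is the correctness of the model averaged under those weights. The array shapes, function names, and normalization choice are our assumptions.

```python
import numpy as np

def concept_scores(activations: np.ndarray, correct: np.ndarray) -> np.ndarray:
    """Saliency-weighted performance per SAE concept (illustrative sketch).

    activations: (n_examples, n_concepts) non-negative SAE concept activations.
    correct:     (n_examples,) 1.0 if the model answered correctly, else 0.0.
    Returns:     (n_concepts,) weighted mean correctness per concept.
    """
    # Normalize each concept's activations over examples so weights sum to 1.
    saliency = activations / (activations.sum(axis=0, keepdims=True) + 1e-9)
    return saliency.T @ correct

def concept_coverage(activations: np.ndarray) -> np.ndarray:
    """Mean activation per concept: how strongly a benchmark represents it."""
    return activations.mean(axis=0)

# Toy usage with random data.
rng = np.random.default_rng(0)
acts = rng.random((100, 16))                     # stand-in SAE activations
labels = rng.integers(0, 2, 100).astype(float)   # stand-in correctness
print(concept_scores(acts, labels).argsort()[:3])  # weakest concepts
print(concept_coverage(acts).argsort()[:3])        # least-covered concepts
```

Under this reading, low `concept_scores` entries would flag model gaps, while low `concept_coverage` entries would flag concepts a benchmark under-represents.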
If you find our work useful, please consider citing our paper:
```bibtex
@inproceedings{tbd2026competencygaps,
  title     = {Uncovering Competency Gaps in Large Language Models and Their Benchmarks},
  author    = {TBD},
  booktitle = {TBD},
  year      = {TBD}
}
```