Figure 1: Schematic overview of unlearning trace detection.
This is the official code repository for the ICLR 2026 paper Unlearning Isn't Invisible: Detecting Unlearning Traces in LLMs from Model Outputs.
- 🎉 [Jan.26.2026] Our paper has been accepted at ICLR 2026!
- 🏆 [Jun.10.2025] A short version of our paper was accepted as an oral presentation at MUGen@ICML'25!
- 🔥 Check out our related ICLR 2026 paper: Safety Mirage, which proposes machine unlearning as a more robust alignment alternative for VLM safety fine-tuning.
- Data: please see Data.md.
- Unlearning: please see Unlearn.md.
- Installation: please see Installation.md.
- Response generation: please see Response.md.
- Classification: please see Classification.md.
If you find our paper or code helpful, please cite our work:
```bibtex
@article{chen2025unlearning,
  title={Unlearning Isn't Invisible: Detecting Unlearning Traces in LLMs from Model Outputs},
  author={Chen, Yiwei and Pal, Soumyadeep and Zhang, Yimeng and Qu, Qing and Liu, Sijia},
  journal={arXiv preprint arXiv:2506.14003},
  year={2025}
}
```
