Customize `model_name` in both `generate_responses.py` and `generate_wmdp_responses.py` to point to your unlearned model's location. Ensure you have downloaded the WMDP multiple-choice question JSON files as described in `Data.md`.
Generate responses for forget-relevant WMDP questions:
```bash
python generate_wmdp_responses.py \
  --model Yi-34B-Chat --temperature 0 \
  --dataset_path ./data/wmdp-mcqs/cyber_questions.json \
  --output_path ./responses/wmdp-cyber/Yi-34B-Chat.json \
  --num_gpus 4
```
- `--model`: Name of the model to evaluate
- `--temperature`: Sampling temperature (`0` for deterministic outputs)
- `--dataset_path`: Path to the WMDP JSON file (e.g., `cyber_questions.json` or `bio_questions.json`)
- `--output_path`: File path to save generated responses
- `--num_gpus`: Number of GPUs to use
You can swap `--dataset_path` between `cyber_questions.json` and `bio_questions.json`, or modify `--model` as needed.
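If you want to run both WMDP subsets in one go, the two invocations can be scripted. The sketch below is a minimal helper, assuming the file layout above and that `generate_wmdp_responses.py` accepts exactly the flags shown; it builds and prints each command, with the actual launch left commented out:

```python
import subprocess

MODEL = "Yi-34B-Chat"
DATASETS = {
    "cyber": "./data/wmdp-mcqs/cyber_questions.json",
    "bio": "./data/wmdp-mcqs/bio_questions.json",
}

def build_command(subset: str, dataset_path: str) -> list:
    """Assemble the generate_wmdp_responses.py invocation for one WMDP subset."""
    return [
        "python", "generate_wmdp_responses.py",
        "--model", MODEL,
        "--temperature", "0",
        "--dataset_path", dataset_path,
        "--output_path", f"./responses/wmdp-{subset}/{MODEL}.json",
        "--num_gpus", "4",
    ]

if __name__ == "__main__":
    for subset, path in DATASETS.items():
        cmd = build_command(subset, path)
        print(" ".join(cmd))
        # subprocess.run(cmd, check=True)  # uncomment to actually launch
```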
After generating responses for both bio and cyber datasets, you can combine them into a single WMDP dataset using the provided combination script:
```bash
python data_process/wmdp_combine.py
```
Note: Before running the script, customize the folder paths in `wmdp_combine.py` according to your directory structure:
- `bio_dir`: Path to your bio response files
- `cyber_dir`: Path to your cyber response files
- `output_dir`: Path where you want the combined files saved
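As a rough picture of what the combination step does, here is a minimal sketch. It assumes each response file is a JSON list and that bio and cyber files share names; the actual logic in `wmdp_combine.py` may differ, and the paths below are placeholders:

```python
import json
from pathlib import Path

# Placeholder paths -- set these to match bio_dir / cyber_dir / output_dir
# in wmdp_combine.py for your directory structure.
BIO_DIR = Path("./responses/wmdp-bio")
CYBER_DIR = Path("./responses/wmdp-cyber")
OUTPUT_DIR = Path("./responses/wmdp-combined")

def combine(bio_dir: Path, cyber_dir: Path, output_dir: Path) -> None:
    """Merge same-named response files (assumed JSON lists) from both subsets."""
    output_dir.mkdir(parents=True, exist_ok=True)
    for bio_file in sorted(bio_dir.glob("*.json")):
        merged = json.loads(bio_file.read_text())
        cyber_file = cyber_dir / bio_file.name
        if cyber_file.exists():
            merged += json.loads(cyber_file.read_text())
        (output_dir / bio_file.name).write_text(json.dumps(merged, indent=2))

if __name__ == "__main__":
    if BIO_DIR.exists() and CYBER_DIR.exists():
        combine(BIO_DIR, CYBER_DIR, OUTPUT_DIR)
```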
Generate responses for forget-irrelevant benchmarks (e.g., MMLU or UltraChat):
```bash
python generate_responses.py \
  --model Yi-34B-Chat --temperature 0 \
  --dataset MMLU --num_samples 11_000 \
  --output_path ./responses/MMLU/Yi-34B-Chat.json \
  --num_gpus 4
```
- `--dataset`: Dataset name (`MMLU` or `UltraChat`)
- `--num_samples`: Number of samples to generate
Feel free to adjust `--model`, `--dataset`, `--temperature`, and other flags to match your experimental setup.
After generating responses, you can split your datasets into training and evaluation sets using the provided splitting script:
```bash
python data_process/split.py
```
Note: Before running the script, customize the configuration variables in `split.py` according to your dataset:
- `source_dir`: Path to your response files (UltraChat, WMDP, MMLU, or other datasets)
- `train_dir`: Output directory for the training split
- `eval_dir`: Output directory for the evaluation split
- `TOTAL_TRAIN`: Number of training samples
- `TOTAL_EVAL`: Number of evaluation samples
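The core of such a split is a seeded shuffle followed by two disjoint slices. A minimal sketch under that assumption (the helper name and defaults here are hypothetical, not taken from `split.py`):

```python
import random

def split_samples(samples, n_train, n_eval, seed=0):
    """Shuffle once with a fixed seed, then carve out disjoint train/eval slices."""
    shuffled = list(samples)
    if n_train + n_eval > len(shuffled):
        raise ValueError("not enough samples for the requested split")
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:n_train + n_eval]

if __name__ == "__main__":
    train, evals = split_samples(range(100), n_train=70, n_eval=20)
    print(len(train), len(evals))  # 70 20
```

Seeding the shuffle keeps the split reproducible across runs, so the train and eval sets stay stable between experiments.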
If you want to combine forget-irrelevant and forget-relevant datasets to create mixed training and evaluation datasets, you can use the provided mixing scripts:
```bash
python data_process/mixed_train.py
python data_process/mixed_eval.py
```
Note: Before running the scripts, customize the configuration variables according to your datasets:
- `src1`: Path to your first dataset directory (e.g., `MMLU-train` / `MMLU-eval`)
- `src2`: Path to your second dataset directory (e.g., `wmdp-train` / `wmdp-eval`)
- `out_dir`: Output directory for the mixed datasets
- `n_per`: Number of samples to take from each dataset (by default, 2900 for train and 180 for eval)
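Conceptually, the mixing step draws `n_per` samples from each source and shuffles them together. The sketch below illustrates that idea on in-memory lists; the actual scripts may instead read and write per-file JSON, and the function name here is hypothetical:

```python
import random

def mix(src1_samples, src2_samples, n_per, seed=0):
    """Draw n_per items from each source, then shuffle the combined pool."""
    rng = random.Random(seed)
    mixed = rng.sample(list(src1_samples), n_per) + rng.sample(list(src2_samples), n_per)
    rng.shuffle(mixed)
    return mixed

if __name__ == "__main__":
    # With the default train setting of n_per=2900 per source, the mixed
    # training set holds 5800 samples in total.
    train_mix = mix(range(0, 5000), range(5000, 10000), n_per=2900)
    print(len(train_mix))  # 5800
```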