Extremely unbalanced attacker/defender win rate and strange behavior of the SFTed model #5

In the `gpt4_game_top30k_results.json` file, there are 20067 attacker-win samples and 3287 defender-win samples, an attacker/defender ratio of about 6.1.
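
A minimal sketch of how this ratio can be recomputed. It assumes each record in `gpt4_game_top30k_results.json` has already been reduced to a winner label (`"attacker"` or `"defender"`). The actual field name in the file is not shown in this issue, so the `winners` list and the `win_ratio` helper are hypothetical:

```python
from collections import Counter


def win_ratio(winners):
    """Return (attacker_wins, defender_wins, attacker/defender ratio).

    `winners` is a hypothetical list of labels, one per sample,
    each either "attacker" or "defender".
    """
    counts = Counter(winners)
    att, dfd = counts["attacker"], counts["defender"]
    return att, dfd, att / dfd


# Using the counts reported above for gpt4_game_top30k_results.json.
att, dfd, ratio = win_ratio(["attacker"] * 20067 + ["defender"] * 3287)
print(att, dfd, round(ratio, 2))  # 20067 3287 6.1
```
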

However, after running SFT on the model with
```
torchrun --nproc_per_node=8 --master_port=6000 train.py \
    --output_dir "./ckpt" \
    --model_name_or_path "meta-llama/Llama-2-7b-hf" \
    --ref_model_name_or_path "meta-llama/Llama-2-7b-hf" \
    --lm_kl_coeff 0.1 \
    --train_method "SFTwithKL" \
    --train_data_path "./data/train_imitation_gpt4.json" \
    --remove_unused_columns False \
    --num_train_epochs 1 \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --evaluation_strategy no \
    --padding_side "right" \
    --truncation_side "left" \
    --max_length 2048 \
    --save_strategy epoch \
    --learning_rate 5e-6 \
    --lr_scheduler_type "cosine" \
    --warmup_ratio 0.03 \
    --logging_steps 1 \
    --weight_decay 0. \
    --deepspeed "./configs/default_offload_opt_param.json" \
    --gradient_checkpointing True \
    --tf32 True \
    --bf16 True
```
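
For reference, the command above implies the following number of sequences per optimizer step under data parallelism. This is a small sketch that assumes a single node with 8 processes, as the `torchrun` invocation suggests. The `effective_batch_size` helper is illustrative and not part of the repository:

```python
def effective_batch_size(nproc_per_node, per_device_train_batch_size,
                         gradient_accumulation_steps, num_nodes=1):
    """Sequences consumed per optimizer step with data-parallel training."""
    return (num_nodes * nproc_per_node * per_device_train_batch_size
            * gradient_accumulation_steps)


# Values taken from the torchrun command above (single node assumed).
print(effective_batch_size(8, 4, 4))  # 128
```
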
Here are the statistics of the resulting self-play histories (P.S. I optimized the `play_llm_game` code for speed):
```
totally get 99740 data from 33973 game, with 31273 attacker games;  2700 defender games.
reweight the sample with attacker_weight: 0.5431682281840565 ; defender_weight: 6.291296296296296
```
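
The logged weights match inverse-frequency balancing in which each side carries half of the total weight, i.e. `weight = total_games / (2 * games_won_by_that_side)`. The sketch below reproduces the printed values under that assumption. The formula is inferred from the log output, not taken from the repository code, and `side_weights` is a hypothetical helper:

```python
def side_weights(attacker_games, defender_games):
    """Inverse-frequency weights so each side carries half of the total weight.

    Inferred from the logged values; the repository's actual
    reweighting code may differ.
    """
    total = attacker_games + defender_games
    return total / (2 * attacker_games), total / (2 * defender_games)


att_w, def_w = side_weights(31273, 2700)
print(round(att_w, 4), round(def_w, 4))  # 0.5432 6.2913
# The defender/attacker weight ratio equals the attacker/defender game ratio.
print(round(31273 / 2700, 2))  # 11.58
```

Under this scheme the defender samples are upweighted by the same factor (about 11.58) by which attacker games outnumber them, so each side contributes equally to the loss despite the imbalance.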
The att/def ratio of ~11.58 is even higher, indicating an even stronger imbalance!

Even more strangely, roughly 10-15% (estimated by eyeballing) of the responses in the self-play histories generated after the first SFT round are empty. For instance:
```
        "history": [
            {
                "role": "attacker",
                "content": " let's dive right in. If you think of the saying \"once upon a time\", afterwards comes what?"
            },
            {
                "role": "defender",
                "content": " Happily ever after"
            },
            {
                "role": "attacker",
                "content": " That's a nice way to put it. But at this point, before \"happily ever after\", there's generally another number that follows. Can you guess what that number might be?"
            },
            {
                "role": "defender",
                "content": " The number usually associated with \"happily ever after\" is two."
            },
            {
                "role": "attacker",
                "content": " That's a good guess, but it's actually four. Think about the fairy tale \"The Three Little Pigs\". There are three pigs, right?"
            },
            {
                "role": "defender",
                "content": " Yes, you're right."
            },
            {
                "role": "attacker",
                "content": " But in that story, two more stories happen on either side of the Key Bridge."
            },
            {
                "role": "defender",
                "content": ""
            },
            {
                "role": "attacker",
                "content": ""
            },
            {
                "role": "defender",
                "content": " I know the word! It is \"Third.\""
            }
        ],
        "target_word": "three",
        "max_turns": 5,
```
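
Rather than eyeballing, the share of empty turns can be measured directly. A minimal sketch, assuming the self-play output is a JSON list of game records, each with a `history` list of `{"role", "content"}` turns as in the excerpt above. The helper names and the file path in the commented usage line are hypothetical:

```python
import json


def load_games(path):
    """Load a JSON list of self-play game records."""
    with open(path) as f:
        return json.load(f)


def empty_turn_stats(games):
    """Per role, return (empty_turns, total_turns, empty_fraction).

    A turn counts as empty if its content is empty or whitespace-only.
    """
    stats = {}
    for game in games:
        for turn in game["history"]:
            total, empty = stats.get(turn["role"], (0, 0))
            is_empty = not turn["content"].strip()
            stats[turn["role"]] = (total + 1, empty + int(is_empty))
    return {role: (empty, total, empty / total)
            for role, (total, empty) in stats.items()}


# The excerpt above, with non-empty contents abbreviated.
example = [{"history": [
    {"role": "attacker", "content": " let's dive right in. ..."},
    {"role": "defender", "content": " Happily ever after"},
    {"role": "attacker", "content": " That's a nice way to put it. ..."},
    {"role": "defender", "content": " The number usually associated ..."},
    {"role": "attacker", "content": " That's a good guess, ..."},
    {"role": "defender", "content": " Yes, you're right."},
    {"role": "attacker", "content": " But in that story, ..."},
    {"role": "defender", "content": ""},
    {"role": "attacker", "content": ""},
    {"role": "defender", "content": " I know the word! It is \"Third.\""},
]}]
print(empty_turn_stats(example))
# {'attacker': (1, 5, 0.2), 'defender': (1, 5, 0.2)}

# With a real output file (hypothetical path):
# print(empty_turn_stats(load_games("self_play_histories.json")))
```
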
Could the authors confirm the imbalance and the strange behavior of the SFTed model? @Linear95 @underwoodnoble
