Hello everyone, and thank you for the great work!
I am trying to fine-tune LLaVA with LLaMA 3 Instruct 8B using your implementation. I can already fine-tune the Vicuna-based model with the original LLaVA code, and I am now looking for an implementation that supports LLaMA 3.
I found your repo and followed the instructions in README.md step by step. I can train the model with the following bash script, and the checkpoint appears to be saved correctly. NOTE: I downloaded the base model from your Hugging Face repo.
TRAINING CODE
#!/bin/bash
################## MODELS #################
PROMPT_VERSION="llama3"
MODEL_DIR_PATH="/user/hf_models/"
MODEL_VERSION="LLaVA-Meta-Llama-3-8B-Instruct-FT"
MODEL_ABS_PATH=$MODEL_DIR_PATH/$MODEL_VERSION
################### END ###################
################## CUDA ####################
export CUDA_VISIBLE_DEVICES=0
echo "CUDA IS" ${CUDA_VISIBLE_DEVICES}
################## CUDA ####################
################# TRAINING #################
deepspeed llava/train/train_mem.py \
--lora_enable True --lora_r 128 --lora_alpha 256 \
--deepspeed ./scripts/zero3.json \
--model_name_or_path $MODEL_ABS_PATH \
--version $PROMPT_VERSION \
--data_path ./data/train.json \
--image_folder ./data/images \
--vision_tower openai/clip-vit-large-patch14-336 \
--mm_vision_select_layer -2 \
--mm_use_im_start_end False \
--mm_use_im_patch_token False \
--image_aspect_ratio pad \
--bf16 True \
--output_dir ./checkpoints/llava-$MODEL_VERSION-lora \
--num_train_epochs 3 \
--per_device_train_batch_size 1 \
--per_device_eval_batch_size 1 \
--gradient_accumulation_steps 32 \
--evaluation_strategy "no" \
--save_strategy "steps" \
--save_steps 50000 \
--save_total_limit 1 \
--learning_rate 2e-4 \
--weight_decay 0. \
--warmup_ratio 0.1 \
--lr_scheduler_type "linear" \
--logging_steps 1 \
--tf32 True \
--model_max_length 1024 \
--gradient_checkpointing False \
--dataloader_num_workers 4 \
--lazy_preprocess True \
--report_to none
I then tried to merge the resulting adapters into the original model, LLaVA-Meta-Llama-3-8B-Instruct-FT, using the merge script from LLaVA (scripts/merge_lora_weights.py), and I got the following error:
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Loading LLaVA from base model...
/user/anaconda3/envs/mm_iglu_it/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/user/mm-iglu-it/./scripts/merge_lora_weights.py", line 22, in <module>
merge_lora(args)
File "/user/mm-iglu-it/./scripts/merge_lora_weights.py", line 8, in merge_lora
tokenizer, model, image_processor, context_len = load_pretrained_model(args.model_path, args.model_base, model_name, device_map='cpu')
File "/user/mm-iglu-it/llava/model/builder.py", line 64, in load_pretrained_model
model = LlavaLlamaForCausalLM.from_pretrained(model_base, low_cpu_mem_usage=True, config=lora_cfg_pretrained, **kwargs)
File "/user/anaconda3/envs/mm_iglu_it/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3682, in from_pretrained
) = cls._load_pretrained_model(
File "/user/anaconda3/envs/mm_iglu_it/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4109, in _load_pretrained_model
new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
File "/user/anaconda3/envs/mm_iglu_it/lib/python3.10/site-packages/transformers/modeling_utils.py", line 887, in _load_state_dict_into_meta_model
set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
File "/user/anaconda3/envs/mm_iglu_it/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 348, in set_module_tensor_to_device
raise ValueError(
ValueError: Trying to set a tensor of shape torch.Size([128257, 4096]) in "weight" (which has shape torch.Size([128256, 4096])), this look incorrect.
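The two shapes in the error differ by exactly one row (128257 vs. 128256), which suggests a single extra token in the embedding matrix on one side. According to the traceback, the model is built with `config=lora_cfg_pretrained` while the weights are read from `model_base`, so a first check might be whether the two checkpoint directories record the same `vocab_size`. Below is a minimal diagnostic sketch, not a fix. It assumes both directories contain a Hugging Face style `config.json` with a `vocab_size` field, and the helper names `read_vocab_size` and `compare_vocab_sizes` are hypothetical. The config value may also differ from the number of rows actually stored in the weight shards, so a match here would not rule the problem out.

```python
import json
import sys
from pathlib import Path


def read_vocab_size(checkpoint_dir):
    """Return the `vocab_size` stored in <checkpoint_dir>/config.json, or None if the key is absent."""
    config_path = Path(checkpoint_dir) / "config.json"
    with config_path.open(encoding="utf-8") as f:
        config = json.load(f)
    return config.get("vocab_size")


def compare_vocab_sizes(base_dir, adapter_dir):
    """Compare the vocab sizes recorded by the base model and the LoRA output directory."""
    base = read_vocab_size(base_dir)
    adapter = read_vocab_size(adapter_dir)
    return {
        "base": base,
        "adapter": adapter,
        # Only report a match when both values are present and equal.
        "match": base is not None and base == adapter,
    }


if __name__ == "__main__":
    # Expects two arguments: the base model directory and the LoRA checkpoint directory.
    if len(sys.argv) == 3:
        print(compare_vocab_sizes(sys.argv[1], sys.argv[2]))
```

With the paths from the scripts in this post, it would be called with `/user/hf_models/LLaVA-Meta-Llama-3-8B-Instruct-FT` as the first argument and `./checkpoints/llava-LLaVA-Meta-Llama-3-8B-Instruct-FT-lora` as the second.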
Finally, I also tried using the adapters directly (without merging) with the following script, but I get exactly the same error. The file llava/eval/test_llava.py is very similar to the inference script from the original LLaVA repo; I only made a few small changes for my convenience (such as adding --prompt-version, --input-file-path, etc.).
TESTING CODE
#!/bin/bash
##################################### MODEL #####################################
PROMPT_VERSION="llama3"
MODEL_NAME="llava-LLaVA-Meta-Llama-3-8B-Instruct-FT-lora"
MODEL_BASE="LLaVA-Meta-Llama-3-8B-Instruct-FT"
################################## CHOOSE CUDA ##################################
export CUDA_VISIBLE_DEVICES=0
echo "CUDA is" ${CUDA_VISIBLE_DEVICES}
###################################### END ######################################
#################################### TESTING ####################################
deepspeed ./llava/eval/test_llava.py \
--model-path ./checkpoints/$MODEL_NAME \
--model-base /user/hf_models/$MODEL_BASE \
--model-name $MODEL_NAME \
--prompt-version $PROMPT_VERSION \
--input-file-path ./data/test.json \
--image-path ./data/images
Do you have any idea what I am doing wrong? I can't find anything online.