ValueError: Trying to set a tensor of shape torch.Size([128257, 4096]) in "weight" (which has shape torch.Size([128256, 4096])), this look incorrect. #31

@basteran

Hello everyone, thank you for the great work!

I am trying to further fine-tune LLaVA with LLaMA 3 8B Instruct as the language model, using your implementation. I can already fine-tune the Vicuna-based model with the original LLaVA code, and now I am looking for an implementation that supports LLaMA 3.

I found your repo and followed each step of the instructions in README.md. I can train the model with the following bash script, and the result appears to be saved correctly. NOTE: I downloaded the model from your Hugging Face repo.

TRAINING CODE

#!/bin/bash

################## MODELS #################
PROMPT_VERSION="llama3"
MODEL_DIR_PATH="/user/hf_models/"
MODEL_VERSION="LLaVA-Meta-Llama-3-8B-Instruct-FT"
MODEL_ABS_PATH=$MODEL_DIR_PATH/$MODEL_VERSION
################### END ###################

################## CUDA ####################
export CUDA_VISIBLE_DEVICES=0
echo "CUDA IS" ${CUDA_VISIBLE_DEVICES}
################## CUDA ####################

################# TRAINING #################
deepspeed llava/train/train_mem.py \
    --lora_enable True --lora_r 128 --lora_alpha 256 \
    --deepspeed ./scripts/zero3.json \
    --model_name_or_path $MODEL_ABS_PATH \
    --version $PROMPT_VERSION \
    --data_path ./data/train.json \
    --image_folder ./data/images \
    --vision_tower openai/clip-vit-large-patch14-336 \
    --mm_vision_select_layer -2 \
    --mm_use_im_start_end False \
    --mm_use_im_patch_token False \
    --image_aspect_ratio pad \
    --bf16 True \
    --output_dir ./checkpoints/llava-$MODEL_VERSION-lora \
    --num_train_epochs 3 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 32 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 50000 \
    --save_total_limit 1 \
    --learning_rate 2e-4 \
    --weight_decay 0. \
    --warmup_ratio 0.1 \
    --lr_scheduler_type "linear" \
    --logging_steps 1 \
    --tf32 True \
    --model_max_length 1024 \
    --gradient_checkpointing False \
    --dataloader_num_workers 4 \
    --lazy_preprocess True \
    --report_to none
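For reference, the batching and LoRA settings in this script work out as follows. This is a minimal sketch, assuming a single visible GPU (as set by `CUDA_VISIBLE_DEVICES=0`) and the standard (non-rank-stabilized) LoRA convention that the low-rank update is scaled by `lora_alpha / lora_r`; the helper name `training_summary` is hypothetical.

```python
def training_summary(num_gpus: int, per_device_batch: int, grad_accum: int,
                     lora_r: int, lora_alpha: int) -> dict:
    """Summarize effective batch size and LoRA scaling for a launch config.

    Hypothetical helper: assumes plain data-parallel training where each GPU
    processes `per_device_batch` samples per forward pass.
    """
    # Samples that contribute to one optimizer step across all devices.
    effective_batch = num_gpus * per_device_batch * grad_accum
    # Standard LoRA scaling factor applied to the low-rank update B @ A.
    lora_scaling = lora_alpha / lora_r
    return {"effective_batch": effective_batch, "lora_scaling": lora_scaling}


# Values taken from the training script above (one GPU).
summary = training_summary(num_gpus=1, per_device_batch=1, grad_accum=32,
                           lora_r=128, lora_alpha=256)
print(summary)  # {'effective_batch': 32, 'lora_scaling': 2.0}
```

So each optimizer step sees 32 samples, and the adapter update is scaled by 2.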

I then tried to merge the resulting adapters into the original model LLaVA-Meta-Llama-3-8B-Instruct-FT (using this script from LLaVA) and got the following error.

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Loading LLaVA from base model...
/user/anaconda3/envs/mm_iglu_it/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
Loading checkpoint shards:   0%|                                                                                                           | 0/4 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/user/mm-iglu-it/./scripts/merge_lora_weights.py", line 22, in <module>
    merge_lora(args)
  File "/user/mm-iglu-it/./scripts/merge_lora_weights.py", line 8, in merge_lora
    tokenizer, model, image_processor, context_len = load_pretrained_model(args.model_path, args.model_base, model_name, device_map='cpu')
  File "/user/mm-iglu-it/llava/model/builder.py", line 64, in load_pretrained_model
    model = LlavaLlamaForCausalLM.from_pretrained(model_base, low_cpu_mem_usage=True, config=lora_cfg_pretrained, **kwargs)
  File "/user/anaconda3/envs/mm_iglu_it/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3682, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/user/anaconda3/envs/mm_iglu_it/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4109, in _load_pretrained_model
    new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
  File "/user/anaconda3/envs/mm_iglu_it/lib/python3.10/site-packages/transformers/modeling_utils.py", line 887, in _load_state_dict_into_meta_model
    set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
  File "/user/anaconda3/envs/mm_iglu_it/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 348, in set_module_tensor_to_device
    raise ValueError(
ValueError: Trying to set a tensor of shape torch.Size([128257, 4096]) in "weight" (which has shape torch.Size([128256, 4096])), this look incorrect.
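The two shapes in this error differ only in the first dimension, i.e. the number of embedding rows (one per vocabulary entry): the tensor being loaded has exactly one more row than the parameter the model was built with. That is consistent with the first log line reporting that special tokens were added to the vocabulary, though this is an inference, not a confirmed cause. A minimal sketch that extracts both shapes from the message and reports the row difference (the helper name `parse_shape_mismatch` is hypothetical):

```python
import re

# Matches each torch.Size([...]) shape in accelerate's error message.
_SHAPE_RE = re.compile(r"torch\.Size\(\[([\d,\s]+)\]\)")


def parse_shape_mismatch(message: str) -> tuple:
    """Return (incoming_shape, expected_shape, row_difference) from the message."""
    shapes = [tuple(int(d) for d in m.split(",")) for m in _SHAPE_RE.findall(message)]
    if len(shapes) != 2:
        raise ValueError("expected exactly two torch.Size(...) shapes in the message")
    incoming, expected = shapes
    # Only the first dimension (vocabulary rows) differs in this error.
    return incoming, expected, incoming[0] - expected[0]


msg = ('Trying to set a tensor of shape torch.Size([128257, 4096]) in "weight" '
       '(which has shape torch.Size([128256, 4096])), this look incorrect.')
print(parse_shape_mismatch(msg))  # ((128257, 4096), (128256, 4096), 1)
```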

Finally, I also tried using the adapters without merging, via the following script, but I get the identical error. The file llava/eval/test_llava.py is very similar to the inference script from the original LLaVA repo; I made only small changes for convenience (such as --prompt-version, --input-file-path, etc.).

TESTING CODE

#!/bin/bash

##################################### MODEL #####################################
PROMPT_VERSION="llama3"
MODEL_NAME="llava-LLaVA-Meta-Llama-3-8B-Instruct-FT-lora"
MODEL_BASE="LLaVA-Meta-Llama-3-8B-Instruct-FT"
################################## CHOOSE CUDA ##################################
export CUDA_VISIBLE_DEVICES=0
echo "CUDA is" ${CUDA_VISIBLE_DEVICES}
###################################### END ######################################


#################################### TESTING ####################################
deepspeed ./llava/eval/test_llava.py \
    --model-path ./checkpoints/$MODEL_NAME \
    --model-base /user/hf_models/$MODEL_BASE \
    --model-name $MODEL_NAME \
    --prompt-version $PROMPT_VERSION \
    --input-file-path ./data/test.json \
    --image-path ./data/images 
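Both this script and the merge script go through `load_pretrained_model` with a LoRA checkpoint (`--model-path`) and a base model (`--model-base`). Per the traceback, `builder.py` builds the model from `model_base` with `config=lora_cfg_pretrained`, so one hypothesis worth checking (an assumption, not a confirmed diagnosis) is that the `vocab_size` in the LoRA checkpoint's `config.json` disagrees with the number of embedding rows stored in the base weights. A minimal sketch that reads the embedding shapes straight from the `.safetensors` headers (8-byte little-endian header length, then a JSON header) and the `vocab_size` from the LoRA `config.json`; it assumes the base model is stored as `.safetensors` shards, and the helper names are hypothetical:

```python
import glob
import json
import os
import struct


def safetensors_shapes(path: str) -> dict:
    """Read tensor shapes from a .safetensors header without loading weights."""
    with open(path, "rb") as f:
        # 8-byte little-endian header length, then a JSON header that maps
        # tensor names to {"dtype", "shape", "data_offsets"}.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    return {name: info["shape"] for name, info in header.items() if name != "__metadata__"}


def check_vocab(base_dir: str, lora_dir: str) -> dict:
    """Compare embedding rows in the base weights with vocab_size in the LoRA config."""
    rows = {}
    for shard in sorted(glob.glob(os.path.join(base_dir, "*.safetensors"))):
        for name, shape in safetensors_shapes(shard).items():
            # e.g. "model.embed_tokens.weight" and "lm_head.weight"
            if name.endswith("embed_tokens.weight") or name.endswith("lm_head.weight"):
                rows[name] = shape[0]
    with open(os.path.join(lora_dir, "config.json")) as f:
        lora_vocab = json.load(f)["vocab_size"]
    return {"base_rows": rows, "lora_config_vocab_size": lora_vocab}


# Example, with the paths used in the scripts above:
# check_vocab("/user/hf_models/LLaVA-Meta-Llama-3-8B-Instruct-FT",
#             "./checkpoints/llava-LLaVA-Meta-Llama-3-8B-Instruct-FT-lora")
```

If the base rows come out as 128257 while the LoRA config says 128256, that would match the error exactly and suggest the saved config rather than the adapter weights.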

Do you have any idea what I am doing wrong? I can't find anything online.
