KeyError: ModelReference Problem #653

@hengran

Hi,
Thanks for your comprehensive toolkit.
I was running the TIES merge method, but it fails with the following error:
Errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
WARNING:mergekit.architecture:No JSON architecture found for Qwen3Model
WARNING:root:Inferred 3 modules:
WARNING:root: 'layers' with 28 layers, 11 templates, and 0 loose weights
WARNING:root: 'model.layers' with 28 layers, 11 templates, and 0 loose weights
WARNING:root: 'default' with 0 layers, 0 templates, and 5 loose weights
torch_dtype is deprecated! Use dtype instead!
Executing graph: 25%|██████████████████████████████▏ | 936/3728 [00:00<00:00, 4223.25it/s]
Traceback (most recent call last):
  File "/usr/local/bin/mergekit-yaml", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1442, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1363, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1226, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 794, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/mergekit/options.py", line 166, in wrapper
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/mergekit/scripts/run_yaml.py", line 30, in main
    run_merge(
  File "/usr/local/lib/python3.10/dist-packages/mergekit/merge.py", line 85, in run_merge
    for _task, value in exec.run(quiet=options.quiet):
  File "/usr/local/lib/python3.10/dist-packages/mergekit/graph.py", line 518, in run
    for handle, value in self._run(quiet=quiet, desc=desc):
  File "/usr/local/lib/python3.10/dist-packages/mergekit/graph.py", line 484, in _run
    res = task.execute(**arguments)
  File "/usr/local/lib/python3.10/dist-packages/mergekit/merge_methods/generalized_task_arithmetic.py", line 125, in execute
    tvs, base = get_task_vectors(
  File "/usr/local/lib/python3.10/dist-packages/mergekit/merge_methods/generalized_task_arithmetic.py", line 197, in get_task_vectors
    base = tensors[base_model]
KeyError: ModelReference(model=ModelPath(path='/mnt/data/models/Qwen/Qwen3-0.6B', revision=None), lora=None, override_architecture=None)

The YAML config is:

models:
  - model: Qwen3-0.6B_retrieval
    parameters:
      weight: 1
  - model: Qwen3-0.6B_sts
    parameters:
      weight: 1
merge_method: ties
base_model: /mnt/data/models/Qwen/Qwen3-0.6B
parameters:
  normalize: true
  int8_mask: true
dtype: float32

I am using the latest version of MergeKit. The base model is stored locally at /mnt/data/models/Qwen/Qwen3-0.6B. Qwen3-0.6B_retrieval and Qwen3-0.6B_sts are full-parameter fine-tunes of Qwen3-0.6B, each trained on a different dataset.
The multislerp merge method works fine with these models, but TIES fails with the error shown above.
Another interesting finding: TIES merging works fine with Qwen3-4B, but fails with Qwen3-0.6B as described above.
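
One possible way to narrow this down, offered as a sketch rather than a confirmed diagnosis: the warnings in the log list both a 'layers' and a 'model.layers' module, which may mean the checkpoints do not all use the same tensor name prefix. The script below compares the tensor names in the base checkpoint with those in a fine-tuned checkpoint. It is not part of MergeKit; the helper names are hypothetical, and it assumes each model directory holds its weights as .safetensors files. It reads only the file headers (an 8-byte little-endian length followed by a JSON object), so it needs nothing beyond the standard library.

```python
import json
import struct
from pathlib import Path


def safetensors_tensor_names(path):
    """Return the set of tensor names stored in one .safetensors file.

    Only the header is read: 8 bytes giving the header length
    (little-endian unsigned 64-bit), then that many bytes of JSON.
    """
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    # "__metadata__" is an optional free-form entry, not a tensor.
    header.pop("__metadata__", None)
    return set(header)


def model_tensor_names(model_dir):
    """Collect tensor names across all .safetensors shards in a directory."""
    names = set()
    for shard in sorted(Path(model_dir).glob("*.safetensors")):
        names |= safetensors_tensor_names(shard)
    return names


def compare_to_base(base_dir, other_dir):
    """Return (names only in other_dir, names only in base_dir), both sorted."""
    base = model_tensor_names(base_dir)
    other = model_tensor_names(other_dir)
    return sorted(other - base), sorted(base - other)
```

For example, `compare_to_base("/mnt/data/models/Qwen/Qwen3-0.6B", "Qwen3-0.6B_retrieval")` returns two lists. If the first contains names like `layers.0...` while the second contains `model.layers.0...`, the fine-tuned checkpoint was probably saved with a different prefix than the base model, which could explain why the lookup for the base model's tensor fails.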
