Hi,
Thanks for your comprehensive toolkit.
I was running the TIES merge method, but ran into a problem:
Errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
WARNING:mergekit.architecture:No JSON architecture found for Qwen3Model
WARNING:root:Inferred 3 modules:
WARNING:root: 'layers' with 28 layers, 11 templates, and 0 loose weights
WARNING:root: 'model.layers' with 28 layers, 11 templates, and 0 loose weights
WARNING:root: 'default' with 0 layers, 0 templates, and 5 loose weights
torch_dtype is deprecated! Use dtype instead!
Executing graph: 25%|██████████████████████████████▏ | 936/3728 [00:00<00:00, 4223.25it/s]
Traceback (most recent call last):
  File "/usr/local/bin/mergekit-yaml", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1442, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1363, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1226, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 794, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/mergekit/options.py", line 166, in wrapper
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/mergekit/scripts/run_yaml.py", line 30, in main
    run_merge(
  File "/usr/local/lib/python3.10/dist-packages/mergekit/merge.py", line 85, in run_merge
    for _task, value in exec.run(quiet=options.quiet):
  File "/usr/local/lib/python3.10/dist-packages/mergekit/graph.py", line 518, in run
    for handle, value in self._run(quiet=quiet, desc=desc):
  File "/usr/local/lib/python3.10/dist-packages/mergekit/graph.py", line 484, in _run
    res = task.execute(**arguments)
  File "/usr/local/lib/python3.10/dist-packages/mergekit/merge_methods/generalized_task_arithmetic.py", line 125, in execute
    tvs, base = get_task_vectors(
  File "/usr/local/lib/python3.10/dist-packages/mergekit/merge_methods/generalized_task_arithmetic.py", line 197, in get_task_vectors
    base = tensors[base_model]
KeyError: ModelReference(model=ModelPath(path='/mnt/data/models/Qwen/Qwen3-0.6B', revision=None), lora=None, override_architecture=None)
The yml file is:
models:
  - model: Qwen3-0.6B_retrieval
    parameters:
      weight: 1
  - model: Qwen3-0.6B_sts
    parameters:
      weight: 1
merge_method: ties
base_model: /mnt/data/models/Qwen/Qwen3-0.6B
parameters:
  normalize: true
  int8_mask: true
dtype: float32
I am using the latest version of MergeKit. The base model is downloaded to /mnt/data/models/Qwen/Qwen3-0.6B. Qwen3-0.6B_retrieval and Qwen3-0.6B_sts are full-parameter fine-tunes of Qwen3-0.6B, each trained on a different dataset.
The multislerp merge method works fine, but the TIES merge method fails with the error above.
Another interesting finding: TIES merging works fine with Qwen3-4B, but fails as described above with Qwen3-0.6B.
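One thing that may be worth checking: the warnings in the log list both a 'layers' and a 'model.layers' module, which might mean the checkpoints do not all use the same tensor-name prefix. Below is a minimal sketch for comparing tensor names between the base model and the fine-tuned models. It assumes the checkpoints are stored as .safetensors files and reads only the file headers (an 8-byte little-endian length followed by a JSON header). The helper names are hypothetical and not part of MergeKit.

```python
import json
import struct
from pathlib import Path


def safetensors_names(path):
    """Read tensor names from a .safetensors header without loading weights."""
    with open(path, "rb") as f:
        # The file starts with an unsigned 64-bit little-endian header length,
        # followed by that many bytes of JSON describing each tensor.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    # "__metadata__" is an optional non-tensor entry in the header.
    return {name for name in header if name != "__metadata__"}


def model_tensor_names(model_dir):
    """Collect tensor names across all .safetensors shards in a directory."""
    names = set()
    for shard in sorted(Path(model_dir).glob("*.safetensors")):
        names |= safetensors_names(shard)
    return names


def compare_names(base_names, tuned_names):
    """Return the tensor names that appear on only one side."""
    return {
        "only_in_base": sorted(base_names - tuned_names),
        "only_in_tuned": sorted(tuned_names - base_names),
    }


if __name__ == "__main__":
    # Paths taken from the config above; adjust to the actual locations.
    base = model_tensor_names("/mnt/data/models/Qwen/Qwen3-0.6B")
    for tuned_dir in ("Qwen3-0.6B_retrieval", "Qwen3-0.6B_sts"):
        diff = compare_names(base, model_tensor_names(tuned_dir))
        print(tuned_dir)
        print("  only in base: ", diff["only_in_base"][:5])
        print("  only in tuned:", diff["only_in_tuned"][:5])
```

If the two lists differ only by a leading `model.` prefix, that would be consistent with the KeyError on the base model, though this is a guess rather than a confirmed cause.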