Description
Hello, I greatly appreciate your work. I am currently trying to use your quantization framework for Qwen3-8B, but the model responses I receive are completely incorrect. Could you please advise on how to debug this?
I used scripts/run_matgptq.sh to quantize Qwen3-8B, saved the resulting quantized weights, and passed them to inference_ib/scripts/run_inreference_transformers.sh in Kernel Quantized mode (the result is the same whether I use mode 1 or mode 2).
However, the final output is always '!':
Input: Please introduce Large Language Model!
generated: Please introduce Large Language Model!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
To rule out any issues in my own quantization process, I also downloaded the model you released directly (https://huggingface.co/ISTA-DASLab/Qwen3-8B-MatGPTQ), converted it to the data.pt format, and ran inference_ib/scripts/run_inreference_transformers.sh, but I still hit the same problem.
The conversion (safetensors to data.pt) code:

```python
import json
import os

import torch
from safetensors import safe_open

index = json.load(open(os.path.join(model_dir, "model.safetensors.index.json")))
for shard in sorted(set(index["weight_map"].values())):
    with safe_open(os.path.join(model_dir, shard), framework="pt", device="cpu") as f:
        keys = set(f.keys())
        for k in keys:
            if not k.endswith(".qweight"):
                continue
            layer = k[:-8]  # strip the ".qweight" suffix
            sk = f"{layer}.scales"
            if sk not in keys:
                continue
            layer_dir = os.path.join(out_dir, layer)
            os.makedirs(layer_dir, exist_ok=True)
            torch.save(
                {"qweight": f.get_tensor(k), "scale": f.get_tensor(sk)},
                os.path.join(layer_dir, "data.pt"),
            )
```
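To sanity-check the converted files, I also round-tripped one layer's data.pt and confirmed it contains the keys the inference script expects. This is only a minimal sketch with a dummy tensor pair standing in for a real quantized layer; the expected dtypes here are my assumption, not something taken from your code:

```python
import os
import tempfile

import torch

def check_data_pt(path):
    # Load one converted layer and confirm it holds exactly
    # the "qweight" and "scale" entries written by the converter.
    data = torch.load(path, map_location="cpu")
    assert set(data.keys()) == {"qweight", "scale"}, data.keys()
    return data["qweight"].shape, data["qweight"].dtype, data["scale"].dtype

# Demo: write a dummy layer in the same dict layout, then inspect it.
tmp = os.path.join(tempfile.mkdtemp(), "data.pt")
torch.save(
    {
        "qweight": torch.zeros(8, 4, dtype=torch.int32),   # packed weights (dummy)
        "scale": torch.ones(8, 1, dtype=torch.float16),    # per-group scales (dummy)
    },
    tmp,
)
shape, qdtype, sdtype = check_data_pt(tmp)
print(shape, qdtype, sdtype)
```

The real files load without errors this way, so I believe the dict layout itself is intact.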
May I ask where the problem might be? Thank you very much!