fix(torchao): update imports of quantizer #549
Conversation
@cursor review
Could you please review the changes? cc: @minettekaum
Hi @ParagEkbote,
Thanks for the updates! Everything looks good to me :D
I made a small change to the PR title so it would pass the check.
Cheers!
Is the PR ready, or is any additional maintainer review needed?
Hi! The PR is ready 😊 We're just doing a few updates on the Pruna repo right now, so merging is temporarily blocked. You should be able to merge it at the beginning of next week. I can tag you in a comment once merging is available again!
begumcig
left a comment
Hey Parag, thank you so much for handling this issue! Everything looks almost perfect to me. Could you please check this with the oldest torch + torchao versions we currently support in Pruna to see if it still works? If not, we might need to add a dynamic import check.
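For reference, a minimal sketch of the kind of version gate this comment describes. It assumes torchao version strings follow `major.minor.patch` and that the new config classes landed in 0.15.0 (per the PR description); the helper name is hypothetical, not part of Pruna or torchao.

```python
# Hypothetical helper for gating the torchao import switch.
# Assumption: torchao version strings look like "major.minor.patch",
# and 0.15.0 is the first release with the new config classes
# (taken from the PR description, not from torchao docs).

def supports_config_classes(torchao_version: str) -> bool:
    """Return True if this torchao version ships the new config classes."""
    major, minor = (int(part) for part in torchao_version.split(".")[:2])
    return (major, minor) >= (0, 15)

# The quantizer could then pick its import path accordingly:
# new-style config classes for >= 0.15, legacy config functions otherwise.
```

A check like this (or a `try`/`except ImportError` around the new imports) would let the quantizer keep working on the oldest supported torch + torchao combination.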
Hi @ParagEkbote, the release is done. Before you merge, could you please check @begumcig's comment 😊
I have tested the changes with the following script and it seems to work correctly:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

from pruna import SmashConfig, PrunaModel

MODEL_ID = "HuggingFaceTB/SmolLM2-135M-Instruct"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
base_model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    dtype=torch.bfloat16,
    device_map="auto",
)

# ---- CORRECT CONFIG ----
smash_config = SmashConfig.from_list(
    ["torchao"],  # REQUIRED: registers quantizer
    batch_size=1,
    device="cuda",
)

# Add TorchAO params
smash_config.add({
    "torchao_quant_type": "int8wo",
    "torchao_excluded_modules": "norm+embedding",
    "torchao_target_modules": {
        "include": ["model.layers.*"],
    },
})

# ---- Wrap model ----
model = PrunaModel(base_model, smash_config=smash_config)
model.set_to_eval()

# ---- Inference ----
prompt = "Explain quantization trade-offs briefly."
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
with torch.no_grad():
    outputs = model.model.generate(
        **inputs,
        max_new_tokens=75,
        temperature=0.2,
        top_p=0.9,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.eos_token_id,
    )
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Could you please review? cc: @begumcig
begumcig
left a comment
Hi @ParagEkbote, thank you so much for testing this! The script is super helpful for confirming the PR works in your environment, but I am still specifically worried about compatibility with the oldest torch + torchao versions we support in Pruna. Could you please share which versions you tested, and also verify against the minimum supported versions? If the new config-class imports fail there, we'll probably need a version check or dynamic import fallback. Thank youu!
Thanks for the clarification. I did test with different versions as shown in the compatibility table, and I did not hit a torchao/torch API mismatch. I tested it with the following versions:
Could you please review?
begumcig
left a comment
You dropped this 👑👑👑 @ParagEkbote Thank you so much for handling this! Let's merge it
Description
After the release of torchao 0.15.0, the config functions used in the torchao quantizer have been deprecated and removed, replaced by new config classes. We can also see this in the release notes. I have updated the imports to reflect the changes. Could you please review? cc: @minettekaum
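The import switch described above can be made backward compatible with a dynamic fallback, along these lines. This is a sketch only: `Int8WeightOnlyConfig` stands in for whichever config classes the quantizer actually needs, and the exact names the deprecated functions had may differ.

```python
# Sketch of a dynamic import fallback for old vs. new torchao APIs.
# Assumption: torchao >= 0.15 exposes config classes (e.g. something
# like Int8WeightOnlyConfig), while older releases only had the
# deprecated config functions. Names are illustrative.
try:
    # Newer torchao: new-style config classes
    from torchao.quantization import Int8WeightOnlyConfig  # noqa: F401
    HAS_CONFIG_CLASSES = True
except ImportError:
    # Older torchao (or torchao not installed): fall back to the
    # legacy config-function imports here instead.
    HAS_CONFIG_CLASSES = False
```

The quantizer can then branch on `HAS_CONFIG_CLASSES` at the call site, which keeps the oldest supported torch + torchao combination working without pinning.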
Related Issue
Fixes #(issue number)
Type of Change
How Has This Been Tested?
Checklist
Additional Notes