Hello, this is an impressive project! I'm exploring using T-MAC to run LLM inference on low-end devices like the Raspberry Pi.
I noticed that the T-MAC paper includes 3-bit evaluations, but this repository currently seems to support only 2-bit and 4-bit quantization.
Is there a way to evaluate the performance of 3-bit T-MAC? I'd love to know if that's possible. Thanks!