forked from ggml-org/llama.cpp
Feature Request: #74
Status: Open
Labels: enhancement (New feature or request)
Prerequisites
Feature Description
There is a TurboQuant implementation that has zero performance hit with turbo4: https://github.com/test1111111111111112/llama-cpp-turboquant-gemma4. Have you seen it? I hope the faster implementation gets merged into master for turbo4. I tested it myself on an RTX 4080 Laptop GPU: with that version I get 80 t/s, while with the version in this repository I get only 65 t/s (roughly 19% lower throughput).
Motivation
Getting turbo4 to run with zero performance hit.
Possible Implementation
No response