Hi @apd10,
After taking a pass over your code, I have a few questions, listed below. Could you please take a look?
- If the input is MxK and the weight is KxN, then whenever K is large we have to allocate a large buffer for the input, right? How do we save memory in that case?
  This question matters because the right optimizations depend on the values of M and N. If M is extremely small compared to N and K (e.g., 4 vs. 1024), the implementation will be very different from the one for a large M.
- What is `chunk_size` used for?
- In the cpp implementation, the operations on `hashed_weight` seem to be missing. Should I just optimize the ordinary matrix multiplication as it is implemented in the file? Otherwise, could you please add the `hashed_weight` and `hash_func` code?
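
To make the last question concrete, here is a rough sketch of what I would expect the hashed version to compute. This is only my guess, not your code: the body of `hash_func`, its signature, and the `hashed_matmul` helper are all placeholders I made up for illustration. The idea I am assuming is that the KxN weight is never materialized, and each entry is looked up from a much smaller 1-D `hashed_weight` array instead.

```python
def hash_func(k, n, num_buckets):
    # Hypothetical index hash (placeholder, not the real one): maps the
    # (row, col) position in the virtual KxN weight to a bucket index.
    return (k * 1000003 + n * 10007) % num_buckets


def hashed_matmul(x, hashed_weight, n_cols):
    """Compute x @ W, where W[k][n] = hashed_weight[hash_func(k, n, H)].

    x is an MxK list of lists, hashed_weight is a flat list of length H,
    and the full KxN matrix W is never stored.
    """
    num_buckets = len(hashed_weight)
    m_rows, k_dim = len(x), len(x[0])
    out = [[0.0] * n_cols for _ in range(m_rows)]
    for m in range(m_rows):
        for k in range(k_dim):
            x_mk = x[m][k]
            for n in range(n_cols):
                # Look the weight up on the fly instead of reading W[k][n].
                out[m][n] += x_mk * hashed_weight[hash_func(k, n, num_buckets)]
    return out
```

If this matches what you have in mind, the memory saving is on the weight side only (H values instead of K*N), while the MxK input still has to be stored in full, which is why I am asking about the large-K case.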