This repository was archived by the owner on Apr 14, 2026. It is now read-only.
Thank you very much for creating this amazing framework.
I see a potentially very useful feature for inference with GPU models. The triton and mlserver adapters use the CalcMemCapacity method to report model size.
This method computes model size from the model's size on disk. However, for models executed on a GPU, it would be better to report the increase in VRAM usage instead. Do you think this is doable? @tjohnson31415 @rafvasq @njhill @pvaneck
I am glad to help if you think it is feasible. I don't have experience with Go, but I can learn.
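To make the suggestion concrete, here is a minimal Go sketch of one possible approach: sample GPU memory usage before and after a model load and report the delta. This is an assumption about how it could be done, not the adapter's actual implementation; the `queryUsedVRAM` and `vramDelta` helpers are hypothetical names, and the sketch assumes an NVIDIA GPU with `nvidia-smi` on the PATH (a production version would more likely use NVML bindings directly).

```go
package main

import (
	"fmt"
	"os/exec"
	"strconv"
	"strings"
)

// queryUsedVRAM returns the current memory usage of GPU 0 in MiB by
// shelling out to nvidia-smi. Hypothetical helper; assumes nvidia-smi
// is installed and an NVIDIA GPU is present.
func queryUsedVRAM() (uint64, error) {
	out, err := exec.Command("nvidia-smi",
		"--query-gpu=memory.used",
		"--format=csv,noheader,nounits",
		"--id=0").Output()
	if err != nil {
		return 0, err
	}
	return strconv.ParseUint(strings.TrimSpace(string(out)), 10, 64)
}

// vramDelta returns the increase in VRAM (MiB) between two readings,
// clamped to zero in case usage dropped between samples.
func vramDelta(before, after uint64) uint64 {
	if after < before {
		return 0
	}
	return after - before
}

func main() {
	before, err := queryUsedVRAM()
	if err != nil {
		// No GPU / nvidia-smi available; nothing to measure.
		fmt.Println("nvidia-smi not available:", err)
	} else {
		// In a real adapter, the model load call would go here,
		// between the two samples.
		after, _ := queryUsedVRAM()
		fmt.Printf("VRAM increase: %d MiB\n", vramDelta(before, after))
	}

	// Hardware-independent demonstration of the delta calculation.
	fmt.Println(vramDelta(1000, 1536))
}
```

One caveat with this approach: other processes can allocate or free GPU memory between the two samples, so the delta is an approximation; per-process accounting (e.g. via NVML) would be more robust.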