This repository was archived by the owner on Apr 14, 2026. It is now read-only.
Thank you very much for creating this amazing framework.
I see a potentially very useful feature for inference with GPU models. The triton and mlserver adapters use the CalcMemCapacity method to report model size.
This method computes model size from the model's size on disk. However, for models executed on a GPU, it would be better to report the increase in VRAM usage instead. Do you think this is doable? @tjohnson31415 @rafvasq @njhill @pvaneck
I am glad to help if you think it is feasible. I don't have experience with Go, but I can learn.
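To make the suggestion concrete, here is a minimal Go sketch of one possible approach: sample GPU memory usage before and after a model load and report the delta. This is an assumption about how it could be done, not the adapter's actual implementation; the `queryUsedVRAM` and `vramDelta` helpers are hypothetical names, and the sketch assumes an NVIDIA GPU with `nvidia-smi` on the PATH (a production version would more likely use NVML bindings directly).

```go
package main

import (
	"fmt"
	"os/exec"
	"strconv"
	"strings"
)

// queryUsedVRAM returns the current memory usage of GPU 0 in MiB by
// shelling out to nvidia-smi. Hypothetical helper; assumes nvidia-smi
// is installed and an NVIDIA GPU is present.
func queryUsedVRAM() (uint64, error) {
	out, err := exec.Command("nvidia-smi",
		"--query-gpu=memory.used",
		"--format=csv,noheader,nounits",
		"--id=0").Output()
	if err != nil {
		return 0, err
	}
	return strconv.ParseUint(strings.TrimSpace(string(out)), 10, 64)
}

// vramDelta returns the increase in VRAM (MiB) between two readings,
// clamped to zero in case usage dropped between samples.
func vramDelta(before, after uint64) uint64 {
	if after < before {
		return 0
	}
	return after - before
}

func main() {
	before, err := queryUsedVRAM()
	if err != nil {
		// No GPU / nvidia-smi available; nothing to measure.
		fmt.Println("nvidia-smi not available:", err)
	} else {
		// In a real adapter, the model load call would go here,
		// between the two samples.
		after, _ := queryUsedVRAM()
		fmt.Printf("VRAM increase: %d MiB\n", vramDelta(before, after))
	}

	// Hardware-independent demonstration of the delta calculation.
	fmt.Println(vramDelta(1000, 1536))
}
```

One caveat with this approach: other processes can allocate or free GPU memory between the two samples, so the delta is an approximation; per-process accounting (e.g. via NVML) would be more robust.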