Awesome work! I had a question about the inference speed of the proposed model. As I see in the paper, inference speed is tested on MT-Bench with a single 40GB A100.
Do you have an estimate of the inference speed on other machines? I am also curious how the inference time scales with the length of the generated sequence.