Awesome work! I had a question about the inference speed of the proposed model. As I see in the paper, inference speed is tested on MT-Bench with a single 40GB A100.
Do you have an estimate of the inference speed on other machines? I am also curious how the inference time scales with the length of the generated sequence.