Implementation of "Fast Inference from Transformers via Speculative Decoding". Feel free to copy & modify this code for your needs.
GPT-like and T5 models are supported.
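
For readers new to the technique, the accept/reject rule at the core of speculative sampling, as described in the paper, can be sketched as follows. This is a minimal illustration, not this repository's code: the function name `speculative_accept` and the plain-list distributions `p` (target model) and `q` (draft model) are assumptions made for the example.

```python
import random


def speculative_accept(p, q, draft_token, rng=random):
    """One accept/reject step of speculative sampling (illustrative sketch).

    p, q: probability lists over the same vocabulary (target and draft model).
    draft_token: index sampled from q by the draft model, so q[draft_token] > 0.
    Returns (token, accepted).
    """
    # Accept the drafted token with probability min(1, p[x] / q[x]).
    if rng.random() < min(1.0, p[draft_token] / q[draft_token]):
        return draft_token, True
    # On rejection, resample from the residual distribution norm(max(0, p - q)).
    # Rejection implies p[x] < q[x], so the residual has positive total mass.
    residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    total = sum(residual)
    weights = [r / total for r in residual]
    token = rng.choices(range(len(p)), weights=weights, k=1)[0]
    return token, False
```

Under this rule, a token drafted from `q` and then passed through `speculative_accept` is distributed according to `p`, which is why the draft model speeds up decoding without changing the target model's output distribution.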