Implementation of "Fast Inference from Transformers via Speculative Decoding". Feel free to copy & modify this code for your needs.
GPT-like and T5 models are supported.
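
For readers new to the technique, here is a minimal sketch of the accept/reject rule described in the paper, written over toy probability lists rather than real models. It is not this repository's code: the names `speculative_step`, `residual_distribution`, and `sample` are hypothetical, and it assumes the draft distributions `q_dists` and target distributions `p_dists` have already been computed for each position.

```python
import random

def residual_distribution(p, q):
    """Normalize max(0, p - q): the distribution to resample from after a rejection."""
    diff = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    total = sum(diff)
    # total is 0 only when p == q, where a rejection cannot occur; fall back to p.
    return [d / total for d in diff] if total > 0 else list(p)

def sample(dist, rng):
    """Draw one token index from a probability list."""
    return rng.choices(range(len(dist)), weights=dist, k=1)[0]

def speculative_step(draft_tokens, q_dists, p_dists, rng):
    """One verification step (hypothetical helper, toy inputs).

    draft_tokens: gamma tokens sampled from the draft model
    q_dists:      gamma draft distributions, one per drafted position
    p_dists:      gamma + 1 target distributions (one extra for the bonus token)
    Returns between 1 and gamma + 1 tokens.
    """
    out = []
    for i, tok in enumerate(draft_tokens):
        p, q = p_dists[i], q_dists[i]
        # Accept the drafted token with probability min(1, p(tok) / q(tok)).
        if rng.random() < min(1.0, p[tok] / q[tok]):
            out.append(tok)
        else:
            # First rejection: resample from the residual and stop.
            out.append(sample(residual_distribution(p, q), rng))
            return out
    # Every draft was accepted: sample one extra token from the target model.
    out.append(sample(p_dists[len(draft_tokens)], rng))
    return out
```

In a real setup the target distributions for all drafted positions come from a single forward pass of the large model, which is where the speedup comes from. The sketch only shows the sampling logic that keeps the output distributed as the target model's.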