Skip to content

Latest commit

 

History

History
43 lines (29 loc) · 554 Bytes

File metadata and controls

43 lines (29 loc) · 554 Bytes

LLM Decoding

Installation

Install uv:

curl -LsSf https://astral.sh/uv/install.sh | sh

Install dependencies:

uv sync

Add your HuggingFace token to .env:

HF_TOKEN=hf_...

Decoding

Minimal Greedy Decoding

uv run --env-file .env greedy_decode.py

Results on 1x 4090:

  • Time taken: 0.75s
  • Output throughput: 33.44 tok/s

Prompt Lookup Speculative Decoding

uv run --env-file .env speculative_decode.py

Results on 1x 4090:

  • Time taken: 0.62s
  • Output throughput: 40.16 tok/s