When the temperature is set to 0, baseline decoding reduces to step-by-step greedy argmax, so the output should be fully deterministic. However, in speculative decoding with tree decoding enabled and top-k greater than 1, the draft stage preemptively expands multiple candidate branches, and the verifying large model may accept any candidate that is admissible under its distribution rather than strictly the argmax token, deviating from the baseline argmax path. As a result, speculative decoding can produce semantically reasonable outputs that nonetheless differ from the baseline greedy results, with discrepancies most likely when competing candidate branches have nearly equal probability.
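The divergence mechanism can be sketched in a few lines. This is a hypothetical toy model (the function names and the top-k acceptance rule are illustrative assumptions, not any library's actual verification logic): the baseline always returns the argmax token, while a loosened verifier that accepts any draft candidate appearing in the target model's top-k can return a different token when two branches are nearly tied.

```python
import numpy as np

def greedy_pick(target_probs):
    """Baseline decoding at temperature 0: deterministic argmax."""
    return int(np.argmax(target_probs))

def tree_verify_pick(target_probs, draft_candidates, k=2):
    """Illustrative loosened acceptance: take the first draft candidate
    that falls inside the target model's top-k, falling back to argmax.

    Accepting any top-k token (rather than only the argmax token) is
    what allows divergence from the greedy path even at temperature 0.
    """
    top_k = set(np.argsort(target_probs)[::-1][:k].tolist())
    for tok in draft_candidates:
        if tok in top_k:
            return int(tok)
    return greedy_pick(target_probs)

# Two near-tied tokens: argmax is token 0, but the draft tree
# proposes token 1 first, and the verifier accepts it.
probs = np.array([0.41, 0.40, 0.19])
baseline = greedy_pick(probs)                     # token 0
speculative = tree_verify_pick(probs, [1, 2])     # token 1: accepted, diverges
```

With a strict acceptance rule (k=1, i.e. only the argmax token is accepted), the two paths coincide again; the discrepancy appears only when the verifier is willing to accept non-argmax candidates.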