When the temperature is set to 0, baseline decoding reduces to step-by-step greedy argmax, so the output should be fully deterministic. However, in speculative decoding with tree decoding enabled and top-k greater than 1, the draft stage preemptively expands multiple candidate branches, and the verifying large model may accept any candidate that is admissible under its distribution rather than strictly the argmax token, deviating from the baseline argmax path. As a result, speculative decoding can produce semantically reasonable outputs that nonetheless differ from the baseline greedy results, with discrepancies most likely when competing candidate branches have nearly equal probability.
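The divergence mechanism can be sketched in a few lines. This is a hypothetical toy model (the function names and the top-k acceptance rule are illustrative assumptions, not any library's actual verification logic): the baseline always returns the argmax token, while a loosened verifier that accepts any draft candidate appearing in the target model's top-k can return a different token when two branches are nearly tied.

```python
import numpy as np

def greedy_pick(target_probs):
    """Baseline decoding at temperature 0: deterministic argmax."""
    return int(np.argmax(target_probs))

def tree_verify_pick(target_probs, draft_candidates, k=2):
    """Illustrative loosened acceptance: take the first draft candidate
    that falls inside the target model's top-k, falling back to argmax.

    Accepting any top-k token (rather than only the argmax token) is
    what allows divergence from the greedy path even at temperature 0.
    """
    top_k = set(np.argsort(target_probs)[::-1][:k].tolist())
    for tok in draft_candidates:
        if tok in top_k:
            return int(tok)
    return greedy_pick(target_probs)

# Two near-tied tokens: argmax is token 0, but the draft tree
# proposes token 1 first, and the verifier accepts it.
probs = np.array([0.41, 0.40, 0.19])
baseline = greedy_pick(probs)                     # token 0
speculative = tree_verify_pick(probs, [1, 2])     # token 1: accepted, diverges
```

With a strict acceptance rule (k=1, i.e. only the argmax token is accepted), the two paths coincide again; the discrepancy appears only when the verifier is willing to accept non-argmax candidates.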