Thank you for your DPO implementation. It has been very helpful. However, I noticed an issue in the get_log_prob() function.
When calculating response_log_probs, the log probabilities of padding tokens appear to be included in the sum. Padding tokens should not contribute to the response log probability, so I compared against the HuggingFace DPOTrainer implementation; there, padding positions are masked out before the per-token log probabilities are summed.
This suggests that including padding tokens in your implementation is a bug. The calculation should be modified to exclude the log probabilities at padding positions.
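For reference, here is a minimal sketch of the masked version. The function name and signature are illustrative, not taken from your code; I'm assuming the inputs are the model logits, the target token ids, and a loss mask that is 1 for real response tokens and 0 for padding:

```python
import torch
import torch.nn.functional as F

def get_log_prob(logits, labels, loss_mask):
    """Sum per-token log probabilities over the response,
    excluding positions where loss_mask is 0 (padding).

    logits:    (batch, seq_len, vocab_size) model outputs
    labels:    (batch, seq_len) target token ids
    loss_mask: (batch, seq_len) 1.0 for response tokens, 0.0 for padding
    """
    log_probs = F.log_softmax(logits, dim=-1)
    # Gather the log-prob assigned to each target token.
    per_token = torch.gather(log_probs, 2, labels.unsqueeze(-1)).squeeze(-1)
    # Zero out padding positions before summing, mirroring the
    # masking done in HuggingFace's DPOTrainer.
    return (per_token * loss_mask).sum(dim=-1)
```

With this masking, the token ids sitting in padded positions have no effect on the result, which is the behavior your current version is missing.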
Thanks.