Thank you for your DPO implementation. It has been very helpful. However, I noticed an issue in the get_log_prob() function.
When calculating response_log_probs, the log probabilities of padding tokens appear to be included in the sum. Padding tokens should not contribute to the response log probability, so I compared against the HuggingFace DPOTrainer implementation; there, padding positions are masked out before the per-token log probabilities are summed.
This suggests that including padding tokens in your implementation is a bug. The calculation should be modified to exclude the log probabilities at padding positions.
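For reference, here is a minimal sketch of the masked version. The function name and signature are illustrative, not taken from your code; I'm assuming the inputs are the model logits, the target token ids, and a loss mask that is 1 for real response tokens and 0 for padding:

```python
import torch
import torch.nn.functional as F

def get_log_prob(logits, labels, loss_mask):
    """Sum per-token log probabilities over the response,
    excluding positions where loss_mask is 0 (padding).

    logits:    (batch, seq_len, vocab_size) model outputs
    labels:    (batch, seq_len) target token ids
    loss_mask: (batch, seq_len) 1.0 for response tokens, 0.0 for padding
    """
    log_probs = F.log_softmax(logits, dim=-1)
    # Gather the log-prob assigned to each target token.
    per_token = torch.gather(log_probs, 2, labels.unsqueeze(-1)).squeeze(-1)
    # Zero out padding positions before summing, mirroring the
    # masking done in HuggingFace's DPOTrainer.
    return (per_token * loss_mask).sum(dim=-1)
```

With this masking, the token ids sitting in padded positions have no effect on the result, which is the behavior your current version is missing.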
Thanks.