Padding tokens should be excluded from log probability calculation #4

@likejazz

Thank you for your DPO implementation. It has been very helpful. However, I noticed an issue in the get_log_prob() function.

When response_log_probs is calculated, the log probabilities of padding tokens appear to be included in the result. Padding tokens are not part of the response, so they should not contribute to it. For comparison, I checked the HuggingFace DPOTrainer implementation, which excludes padding-token log probabilities from the calculation.

This suggests that including padding tokens in your implementation is a bug. The calculation should be modified to exclude the log probabilities of padding tokens.
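
For illustration, here is a minimal sketch of one way to exclude padding positions, assuming PyTorch. The function name masked_sequence_log_prob and the pad_token_id parameter are hypothetical and are not taken from this repository or from DPOTrainer. It also assumes that labels are already aligned with logits (any next-token shift has been applied beforehand) and that padding is marked by a token id that is a valid vocabulary index.

```python
import torch
import torch.nn.functional as F


def masked_sequence_log_prob(logits, labels, pad_token_id):
    """Sum per-token log probabilities over non-padding positions only.

    logits: float tensor of shape (batch, seq_len, vocab_size)
    labels: long tensor of shape (batch, seq_len), already aligned with logits
    pad_token_id: id used for padding in labels (hypothetical parameter)
    """
    # Log probability of every vocabulary entry at every position.
    log_probs = F.log_softmax(logits, dim=-1)

    # Select the log probability assigned to each target token.
    token_log_probs = torch.gather(
        log_probs, dim=-1, index=labels.unsqueeze(-1)
    ).squeeze(-1)

    # 1.0 at real tokens, 0.0 at padding positions.
    mask = (labels != pad_token_id).to(token_log_probs.dtype)

    # Padding positions are multiplied by zero, so they add nothing to the sum.
    return (token_log_probs * mask).sum(dim=-1)
```

If the tokenizer reuses the end-of-sequence id as the padding id, a mask built this way would also drop the real end-of-sequence token. In that case a mask derived from the attention mask or from sequence lengths may be a better fit.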

Thanks.
