Implement RLHF DPO (Direct Preference Optimization) training #1403

Open
BitcrushedHeart wants to merge 16 commits into Nerogar:master from BitcrushedHeart:RLHF

Commits

Commits on Mar 29, 2026

Commits on Apr 1, 2026

Commits on Apr 2, 2026

Commits on Apr 3, 2026

Commits on Apr 4, 2026

Commits on Apr 5, 2026

Commits on Apr 6, 2026

Commits on Apr 9, 2026

Commits on Apr 12, 2026

Commits on Apr 14, 2026

Commits on Apr 19, 2026