Implement RLHF DPO (Direct Preference Optimization) training #1403
Open
BitcrushedHeart wants to merge 16 commits into Nerogar:master from
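For context on what DPO training involves: DPO fine-tunes a policy model on preference pairs (a chosen and a rejected response) by maximizing the log-sigmoid of the implicit reward margin relative to a frozen reference model, with no separate reward model or RL loop. The sketch below is a minimal, dependency-free illustration of the standard DPO loss from the Rafailov et al. paper; it is not taken from this PR, and the function name and per-sequence log-probability inputs are illustrative assumptions.

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair (illustrative sketch).

    Inputs are summed log-probabilities of the chosen/rejected sequences
    under the policy being trained and under the frozen reference model.
    """
    # Implicit rewards: beta-scaled log-ratios against the reference model.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log sigmoid(margin): small when chosen is preferred by a wide margin.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When policy and reference agree exactly the margin is zero and the loss is log 2; pushing probability toward the chosen response relative to the reference drives the loss toward zero.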