feat: add enable_fp32_lm_head configuration for model precision #849

LLLLKKKK merged 1 commit into alibaba:main from
Conversation
🤖 Code Review (v1) — LGTM

Verdict: LGTM ✅

The change

```cpp
// Before:
auto logits = torch::mm(last_hidden.to(torch::kFloat32), lm_head->kernel.to(torch::kFloat32).t());
// After:
auto logits = torch::mm(last_hidden, lm_head->kernel.t()).to(torch::kFloat32);
```

is a correct performance optimization. The original implementation cast both tensors to FP32 before the GEMM, which wastes memory and cannot use tensor cores. The new version runs the GEMM in the original precision and casts only the result to FP32, which is the standard approach.

*Automated review by CI Bot*
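The casting-order change can be illustrated outside the project. Below is a minimal numpy sketch (the shapes, variable names, and tolerances are illustrative assumptions, not the PR's code):

```python
import numpy as np

np.random.seed(0)

# Hypothetical shapes; any hidden size / vocab size behaves the same way.
last_hidden = np.random.randn(4, 8).astype(np.float16)  # activations in model precision
kernel = np.random.randn(16, 8).astype(np.float16)      # lm_head weight (vocab x hidden)

# Before: cast both operands to FP32 first -- this allocates FP32 copies of
# both inputs and runs the GEMM outside the low-precision (tensor-core) path.
logits_before = np.matmul(last_hidden.astype(np.float32), kernel.astype(np.float32).T)

# After: GEMM in the original precision, then cast only the (smaller) result.
logits_after = np.matmul(last_hidden, kernel.T).astype(np.float32)

assert logits_before.shape == logits_after.shape == (4, 16)
assert logits_after.dtype == np.float32
# The two orderings agree up to FP16 rounding of the low-precision accumulation.
assert np.allclose(logits_before, logits_after, rtol=5e-2, atol=5e-2)
```

The trade-off the review describes is visible here: the "after" path never materializes FP32 copies of the inputs, at the cost of accumulating in the lower precision.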
Force-pushed 8d7174a to c0e7f5b

Force-pushed b6ee732 to 114a195
🤖 AI Code Review (incremental) — PR #849

**Changes since last review**

Single new commit covering two themes:

**Findings**

No blocking issues. Clean decomposition, correct config plumbing, tests updated properly.

LGTM overall.
🤖 AI Code Review (incremental) — PR #849

**Changes since last review**

New commit

**Findings**

P1: Bug — `hidden_states.to(self.lm_head_weight.dtype())`

P2: Default
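The P1 text is truncated in this capture, but the expression it flags is the common dtype-alignment pattern before the lm_head projection. A hypothetical numpy sketch of that pattern (the names mirror the flagged snippet; none of this is the project's code):

```python
import numpy as np

# Hypothetical setup: activations in FP16, lm_head weight kept in FP32.
hidden_states = np.ones((2, 4), dtype=np.float16)
lm_head_weight = np.ones((10, 4), dtype=np.float32)  # (vocab x hidden)

# Align the activation dtype with the weight's dtype so the GEMM runs in a
# single precision (the torch equivalent would use the `.dtype` attribute).
logits = np.matmul(hidden_states.astype(lm_head_weight.dtype), lm_head_weight.T)

assert logits.dtype == np.float32
assert logits.shape == (2, 10)
```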
🤖 AI Code Review (incremental) — PR #849

**Previous P1 status**

**Changes since last review**

One new commit

**Findings**

P2: Redundant

P2: No test coverage for new config

Overall the previous P1 is resolved and the new code is functionally correct. The default
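The feature the PR adds — a configuration flag gating the FP32 cast of lm_head logits — can be sketched as follows. This is a hypothetical numpy illustration: the flag name matches the PR title, but the class, function, and default value are assumptions, not the project's implementation:

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class ModelConfig:
    # Hypothetical flag mirroring the PR's enable_fp32_lm_head option: when
    # True, logits from the lm_head projection are returned in FP32.
    enable_fp32_lm_head: bool = False


def lm_head_forward(hidden: np.ndarray, weight: np.ndarray, cfg: ModelConfig) -> np.ndarray:
    logits = np.matmul(hidden, weight.T)      # GEMM in the model's precision
    if cfg.enable_fp32_lm_head:
        logits = logits.astype(np.float32)    # cast only the output
    return logits


h = np.ones((2, 4), dtype=np.float16)
w = np.ones((8, 4), dtype=np.float16)
assert lm_head_forward(h, w, ModelConfig()).dtype == np.float16
assert lm_head_forward(h, w, ModelConfig(enable_fp32_lm_head=True)).dtype == np.float32
```

Gating the cast behind a config flag keeps the default behavior unchanged while letting deployments opt into FP32 logits where the extra precision matters (e.g. sampling stability).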
No description provided.