fix: CpuGemmDirectConv2d: accumulate in fp32 unless in fast_mode #1287

Open

Sqvid wants to merge 1 commit into ARM-software:main from Sqvid:fix-f16-fast-conv

Conversation

Contributor

@Sqvid Sqvid commented May 5, 2026

There is a bug in the PyTorch + oneDNN + ACL stack where numerical errors were observed in certain f16 convolutions.
PyTorch issue: pytorch/pytorch#177245
oneDNN issue: uxlfoundation/oneDNN#5106

The root cause is that we accumulate in f16 rather than f32 even when fast_mode is false. This happens because CpuGemmDirectConv2d::configure() calls CpuGemmAssemblyDispatch::configure() here:

```cpp
_gemm_asm_func->configure(src, &_perm_weights, biases, dst, asm_info);
```

which in turn does the following:

```cpp
// If fast_mode is disabled, we must enable it when fp32 accumulation is not set for fp16.
bool is_fp16 =
    a->data_type() == DataType::F16 && b->data_type() == DataType::F16 && d->data_type() == DataType::F16;
bool fast_mode = info.fast_mode || (is_fp16 && !info.use_fp32_acc);
```

My solution is therefore to change CpuGemmDirectConv2d::init_assembly_metadata() so that use_fp32_acc is true unless enable_fast_math is set. This resolves the bug at the oneDNN level. Hopefully @robert-hardwick can confirm whether this fixes the PyTorch bug as well.

Change-Id: Id00f8e17b3349893164eb0b7edd616345488515e
Signed-off-by: Siddhartha Menon <siddhartha.menon@arm.com>
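
A minimal sketch of what the proposed change to init_assembly_metadata() could look like, assuming the Conv2dInfo flag is enable_fast_math and that AsmGemmInfo exposes the fast_mode and use_fp32_acc fields referenced in the dispatch snippet above. The exact signature and the elided setup are assumptions; this is not the actual patch:

```cpp
// Sketch: make CpuGemmDirectConv2d request fp32 accumulation whenever fast math
// has not been requested, so the f16 fast-mode fallback in
// CpuGemmAssemblyDispatch::configure() is no longer triggered.
AsmGemmInfo CpuGemmDirectConv2d::init_assembly_metadata(const Conv2dInfo &info, bool is_indirect)
{
    AsmGemmInfo asm_info;
    // ... existing metadata setup (method, padding, activation info, ...) ...

    asm_info.fast_mode = info.enable_fast_math;
    // Accumulate f16 GEMMs in fp32 unless the caller explicitly opted into fast math.
    asm_info.use_fp32_acc = !info.enable_fast_math;

    return asm_info;
}
```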

@Dongsung-arm Dongsung-arm left a comment


Looks good to me.
This matches the intended behavior: FP32 accumulation by default, and only disabled when enable_fast_math is set.
