[Feature Request] Atomic ops on shared memory, like red.shared.add #1998

@tianr22

Required prerequisites

  • I have searched the Issue Tracker and confirmed that this hasn't already been reported. (Comment there if it has.)

Motivation

I have been writing a backward kernel that performs an atomic reduction across the hc_mult (4) dimension. If I use atomic.global.add, the whole tensor has to be read into L2, which is unacceptable.

It would be better to perform the reduction at the shared-memory level. I have searched the docs and atomic.h and haven't found this implemented yet. Could you please add this feature in the near future?
Thanks!
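
For reference, here is a minimal plain-CUDA sketch of the pattern I mean. It is not this project's API, only an illustration under assumptions: the input is a row-major [n, 4] float tensor reduced over its last dimension, and the names `reduce_hc_shared`, `run_check`, `HC_MULT`, and `TILE` are hypothetical. Partial sums are accumulated with `atomicAdd` on a `__shared__` buffer, which the compiler can typically lower to a shared-memory atomic/reduction instruction, so global memory only sees one plain store per output element.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

constexpr int HC_MULT = 4;  // reduced dimension (size 4, as in this issue)
constexpr int TILE = 256;   // hypothetical: number of output rows owned by one block

// in:  [n, HC_MULT] row-major, out: [n] = sum over the HC_MULT axis.
__global__ void reduce_hc_shared(const float* in, float* out, int n) {
    __shared__ float acc[TILE];
    const int base = blockIdx.x * TILE;

    // Zero the shared accumulators.
    for (int i = threadIdx.x; i < TILE; i += blockDim.x) acc[i] = 0.0f;
    __syncthreads();

    // Each thread adds its elements into the shared accumulator of their row.
    // The atomicAdd targets shared memory, so no global atomics are issued.
    for (int i = threadIdx.x; i < TILE * HC_MULT; i += blockDim.x) {
        const int row = i / HC_MULT;
        const int g = base + row;
        if (g < n) atomicAdd(&acc[row], in[(size_t)g * HC_MULT + i % HC_MULT]);
    }
    __syncthreads();

    // Each block owns its rows, so a plain store is enough here.
    for (int i = threadIdx.x; i < TILE; i += blockDim.x) {
        const int g = base + i;
        if (g < n) out[g] = acc[i];
    }
}

// Runs the kernel on small integer-valued data and returns the number of mismatches.
int run_check(int n) {
    std::vector<float> h_in((size_t)n * HC_MULT), h_out(n);
    for (size_t i = 0; i < h_in.size(); ++i) h_in[i] = (float)(i % 7);

    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, h_in.size() * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemcpy(d_in, h_in.data(), h_in.size() * sizeof(float), cudaMemcpyHostToDevice);

    const int blocks = (n + TILE - 1) / TILE;
    reduce_hc_shared<<<blocks, 256>>>(d_in, d_out, n);
    cudaMemcpy(h_out.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);

    int bad = 0;
    for (int r = 0; r < n; ++r) {
        float ref = 0.0f;
        for (int h = 0; h < HC_MULT; ++h) ref += h_in[(size_t)r * HC_MULT + h];
        if (h_out[r] != ref) ++bad;  // exact compare is safe: small integer values
    }
    return bad;
}

int main() {
    const int bad = run_check(1000);
    printf("mismatches: %d\n", bad);
    return bad != 0;
}
```

In a real backward kernel the final store might need to be a global atomic if several blocks contribute to the same output element. The point of the sketch is only that the hc_mult-level accumulation stays in shared memory.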

Solution

No response

Alternatives

No response

Additional context

No response

Metadata

Labels

enhancement: New feature or request
