Add ComputeEncoder abstraction with parallel/serial barrier modes #1100
manon-traverse merged 7 commits into llvm:main
Conversation
manon-traverse left a comment
GitHub is not loading all the files for me right now :(
But I am able to provide some feedback on the files that did manage to load.
```cpp
virtual llvm::Error dispatch(uint32_t GroupCountX, uint32_t GroupCountY,
                             uint32_t GroupCountZ, uint32_t ThreadsPerGroupX,
                             uint32_t ThreadsPerGroupY,
                             uint32_t ThreadsPerGroupZ) = 0;
```
I would like to be able to specify the pipeline and descriptor sets as well. That way, we can enforce that everything that needs to be set up to dispatch has been set up.
Also, could we perhaps grab the threads per group from the pipeline and let the user only specify the group count OR thread count to launch?
I think reading back the threads-per-group from reflection is the most convenient. While Metal has ways to specify the total dispatch as a thread count, with automatic bounds-checking in the shader, that is not applicable to DX and Vulkan and would make this API a lot less convenient to use.
Specifically, because the thread count in Metal is fixed in place when HLSL with numthreads() is transpiled to a Metallib, that value should be persisted in the pipeline and passed through.
Force-pushed 2d56837 to d60d30d
Force-pushed 198b83c to 3a20f35
```cpp
/// Threadgroup size from shader reflection (the numthreads() attribute
/// persisted in the transpiled Metallib). Must be set via
/// setThreadGroupSize() before dispatching.
MTL::Size ThreadsPerGroup = {1, 1, 1};
```
I would love it if this were just taken from the pipeline passed to dispatch, but for that we need another PR to land. I'm fine merging this as-is and fixing it later.
Yeah, for now this is just a default initializer, and executeProgram() calls into setThreadsPerGroup() directly. Landing the Pipeline PR would allow a new bindPipeline() (or receiving the pipeline as the first argument to dispatch()) to update this variable, or to inline it entirely.
I'm rewriting this PR a bit to tear out parallel mode again; its barrier placement is less than ideal for something that isn't even used yet, and it significantly takes away from the intent of this PR: generalizing the entrypoints for adding commands to command buffers, on top of which I intend to add acceleration structure building.
EDIT: Pushed the original rebase as a restore point before pushing the rewrite.
Force-pushed e8f05a0 to e827692
```cpp
// Each command buffer defaults to src=HOST, which is only correct for
// standalone submissions. Multi-CB batches would need inter-CB barriers.
if (CBs.size() > 1)
  llvm::errs()
      << "Warning: submitting multiple command buffers in a single batch; "
         "encoder barriers do not account for inter-command-buffer "
         "dependencies.\n";
```
This also relies on the caller waiting on the returned SubmitResult before submitting the next "batch".
Introduces a generic command encoder abstraction for recording GPU commands to a command buffer. ComputeEncoder provides dispatch() with automatic barrier insertion between commands. Each backend implements barrier tracking natively:
- Vulkan: Separate pending src/dst VkPipelineStageFlags/VkAccessFlags on the command buffer. addPendingBarrier() accumulates dst; flushBarrier() emits src→dst and rotates dst into src. Default src is HOST/HOST_WRITE so the first barrier covers host uploads.
- DX12: Pending UAV barrier flag on the command buffer, flushed before each dispatch.
- Metal: Accumulated MTL::BarrierScope flushed via memoryBarrier() between commands. Scope is recorded after each command.

VK/DX store barrier state on the command buffer (encoders hold a back-reference) so it persists across encoder lifetimes. Metal stores it on the encoder since each MTL::ComputeCommandEncoder is a separate native object.

VK submit() warns when multiple command buffers are batched, as the per-CB HOST default src does not account for inter-CB dependencies. Explicit HOST readback barriers in executeProgram() are recorded directly on the command buffer, outside the encoder's tracking.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ntations

Buffer copies are recorded through the encoder abstraction. DX and VK record directly on the underlying command list/buffer. Metal lazily switches between compute and blit encoders as needed; Metal 4 removes this separation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Consistent with the readback barrier helpers. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add pushDebugGroup/popDebugGroup/insertDebugSignpost to CommandEncoder
with no-op defaults. Each backend overrides them:
- Vulkan: vkCmd{Begin,End,Insert}DebugUtilsLabelEXT, loaded via
vkGetDeviceProcAddr (naturally gated by VK_EXT_debug_utils availability)
- DX12: ID3D12GraphicsCommandList BeginEvent/EndEvent/SetMarker with
ANSI string encoding
- Metal: pushDebugGroup/popDebugGroup/insertDebugSignpost on the active
native encoder, with correct pop/push across compute/blit switches
Every encoder command automatically emits a signpost with its parameters
(e.g. "Dispatch [8,1,1]", "CopyBuffer 4096B", "FillBuffer 256B
value=0x00"). Encoder creation pushes a "ComputeEncoder (Serial/Parallel)"
debug group, balanced by a pop in endEncoding.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Makes CommandEncoder::endEncoding() idempotent, split into a non-virtual wrapper plus a protected virtual endEncodingImpl(). Each backend's ComputeEncoder destructor then calls endEncoding(), so an encoder destroyed without an explicit end still flushes pending barriers and pops its debug group.
DX and Vulkan bake the threadgroup size into the compiled shader, making the ThreadsPerGroup parameters redundant. Metal reads it back from shader reflection (the numthreads attribute persisted in the transpiled Metallib) and stores it in the encoder via setThreadGroupSize() before dispatching.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
LGTM
Introduces a generic command encoder abstraction for recording GPU commands to a command buffer.

ComputeEncoder provides dispatch() and barrier() operations with two modes: Parallel (no automatic barriers, caller manages synchronization) and Serial (auto-inserts barriers between commands, using the tracked destination scope as the next source).

Each backend implements barrier tracking natively:
- Metal: MTL::BarrierScope accumulated on the encoder
- Vulkan: VkPipelineStageFlags/VkAccessFlags on the command buffer

VK/DX store barrier state on the command buffer (encoders hold a back-reference) so it persists across encoder lifetimes. Metal stores it on the encoder since each MTL::ComputeCommandEncoder is a separate native object with implicit inter-encoder ordering.

Creating a parallel encoder flushes a full barrier on VK/DX to ensure prior work is visible. endEncoding() flushes any pending barriers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>