Skip to content

[Metal] Explicit root signature layout for Metal backend#1061

Merged
bogner merged 15 commits into
llvm:mainfrom
kcloudy0717:kcloudy0717/metal_irconverter
May 11, 2026
Merged

[Metal] Explicit root signature layout for Metal backend#1061
bogner merged 15 commits into
llvm:mainfrom
kcloudy0717:kcloudy0717/metal_irconverter

Conversation

@kcloudy0717
Copy link
Copy Markdown
Contributor

Resolves #351, #744, #305.

This PR changes the automatic linear layout binding model to the more flexible explicit root signature binding model.

Notable changes:

  • We no longer invokes -metal && -Fre to the compiler command line in LIT config. Reason for this is because explicit root signature model requires the RS to be bound to the IR compiler during IR conversion time.
  • Introduced MTLTopLevelArgumentBuffer (TLAB) class that replaces the legacy MTL::Buffer *ArgsBuffer in InvocationState. This class encapsulates a buffer for the backing memory of the data of all IR resources for the given root signature. Provides D3D12-like functions for binding root constants, root descriptors, and descriptor tables.
  • Introduced MTLDescriptorHeap class that replicates D3D12's descriptor heap concept. It serves as a storage for descriptors, binding a descriptor table to the TLAB must come from a descriptor heap just like D3D12. This bridge the gap between D3D12/Metal API differences allowing future tests such as dynamic resource indexing and unbound resource arrays easy to support.
  • Resource/Descriptor creation is similar to the D3D12 backend thanks to MTLTopLevelArgumentBuffer and MTLDescriptorHeap, this allows resource arrays to be supported in the Metal backend now.

@kcloudy0717
Copy link
Copy Markdown
Contributor Author

Pinging @bogner, @bob80905, @farzonl, @manon-traverse for review.

Comment thread lib/API/MTL/MTLDescriptorHeap.cpp Outdated
Comment on lines +1 to +3
#define IR_RUNTIME_METALCPP
#include "MTLDescriptorHeap.h"
#include "metal_irconverter_runtime.h"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to limit the IR_RUNTIME_METALCPP to the code that needs it:

#pragma push_macro("IR_RUNTIME_METALCPP")
#include "metal_irconverter_runtime.h"
#pragma pop_macro("IR_RUNTIME_METALCPP")

If we're going to use these APIs in more places it might even be worth wrapping it in a header that just does that so it's harder to make a mistake.

(I'd also make an argument for adding a MetalIRConverter.cpp and moving the IR_PRIVATE_IMPLEMENTATION inclusion of this header to that instead of Device, but that's quickly getting out of scope for this particular change)

#include "llvm/Support/Error.h"
#include <memory>

struct IRDescriptorTableEntry;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit awkward that metal_irconverter_runtime.h just puts all of its definitions in the global namespace. Maybe a comment saying this forward declaration is for an object from there would be helpful.

Comment thread lib/API/MTL/MTLDescriptorHeap.h Outdated
struct IRDescriptorTableEntry;

namespace offloadtest {
struct METAL_GPU_DESCRIPTOR_HANDLE {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's stick to LLVM naming conventions for these types (ie, MetalGPUDescriptorHandle)

Comment thread lib/API/MTL/MTLDescriptorHeap.h Outdated
Comment thread lib/API/MTL/MTLDescriptorHeap.h Outdated
Comment on lines +28 to +31
enum METAL_DESCRIPTOR_HEAP_TYPE {
METAL_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV,
METAL_DESCRIPTOR_HEAP_TYPE_SAMPLER,
};
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good place for enum class, then we don't need the METAL_DESCRIPTOR_HEAP_TYPE on all of the values.

I'm also not entirely sure about the CBV or SRV or UAV value - is there a clearer name for this? I suspect we'll either actually want three different values here or that we can get away with something like Sampler and NotSampler

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally we keep the descriptor heap API as close to D3D12 as possible. The enum defined here maps to D3D12's enum:

typedef enum D3D12_DESCRIPTOR_HEAP_TYPE {
  D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV = 0,
  D3D12_DESCRIPTOR_HEAP_TYPE_SAMPLER,
  D3D12_DESCRIPTOR_HEAP_TYPE_RTV,
  D3D12_DESCRIPTOR_HEAP_TYPE_DSV,
  D3D12_DESCRIPTOR_HEAP_TYPE_NUM_TYPES
} ;

Having three separate values for each CBV/SRV/UAV would make dynamic resource indexing via ResourceDescriptorHeap not possible.

FWIW, I think we can just change this to enum class and drop the METAL_DESCRIPTOR_HEAP_TYPE_ prefix for the values. I can't really think of a better name for CBV_SRV_UAV so I would say we just keep as it is.

Comment thread lib/API/MTL/MTLDescriptorHeap.cpp Outdated
Comment thread lib/API/MTL/MTLTopLevelArgumentBuffer.h Outdated

#include "MTLDescriptorHeap.h"
#include "Metal/Metal.hpp"
#include "metal_irconverter.h"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need all of this for the header?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but I think I can get away with forward declarations for some of them. I'll do that.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't be checking a dylib in to the repo. I didn't realize that metal_irconverter wasn't distributed header only like metal_irconverter_runtime when I suggested we could just add the dependency - this makes things more complicated unfortunately.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would be the preferred approach? Maybe we can use CMake Fetch_Content to make this bit simpler?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized the tests aren't being ran probably because of this dylib dependency. Do let me know if there is a better approach for this.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC the metal-shaderconverter tool and this dylib are distributed together, and we already have the dependency on the tool. I guess what we really want to do is wire up some cmake logic to find where the dylib is installed and link to that one.

@llvm-beanz does this sound right to you?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that we already have this as a dependency and can safely assume the machines have the IR converter installed, this makes the logic of wiring up the dylib simple.

The document specifies the directory which the tool will be installed:

The installer contains the command-line tools, dynamic libraries, header files and documentation. On
installation you will find:

  • /usr/local/bin/ contains the metal-shaderconverter tool.
  • /usr/local/include/ contains the headers for the metal_irconverter library .
  • /usr/local/lib/ contains the metal_irconverter library for macOS.
  • /usr/local/lib_iOS/ contains the metal_irconverter library for iOS.
  • /opt/metal-shaderconverter/ contains Metal Shader Converter documentation.

I modified the CMake script to use find_library to retrieve the dylib:

  set(METAL_INCLUDE_DIRS ${CMAKE_CURRENT_SOURCE_DIR}/third-party/metal-cpp
                         /usr/local/include/metal_irconverter
                         ${CMAKE_CURRENT_SOURCE_DIR}/third-party/metal_irconverter_runtime)
  find_library(METAL_IRCONVERTER_LIBRARY metalirconverter PATHS /usr/local/lib REQUIRED)
  set(METAL_LIBRARIES ${METAL_IRCONVERTER_LIBRARY})

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the libmetalirconverter.dylib is not in the /usr/local/lib. Do we know where it's downloaded? Also maybe it'd be a better idea for us to have a CMake option to specify the path of the dylib, as the install location could potentially be different.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Scratch what I said, it looks like I needed to link the dylib for the api-query and offloader. Metal checks on CI is green.

Comment thread include/Support/Pipeline.h
Comment thread tools/offloader/offloader.cpp Outdated
Comment on lines +131 to +138
// Maybe there is a better way to detect GPUAPI?
#ifdef __APPLE__
APIToUse = GPUAPI::Metal;
outs() << "Using Metal API\n";
#else
APIToUse = GPUAPI::DirectX;
outs() << "Using DirectX API\n";
#endif
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems fine to me as Metal is Apple only.

@kcloudy0717 kcloudy0717 force-pushed the kcloudy0717/metal_irconverter branch 4 times, most recently from 346ee84 to 9211702 Compare April 16, 2026 09:02
@kcloudy0717 kcloudy0717 force-pushed the kcloudy0717/metal_irconverter branch from 3621a54 to da6cdff Compare April 26, 2026 15:27
@kcloudy0717 kcloudy0717 force-pushed the kcloudy0717/metal_irconverter branch 3 times, most recently from 532e4a6 to a474316 Compare May 4, 2026 16:41
Copy link
Copy Markdown
Contributor

@manon-traverse manon-traverse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!
My main complaint in this code is a pattern around resource and descriptor that stems from how the DX12 backend is setup. However, this PRs brings the backends closer together so this will make fixing that pattern quite easy!

Copy link
Copy Markdown
Contributor

@bogner bogner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looking pretty good. Mostly minor comments but I wouldn't mind one more look over before I'm ready to approve.

Comment thread lib/API/MTL/MetalIRConverter.h Outdated
Comment on lines +8 to +10
//
//
//===----------------------------------------------------------------------===//
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need an empty comment block here - if we have things to say about the file this is the place but it's totally reasonable to omit it instead of leaving it blank.

(also applies later in the patch)

@@ -0,0 +1,72 @@
#include "MTLDescriptorHeap.h"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing file header - we do want the license statement in every file.

Comment thread lib/API/MTL/MTLDescriptorHeap.cpp Outdated
}

MTLGPUDescriptorHandle &
MTLGPUDescriptorHandle::Offset(int32_t OffsetInDescriptors) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two things here:

  • While Offset is certainly useable as a verb, it is also a noun. This function doesn't necessarily sound like it's modifying the object. Maybe something like increment or addOffset would be better?
  • LLVM style says member functions should start with a lower case letter.

Comment on lines +167 to +168
if (!Buffer)
return;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is silently doing nothing really the right thing to do here? Should we assert that Buffer isn't null instead? I assume calling bind on an ArgumentBuffer without a Buffer is a programmer error here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This mostly follows D3D12's root signature behavior. In D3D12, empty root signatures are allowed, binding a empty root signature is a no-op. I went with this approach given the similarity of the root signature APIs, if this isn't desired, we can remove the check here and assert instead.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough. This seems fine like this.

Comment thread lib/API/MTL/MTLTopLevelArgumentBuffer.h Outdated
// Manages a Metal buffer that serves as the top-level argument buffer for
// shader resource binding with the explicit root signature layout.
class MTLTopLevelArgumentBuffer {
std::vector<IRResourceLocation> ResourceLocs;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kcloudy0717 kcloudy0717 force-pushed the kcloudy0717/metal_irconverter branch from 4565630 to a8b1453 Compare May 7, 2026 08:54
@kcloudy0717
Copy link
Copy Markdown
Contributor Author

@manon-traverse @bogner I fixed conflicts after rebasing to latest main. Branch should be good to go, just waiting on CI, feel free to have a last look.

@kcloudy0717 kcloudy0717 force-pushed the kcloudy0717/metal_irconverter branch from a8b1453 to 38f1bb9 Compare May 11, 2026 15:53
@bogner
Copy link
Copy Markdown
Contributor

bogner commented May 11, 2026

The metal failures are #1166, which has been fixed since those ran, and this is NFC anywhere other than Mac. I'm going to go ahead and merge this for you. Thanks!

@bogner bogner merged commit e8844e4 into llvm:main May 11, 2026
11 of 17 checks passed
bogner added a commit that referenced this pull request May 11, 2026
A couple of warnings for unused variables slipped in. Clean them up.
bogner added a commit that referenced this pull request May 11, 2026
Most of these really should've been XFAIL rather than UNSUPPORTED, but
in any case, after #1061 they're supported.

These all pass now except `Feature/ResourceArrays/array-local2.test`,
which fails for the same reason the test fails on DirectX.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Metal loader logic doesn't account for resources that don't match the automatic linear layout

3 participants