Refactor EXLA block lowering through EXLA.CustomCall protocol by Chapaman · Pull Request #1739 · elixir-nx/nx

Chapaman · 2026-04-20T22:57:29Z

Summary

Introduced EXLA.CustomCall (apply?/4, call/4) as the hook for native lowering of Nx.block/4 in EXLA, with @fallback_to_any true and implementations on Any for the blocks that previously had dedicated clauses in defn.ex.
EXLA.Defn: replaced the long chain of :block special cases with one cached_recur_operator(:block, …) path — recurse tensor args, then apply? → call if true, else default_block_implementation/5 (the previous generic subfunction + Value.call path).
Moved native paths for Nx.Block.LinAlg.QR, Nx.Block.LinAlg.Eigh, Nx.Block.Take, Nx.Block.TopK, Nx.Block.FFT2, Nx.Block.IFFT2, Nx.Block.RFFT, and Nx.Block.IRFFT into the protocol impl; behavior should match the old code paths.
Helpers: a few EXLA.Defn functions are now @doc false public (to_type, op_type, op_shape, expr_to_typespec, axes_for_rank, fft, fft2) so the protocol can call them; fft / fft2 no longer take state (builder comes from %Value{}.function).
Note: QR eligibility uses the type kind from the output template (e.g. {:f, _} vs {:c, _}), not q.type != :c (types are tuples).

Chapaman · 2026-04-20T23:10:39Z

+defprotocol EXLA.CustomCall do
+  @moduledoc """
+  Protocol used by `EXLA.Defn` to lower specific `Nx.block/4` tags natively
+  instead of compiling the fallback callback.
+
+  Implementations receive the block tag struct, the output template (`out`),
+  the already-recursed MLIR `EXLA.MLIR.Value` arguments and the active
+  `EXLA.Client`.
+  """
+
+  @fallback_to_any true
+
+  @doc """
+  Returns `true` when EXLA should lower the block natively via `call/4`.
+
+  When it returns `false`, `EXLA.Defn` falls back to compiling the block's
+  default callback implementation.
+  """
+  def apply?(struct, out, args, client)
+
+  @fallback_to_any true
+
+  @doc """
+  Lowers the block natively.
+
+  Must return the list of `EXLA.MLIR.Value`s (or a single value) that
+  represents the block result, matching the shape of `out`.
+  """
+  def call(struct, out, args, client)
+end
+
+defimpl EXLA.CustomCall, for: Any do
+  alias EXLA.MLIR.Value
+  alias EXLA.Defn
+
+  # --- apply?/4 ---
+
+  def apply?(
+        %Nx.Block.LinAlg.QR{},
+        {%{type: {q_type_kind, _}}, _r},
+        _args,
+        client
+      ) do
+    q_type_kind != :c and client.platform == :host
+  end


We split this into apply?/4 and call/4 because a single callback would mix “should EXLA take the native path for this compile?” with “emit MLIR.” Keeping two functions separates eligibility from lowering.

apply? answers whether the native path applies for this block tag in context: e.g. for QR we still have %Nx.Block.LinAlg.QR{}, but we only native-lower when the output type is real (not complex) and the client is :host. Another block could gate on arity, struct fields, mesh, etc.—without duplicating the full call body.

call only runs when apply? is true and contains the actual Value.* / Defn lowering.

EXLA.Defn mirrors that split: recurse operands → apply? → either call or the generic block fallback (default_block_implementation).

You could have a single callback and instead return :skip in call when it cannot be lowered.
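
As a rough illustration of the eligibility/lowering split described above, here is a minimal sketch that does not depend on Nx or EXLA. Every name in it (`Sketch.CustomCall`, `Sketch.QR`, `Sketch.Lowering.lower_block/5`, the `{:native, args}` return value) is hypothetical and only stands in for the real protocol, block tag, and MLIR values.

```elixir
defprotocol Sketch.CustomCall do
  @fallback_to_any true

  # Eligibility: should the native path be taken for this tag in this context?
  def apply?(tag, out, args, client)

  # Lowering: only invoked by the caller when apply?/4 returned true.
  def call(tag, out, args, client)
end

defmodule Sketch.QR do
  # Hypothetical stand-in for a block tag struct such as %Nx.Block.LinAlg.QR{}.
  defstruct []
end

defimpl Sketch.CustomCall, for: Sketch.QR do
  # Native path only for non-complex outputs on the :host platform.
  def apply?(_tag, {%{type: {kind, _bits}}, _r}, _args, client) do
    kind != :c and client.platform == :host
  end

  # Placeholder for the real lowering; it just marks the arguments.
  def call(_tag, _out, args, _client), do: {:native, args}
end

defimpl Sketch.CustomCall, for: Any do
  # Tags without a dedicated implementation never take the native path.
  def apply?(_tag, _out, _args, _client), do: false
  def call(_tag, _out, _args, _client), do: raise(ArgumentError, "no native lowering")
end

defmodule Sketch.Lowering do
  # Mirrors the flow: operands already recursed, then apply?, then call or fallback.
  def lower_block(tag, out, args, client, fallback) when is_function(fallback, 1) do
    if Sketch.CustomCall.apply?(tag, out, args, client) do
      Sketch.CustomCall.call(tag, out, args, client)
    else
      fallback.(args)
    end
  end
end
```

With this shape, the single-callback alternative mentioned above would fold `apply?/4` into `call/4` and have the caller pattern match on a `:skip` return instead of branching on a boolean.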

josevalim · 2026-04-21T04:45:56Z

+  def call(struct, out, args, client)
+end
+
+defimpl EXLA.CustomCall, for: Any do


Should those be different implementations rather than having it all in Any?

This way the user can provide their own overrides!

Ah, I see! Let's add a comment explaining why then! :D

josevalim · 2026-04-21T04:50:03Z

@polvalente @Chapaman very nice! However, I think we should couple this more closely to actual custom calls (in C), because that's how it will work in practice. In other words, we should only move to the protocol blocks which are implemented as custom C calls. This will make it easier to see what they have in common (and we won't need to get distracted with things like take, which is quite different).

polvalente · 2026-04-27T22:28:36Z

+// Loads a shared library with RTLD_GLOBAL so XLA FFI static registrations run.
+// Used from tests (e.g. alias custom-call registration plugin).
+fine::Ok<> dlopen_test_plugin(ErlNifEnv *env, std::string path) {
+  void *handle = dlopen(path.c_str(), RTLD_NOW | RTLD_GLOBAL);
+  if (handle == nullptr) {
+    const char *err = dlerror();
+    throw std::invalid_argument(err ? err : "dlopen failed");
+  }
+  (void)handle;
+  return fine::Ok();
+}
+
+FINE_NIF(dlopen_test_plugin, 0);


Let's expose this as load_dylib so that it's not used just for tests

polvalente · 2026-04-27T22:31:14Z

+//
+// Built when BUILD_EXLA_TEST_PLUGIN=1 (Mix test). Load with RTLD_GLOBAL via
+// EXLA.NIF.dlopen_test_plugin/1 before compiling or running graphs that emit
+// the alias call_target_name.


We can wrap this file in ifndef EXLA_PROD to exclude its contents when MIX_ENV=prod.
Let's also remove the plugin suffix. I suggest exla_test/custom_calls.cc as the file path in c_src.

polvalente · 2026-04-27T22:38:03Z

+ifneq ($(BUILD_EXLA_TEST_PLUGIN),)
+ifneq ($(BUILD_EXLA_TEST_PLUGIN),0)
+$(EXLA_SO): $(TEST_PLUGIN_SO)
+endif
+endif


Suggested change

-ifneq ($(BUILD_EXLA_TEST_PLUGIN),)
-ifneq ($(BUILD_EXLA_TEST_PLUGIN),0)
-$(EXLA_SO): $(TEST_PLUGIN_SO)
-endif
-endif

polvalente · 2026-04-27T22:40:19Z

+          "EXLA_VERSION" => "#{@version}",
+          "BUILD_EXLA_TEST_PLUGIN" => if(Mix.env() == :test, do: "1", else: "0")


Suggested change

-          "EXLA_VERSION" => "#{@version}",
-          "BUILD_EXLA_TEST_PLUGIN" => if(Mix.env() == :test, do: "1", else: "0")
+          "EXLA_VERSION" => "#{@version}"

polvalente · 2026-04-27T22:45:57Z

+$(TEST_PLUGIN_SO): $(TEST_PLUGIN_CC) | $(XLA_EXTENSION_DIR)
+	@ mkdir -p $(dir $@)
+	$(CXX) $(CFLAGS) -shared $(TEST_PLUGIN_CC) -o $@ $(LDFLAGS)
+
 $(EXLA_SO): $(EXLA_CACHE_SO)


Suggested change

-$(TEST_PLUGIN_SO): $(TEST_PLUGIN_CC) | $(XLA_EXTENSION_DIR)
-	@ mkdir -p $(dir $@)
-	$(CXX) $(CFLAGS) -shared $(TEST_PLUGIN_CC) -o $@ $(LDFLAGS)
-
-$(EXLA_SO): $(EXLA_CACHE_SO)
+$(TEST_PLUGIN_SO): $(TEST_PLUGIN_CC) | $(XLA_EXTENSION_DIR)
+	@ mkdir -p $(PRIV_DIR)
+	$(CXX) $(CFLAGS) -shared $(TEST_PLUGIN_CC) -o $@ $(LDFLAGS)
+
+EXLA_SO_DEPS = $(EXLA_CACHE_SO)
+ifeq ($(MIX_ENV),test)
+EXLA_SO_DEPS += $(TEST_PLUGIN_SO)
+endif
+
+$(EXLA_SO): $(EXLA_SO_DEPS)

polvalente · 2026-04-27T22:47:39Z

+  @doc false
+  def qr_with_call_target(
+        %Value{function: func} = value,
+        q_typespec,
+        r_typespec,
+        call_target_name
+      )
+      when is_binary(call_target_name) do


What if the protocol instead had 2 callbacks:

@spec function_name(struct, output_container, input_templates_list, client) :: String.t() | :skip
@spec config(struct, output_container, input_templates_list, client) :: map() | nil

function_name would be the string name registered by invoking the .so (what we have here as call_target_name)

input and output typespecs we can infer ourselves

and backend_config would be obtained from the config callback. We'd have to validate all map values are encodable as mlir::DictionaryAttr (but we can encode them ourselves).

There's a separate discussion on whether we expose other things such as has_side_effect and output_operand_aliases, but we can add these afterwards.
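
To make the validation step for the `config` map concrete, here is a minimal sketch of a recursive check. Which Elixir terms can actually be encoded into an `mlir::DictionaryAttr` is an assumption here (integers, floats, booleans, strings, lists, and nested maps), and `Sketch.Config.validate/1` is a hypothetical helper, not an EXLA function.

```elixir
defmodule Sketch.Config do
  # ASSUMPTION: only these value kinds are encodable as attributes.
  # The real set would be decided by the EXLA encoder.

  # Returns {:ok, config} when every key and value looks encodable.
  def validate(config) when is_map(config) and not is_struct(config) do
    if Enum.all?(config, fn {key, value} -> key?(key) and value?(value) end) do
      {:ok, config}
    else
      {:error, :unencodable_config}
    end
  end

  # A nil config (the proposed default) is treated as an empty dictionary.
  def validate(nil), do: {:ok, %{}}
  def validate(_other), do: {:error, :unencodable_config}

  defp key?(key), do: is_atom(key) or is_binary(key)

  defp value?(v) when is_integer(v) or is_float(v) or is_boolean(v) or is_binary(v), do: true
  defp value?(v) when is_list(v), do: Enum.all?(v, &value?/1)
  defp value?(v) when is_map(v) and not is_struct(v), do: match?({:ok, _}, validate(v))
  defp value?(_other), do: false
end
```

Rejecting unsupported values up front (functions, PIDs, structs) would let the error surface at the protocol boundary rather than inside the MLIR encoder.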

polvalente · 2026-04-30T05:08:49Z

+    case {operand_type, computation_type} do
+      {{:f, 32}, {:f, 32}} -> "eigh_cpu_custom_call_f32"
+      {{:f, 64}, {:f, 64}} -> "eigh_cpu_custom_call_f64"
+      {{:s, 8}, {:f, 32}} -> "eigh_cpu_custom_call_s8"
+      {{:s, 16}, {:f, 32}} -> "eigh_cpu_custom_call_s16"
+      {{:s, 32}, {:f, 32}} -> "eigh_cpu_custom_call_s32"
+      {{:s, 64}, {:f, 32}} -> "eigh_cpu_custom_call_s64"
+      {{:u, 8}, {:f, 32}} -> "eigh_cpu_custom_call_u8"
+      {{:u, 16}, {:f, 32}} -> "eigh_cpu_custom_call_u16"
+      {{:u, 32}, {:f, 32}} -> "eigh_cpu_custom_call_u32"
+      {{:u, 64}, {:f, 32}} -> "eigh_cpu_custom_call_u64"
+      _ -> :skip
+    end


I think the name is invariant to the output type. We can just use the input type

Also, why are these not defps in the protocol impl?

Also, why are these not defps in the protocol impl?

I thought I needed to make it a def and not a defp because I'm using it in both EXLA.CustomCall and EXLA.MLIR.Value. Should I change it?

Are Value.qr and Value.eigh actually used? I think if they are, they should just be aliases to Value.custom_call (which will end up calling the protocol).

If this is indeed possible, then these functions will just be used in the protocol implementation.

😅😅 it was not being used lol
removed it :~
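
The suggestion in this thread, keying the target name on the input type alone, could look roughly like the sketch below. The name pattern is taken from the diff above; the module and function names (`Sketch.EighTarget.cpu_target/1`) are hypothetical, and whether each generated name is actually registered on the C side is assumed rather than verified.

```elixir
defmodule Sketch.EighTarget do
  # Operand types listed in the diff: f32/f64 plus signed and unsigned integers.
  @supported_bits %{f: [32, 64], s: [8, 16, 32, 64], u: [8, 16, 32, 64]}

  # Resolves the custom call target from the operand type only.
  # Anything outside the table (complex, bf16, f16, ...) yields :skip.
  def cpu_target({kind, bits}) do
    if bits in Map.get(@supported_bits, kind, []) do
      "eigh_cpu_custom_call_#{kind}#{bits}"
    else
      :skip
    end
  end
end
```

For example, `Sketch.EighTarget.cpu_target({:s, 16})` resolves to `"eigh_cpu_custom_call_s16"` without looking at the computation type.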

polvalente · 2026-04-30T23:45:31Z

+defmodule EXLA.CustomCall.Builtins do
+  @moduledoc false
+
+  @doc """
+  Host CPU `stablehlo.custom_call` target for `Nx.LinAlg.qr/2`, or `:skip`.
+
+  `operand_type` is the input matrix element type.
+  """
+  def qr_cpu_target(operand_type) do
+    case operand_type do
+      {:f, 32} -> "qr_cpu_custom_call_f32"
+      {:f, 64} -> "qr_cpu_custom_call_f64"


Let's get rid of this module and only define the target resolution inside the call sites

josevalim · 2026-05-05T08:30:05Z


 } // namespace exla

+// Host QR custom calls: integer operands with f32 Q/R (see Nx.Type.to_floating/1


This should not be here, right? Is this for test only?

In refactoring to use the protocol, we ended up losing the ability to do typecasts for the operands like we did before.

We either would have to add these or we'd have to add an extra callback to the protocol to make it work in this simpler version which doesn't leak EXLA.MLIR.

In the final implementation these will be absorbed into elixir-nx/xla and we'll just use them.

@polvalente we could also allow the custom call to return the types of the inputs, and we cast the operands if they differ. For most cases, the types of the inputs are the ones given as argument.

I guess this is a good middle ground. Let's do this and we'll have a proper struct for the return format with the input types, callback name, attributes and whatnot
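
One possible shape for such a return struct, together with the "cast only when the types differ" rule, is sketched below. The struct name, its fields (`name`, `input_types`, `attributes`), and `casts/2` are all hypothetical; the thread does not fix the final format.

```elixir
defmodule Sketch.CustomCallSpec do
  # Hypothetical return format: target name, requested operand types, attributes.
  @enforce_keys [:name]
  defstruct name: nil, input_types: nil, attributes: %{}

  # Returns a list of {operand_index, from_type, to_type} for operands whose
  # actual type differs from the type the custom call asked for.
  # input_types: nil means "use the operands as given", so nothing is cast.
  def casts(%__MODULE__{input_types: nil}, _arg_types), do: []

  def casts(%__MODULE__{input_types: wanted}, arg_types)
      when length(wanted) == length(arg_types) do
    wanted
    |> Enum.zip(arg_types)
    |> Enum.with_index()
    |> Enum.flat_map(fn
      # Requested and actual types match: no cast needed.
      {{same, same}, _index} -> []
      {{to, from}, index} -> [{index, from, to}]
    end)
  end
end
```

Under this sketch, an implementation that wants an integer operand upcast to f32 would return `input_types: [{:f, 32}]`, and the caller would emit a single conversion for that operand before the custom call.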

josevalim · 2026-05-05T08:32:55Z

+
+  def function_name(_, _, _, _), do: :skip
+
+  def config(_, _, _, _), do: nil


We don't need a separate callback; make it return {:ok, name, config}. We also need to better document what the config is.

What about if we add output operand aliases too? Would you still prefer a single callback?

We return everything from a single callback, so we make sure the format supports extensions.
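
A minimal sketch of the single-callback variant follows, assuming the `{:ok, name, config}` or `:skip` return shape proposed in this thread. The protocol name, the callback name `custom_call/4`, the `Sketch.Eigh` tag, and the atoms returned by `Sketch.Dispatch.lower/4` are hypothetical placeholders.

```elixir
defprotocol Sketch.SingleCall do
  @fallback_to_any true

  # Returns {:ok, target_name, config} to request a custom call, or :skip.
  def custom_call(tag, out, args, client)
end

defimpl Sketch.SingleCall, for: Any do
  # Default for every tag without a dedicated implementation.
  def custom_call(_tag, _out, _args, _client), do: :skip
end

defmodule Sketch.Eigh do
  # Hypothetical stand-in for a C-backed block tag.
  defstruct []
end

defimpl Sketch.SingleCall, for: Sketch.Eigh do
  # Only float operands on the :host platform are handled in this sketch.
  def custom_call(_tag, _out, [%{type: {:f, bits}} | _], %{platform: :host})
      when bits in [32, 64] do
    {:ok, "eigh_cpu_custom_call_f#{bits}", %{}}
  end

  def custom_call(_tag, _out, _args, _client), do: :skip
end

defmodule Sketch.Dispatch do
  # The caller matches on the tagged result instead of asking apply?/4 first.
  def lower(tag, out, args, client) do
    case Sketch.SingleCall.custom_call(tag, out, args, client) do
      {:ok, name, config} when is_binary(name) and is_map(config) ->
        {:custom_call, name, config}

      :skip ->
        :default_block_implementation
    end
  end
end
```

Because the result is a tagged tuple carrying a map, further options such as output operand aliases could later be added as keys in that map (or as fields of a struct) without changing the callback's arity.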

josevalim · 2026-05-05T08:33:26Z


-  defp to_type(%Value{} = op, type) do
+  @doc false
+  def to_type(%Value{} = op, type) do


Why are we converting all of these to public?

I think this is leftover from a previous iteration

Refactor EXLA block lowering through EXLA.CustomCall protocol

014eb64

Chapaman commented Apr 20, 2026

josevalim reviewed Apr 21, 2026

update EXLA.CustomCall to handle C-backed Nx.block tags (QR, Eigh)

5db679f

Chapaman marked this pull request as ready for review April 24, 2026 21:35

Chapaman marked this pull request as draft April 25, 2026 02:26

Chapaman added 3 commits April 24, 2026 23:30

test(exla): add QR FFI alias plugin + dlopen NIF and MLIR/JIT coverage for custom_call targets

4011521

fix formatting

3152003

CustomCall now has only one callback + add documentation

f2aa558

polvalente reviewed Apr 27, 2026

Chapaman added 2 commits April 29, 2026 19:11

update based on polvalente comments

2d8e9c2

upcast integers to float in C

adf2369

polvalente reviewed Apr 30, 2026

polvalente requested a review from josevalim April 30, 2026 05:09

Chapaman added 2 commits April 30, 2026 18:07

remove name from qr and eigh _cpu_target in custom_call.ex

c058794

remove Value.qr/3 and Value.eigh/3 as they are not being used

e424803

polvalente reviewed Apr 30, 2026

Chapaman added 2 commits May 4, 2026 18:53

remove CustomCall.Builtins, add integer test on custom_call_alias_test

238743d

.

3f2cb9e

josevalim reviewed May 5, 2026
