Update to MLX 0.30.6 #6

Open
robert-johansson wants to merge 17 commits into frost-beta:main from robert-johansson:main
Conversation

@robert-johansson
Summary

  • Bump MLX submodule from 0.25.0 to 0.30.6
  • Add ki::Type specialization for mlx::core::SmallVector (MLX >= 0.26 uses SmallVector for Shape)
  • Update API call sites for breaking changes: std::vector<int> → mx::Shape, new output_padding params in conv_transpose, extra arg in scaled_dot_product_attention
  • Split large ki::Set registration calls to stay within template parameter limits
  • Wrap mx::metal::device_info to handle its new std::unordered_map return type

Tested on macOS with Apple Silicon (M4). All existing functionality works.

🤖 Generated with Claude Code

Robert Johansson and others added 15 commits February 22, 2026 21:08
Bump MLX submodule from v0.25.0 to v0.30.6 and fix all API changes:

- Add SmallVector<T> kizunapi type specialization (Shape changed from
  std::vector<int> to SmallVector in MLX >= 0.26)
- Add PutIntoShape helper, keep PutIntoVector for std::vector<int> uses
- Update FFT wrapper function pointer types for Shape parameter
- Add output_padding parameter to conv_transpose1d/2d/3d
- Add sinks parameter to scaled_dot_product_attention calls
- Move device_info from metal:: to gpu:: namespace
- Split large ki::Set calls to stay within template argument limits

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Update deps/mlx with fix for compile_fuse broadcast split_one bug that
caused "unordered_map::at: key not found" on compiled functions with
~100+ operations. This is an upstream MLX bug (v0.29.4+).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Update MLX submodule with improved compile_fuse fix that preserves
the broadcast fusion optimization while fixing the aliasing bug
that caused unordered_map::at crashes on large computation graphs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Points deps/mlx to ml-explore/mlx main (c8536f52) which includes
the merged compile_fuse broadcast split fix from PR #3166, plus
newer upstream fixes (Metal event leak, conv3d overflow, fence sync).

Replaces the local branch commits (65cefdef, a6d40e4a) which are
now superseded by the upstream merge.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Update MLX submodule to include native lgamma/digamma kernels and
add Node.js bindings for both operations.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Update deps/mlx submodule URL to robert-johansson/mlx (genmlx branch)
  with lgamma, digamma, bessel_i0e, bessel_i1e ops
- Add besselI0e/besselI1e bindings in ops.cc and type declarations

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Report external memory (min 1MB per array) via napi_adjust_external_memory
so the JS GC knows about Metal GPU buffer pressure. This makes GC run
earlier, reducing the chance of hitting Metal's 499K allocation limit.

- Point kizunapi submodule to robert-johansson fork with ExternalMemorySize trait
- Specialize ExternalMemorySize for mx::array (1MB minimum cost)
- Add napi_adjust_external_memory calls in Tidy and Dispose paths

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
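
The accounting described in the commit above can be sketched as follows. This is a minimal illustration, not the code in this PR: `ExternalCost` and `ExternalMemoryLedger` are hypothetical names, and the sketch assumes "1MB minimum cost" means each array is reported as the larger of its byte size and 1 MiB. In a real binding each delta would presumably be passed to `napi_adjust_external_memory` rather than accumulated in a struct.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Assumed floor: every array is reported as at least 1 MiB so that many
// small GPU-backed arrays still create visible pressure for the JS GC.
constexpr std::size_t kMinExternalCost = 1u << 20;

// Cost reported for one array of `nbytes` bytes (hypothetical helper).
inline std::size_t ExternalCost(std::size_t nbytes) {
  return std::max(nbytes, kMinExternalCost);
}

// Hypothetical ledger standing in for the running total the JS engine sees.
// The same cost must be added on creation and subtracted on disposal,
// otherwise the reported total drifts.
struct ExternalMemoryLedger {
  std::int64_t reported = 0;

  void OnCreate(std::size_t nbytes) {
    reported += static_cast<std::int64_t>(ExternalCost(nbytes));
  }
  void OnDispose(std::size_t nbytes) {
    reported -= static_cast<std::int64_t>(ExternalCost(nbytes));
  }
};
```

The point of the floor is that a 16-byte scalar array may still pin a Metal buffer, so reporting its true size would understate the pressure.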
Adds a native function that bypasses the deferred N-API finalizer queue by
synchronously walking the wrapper registry and freeing arrays whose JS
wrappers have been GC'd. This is critical for synchronous inference loops
where the event loop never yields and deferred finalizers never run.

Includes kizunapi changes:
- CollectDeadWrappers<T>() in InstanceData
- ExternalMemorySize reporting on AllowPassByValue path
- Double-free guard in finalizer callbacks

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
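
The synchronous sweep described in the commit above might look roughly like the sketch below. All names here (`Registry`, `Wrap`, `SweepDead`) are hypothetical, and `std::weak_ptr` stands in for the weak reference to a JS wrapper; the actual `CollectDeadWrappers<T>()` in kizunapi is not shown in this PR page and may differ.

```cpp
#include <cstddef>
#include <memory>
#include <unordered_map>

struct Wrapper {};           // stand-in for the JS wrapper object
struct Native { int id; };   // stand-in for the native array

class Registry {
 public:
  // Registers a native object and returns the (strong) wrapper handle.
  std::shared_ptr<Wrapper> Wrap(int id) {
    auto wrapper = std::make_shared<Wrapper>();
    entries_[id] = Entry{std::make_unique<Native>(Native{id}), wrapper};
    return wrapper;
  }

  // Walks the registry synchronously and frees natives whose wrapper is
  // gone, instead of waiting for a deferred finalizer. Erasing the entry
  // is also what guards against a second free of the same native.
  std::size_t SweepDead() {
    std::size_t freed = 0;
    for (auto it = entries_.begin(); it != entries_.end();) {
      if (it->second.wrapper.expired()) {
        it = entries_.erase(it);  // unique_ptr releases the native here
        ++freed;
      } else {
        ++it;
      }
    }
    return freed;
  }

  std::size_t size() const { return entries_.size(); }

 private:
  struct Entry {
    std::unique_ptr<Native> native;
    std::weak_ptr<Wrapper> wrapper;
  };
  std::unordered_map<int, Entry> entries_;
};
```

Because the sweep runs on the caller's stack, it works inside a loop that never returns to the event loop, which is the case the commit message calls out.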
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Kernel .h changes now take effect via JIT source string regeneration
without needing to manually delete .air/.metallib files.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ed calls

The Tidy function captured `auto& top = g_tidy_arrays.top()` and passed
`[&top]` to the AwaitFunction lambda. If the lambda executed after the
stack was modified (async Promise path, or nested tidy calls), `top`
became a dangling reference → segfault at address 0x5.

Fix: move the set off the stack inside cpp_then (at execution time, not
capture time). Use a shared_ptr<bool> flag to coordinate between cpp_then
and cpp_finally so the stack is popped exactly once — cpp_then pops on
success, cpp_finally pops only on error (if cpp_then didn't run).

Verified: nested tidy (3 levels), 218K-call stress test, GenMLX test suite
(165/165 gen_clj_compat).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
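
The pop-exactly-once coordination described in the commit above can be sketched like this. It is an illustration under assumptions, not the PR's code: `ArraySet`, `TidyScope`, and `MakeScope` are hypothetical names, and the two `std::function` members stand in for the `cpp_then` and `cpp_finally` callbacks.

```cpp
#include <functional>
#include <memory>
#include <set>
#include <stack>
#include <utility>

// Hypothetical stand-in for the set of arrays tracked by one tidy scope.
using ArraySet = std::set<int>;

struct TidyScope {
  std::function<ArraySet()> on_then;   // success path
  std::function<void()> on_finally;    // always runs, after on_then if any
};

// The callbacks capture the stack itself (long-lived), never a reference
// to its current top element, so nothing dangles if the stack changes
// between creation and execution.
inline TidyScope MakeScope(std::stack<ArraySet>& scopes) {
  auto popped = std::make_shared<bool>(false);
  TidyScope scope;
  scope.on_then = [&scopes, popped]() {
    // Read top() at execution time, then move the set off the stack.
    ArraySet top = std::move(scopes.top());
    scopes.pop();
    *popped = true;
    return top;
  };
  scope.on_finally = [&scopes, popped]() {
    // Only pop if on_then never ran (the error path).
    if (!*popped) {
      scopes.pop();
      *popped = true;
    }
  };
  return scope;
}
```

The shared flag is what lets both callbacks be safe to call in either combination: after success, `on_finally` is a no-op; after an error, it is the only one that pops.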
…eference

ExternalMemorySize::Get(a) was called on array pointers before checking
if the pointer was still valid. If JS GC had already finalized the array
(calling TypeBridge::Finalize → delete), the pointer was dangling.

Fix: check GetWrapper/DeleteWrapper first. Only access the pointer if
the wrapper map confirms it's still alive (states 1 or 3, not state 2).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When the JS function passed to valueAndGrad threw during tracing,
the error was silently swallowed. The traced lambda returned an
empty vector, MLX's value_and_grad continued with garbage, and
TreeUnflatten returned a stale tracer Symbol instead of a concrete
mx.array. No error was ever propagated to the caller.

Fix: track callback failure with a flag. After value_and_grad_func
returns, check the flag and throw instead of proceeding with
invalid results.

Reproducer (before fix):
  const vg = mx.valueAndGrad((w, x) => { throw new Error('oops'); });
  const [v, g] = vg(mx.array([1]), mx.array([2]));
  // v.constructor.name was 'Symbol' — should have thrown

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
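
The flag-based fix described in the commit above follows a general pattern that can be sketched without MLX or N-API. In this sketch `RunTransform` is a hypothetical stand-in for a transform such as `value_and_grad` that cannot itself propagate the callback's failure, and `CheckedCall` is a hypothetical wrapper; neither is an API from this PR.

```cpp
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

using Fn = std::function<std::vector<double>(const std::vector<double>&)>;

// Stand-in for a transform that keeps going even when the traced callback
// returns nothing, producing a placeholder ("garbage") result.
inline std::vector<double> RunTransform(const Fn& traced,
                                        const std::vector<double>& in) {
  std::vector<double> out = traced(in);
  if (out.empty()) out.push_back(0.0);
  return out;
}

// Wraps the user callback so a failure is recorded in a flag, then checks
// the flag after the transform returns and rethrows instead of handing
// back the placeholder result.
inline std::vector<double> CheckedCall(const Fn& user_fn,
                                       const std::vector<double>& in) {
  bool failed = false;
  std::string message;
  Fn traced = [&](const std::vector<double>& args) -> std::vector<double> {
    try {
      return user_fn(args);
    } catch (const std::exception& e) {
      failed = true;
      message = e.what();
      return {};  // the transform still needs some return value
    }
  };
  std::vector<double> result = RunTransform(traced, in);
  if (failed) throw std::runtime_error(message);
  return result;
}
```

Without the final check, the caller would receive the placeholder value and no error, which matches the symptom in the reproducer.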
- Update deps/mlx to genmlx-rebased branch which includes:
  - 53 upstream commits (teardown fix, split-K matmul, etc.)
  - Library cleaner: Metal shader pipelines are released when compiled
    functions are erased from the compile cache
  - Custom ops (lgamma, digamma, bessel, vmap floor_divide fix)

- Export mx.detail.compile_clear_cache as compileClearCache in JS
  bindings, allowing explicit cleanup of all compiled function caches
  and their associated Metal resources.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Root cause: after mx.eval(), each array retains shared_ptr references
to its inputs through the computation graph. Under Bun/JSC, the GC is
non-deterministic and finalizers are deferred to the event loop. In
synchronous code (which nbb/ClojureScript is), finalizers never fire,
so Metal buffers accumulate monotonically — num_resources grew from 26
to 18,000+ in 60 seconds, eventually hitting the macOS 499K limit.

Fix: call array.detach() on evaluated arrays in Eval(). This severs
the graph links (primitive + inputs), allowing parent arrays and their
Metal buffers to be freed immediately. Safe because node-mlx manages
gradients via separate valueAndGrad/grad transforms that trace their
own graphs — the forward graph is never reused after eval.

Also:
- Expose getNumResources/getResourceLimit for Metal buffer monitoring
- Move SweepDeadArrays to shared header for cross-file access
- Update MLX submodule with resource tracking API

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
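
The retention problem and the effect of detaching, as described in the commit above, can be reproduced with plain `std::shared_ptr`. This is a toy model, not MLX: `Node` is a hypothetical stand-in for an array, and its `Detach` only mimics the idea of dropping graph links.

```cpp
#include <memory>
#include <vector>

// Toy graph node: holding shared_ptr references to its inputs keeps every
// ancestor (and, in the real system, its GPU buffer) alive.
struct Node {
  std::vector<std::shared_ptr<Node>> inputs;

  // Severs the links to the inputs. Any ancestor with no other owner is
  // destroyed immediately, without waiting for a garbage collector.
  void Detach() { inputs.clear(); }
};

// Builds a child that depends on `parent`, as an op on an array would.
inline std::shared_ptr<Node> MakeChild(const std::shared_ptr<Node>& parent) {
  auto child = std::make_shared<Node>();
  child->inputs.push_back(parent);
  return child;
}
```

As long as the child is reachable, dropping the caller's own handle to the parent frees nothing; only clearing the child's input list releases it. That is why detaching after evaluation bounds memory even when finalizers never run.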
Wraps mx::searchsorted in node-mlx NAPI bindings.
TypeScript declaration added.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>