The first real-time neural synthesizer running directly on Apple Neural Engine — bypassing CoreML entirely.
157 microseconds per audio buffer. 79x real-time headroom. Zero CPU cores consumed during inference. 8-voice polyphony from a single batched ANE dispatch.
Built on the ANE reverse-engineering work by maderix/ANE.
Your Apple Silicon Mac contains a dedicated 19 TOPS neural engine — its own accelerator block with its own scheduler and power domain. It runs while your CPU sleeps. And almost nobody is using it for audio.
Existing neural audio tools — Neutone, RAVE, nn~, NAM, DDSP-VST — route inference through Python, PyTorch, or CoreML's XPC broker. Each layer adds overhead. None of them touch the ANE directly.
ane-synth does.
MIDI → [ANE: 157µs, 0 CPU, ~2.8W] → [Accelerate/vDSP: vectorized additive synth] → CoreAudio
The ANE finishes processing all 8 voices before your audio thread has spent a single microsecond on synthesis. The CPU never touches neural inference. It just does math with NEON vectors.
Measured on Apple M-series (44.1kHz, 512-sample buffer, 11.6ms deadline):
| Path | Latency | Budget Used | CPU Impact |
|---|---|---|---|
| Direct ANE (ours) | 157 µs | 1.36% | 0 cores |
| CoreML MLModel.predict() | ~720 µs | ~6.2% | 0 cores |
| CPU scalar conv (FP32) | ~3,100 µs | ~26.7% | steals cores |
Startup (model compile + load):

| Path | Time | Notes |
|---|---|---|
| Direct ANE (ours) | ~160 ms | MIL compiled in-process, no XPC |
| CoreML standard | ~740 ms | .mlpackage compile + MLModel load over XPC |
| CPU | 0 ms | No compilation step |
Additive synthesis (CPU render, per 512-sample buffer):

| Path | Latency | Speedup |
|---|---|---|
| Scalar sin() loop | ~2,200 µs | 1x |
| vDSP/Accelerate (ours) | ~390 µs | 5.6x |
End-to-end per buffer (inference + synthesis):

| Stack | Total | Budget Used | Real-Time Headroom |
|---|---|---|---|
| ANE + vDSP (ours) | ~547 µs | ~4.7% | ~21x |
| CoreML + scalar | ~2,920 µs | ~25.2% | ~4x |
| CPU conv + scalar | ~5,300 µs | ~45.7% | ~2x |
Numbers from `ane-bench` on M-series hardware. Your numbers will vary by chip generation.
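The budget percentages above fall straight out of the buffer deadline. A quick sanity check of the arithmetic (a sketch using the table's numbers, not a re-measurement):

```rust
// Derive the per-buffer deadline and the budget share a given latency consumes.
fn deadline_us(sample_rate: f64, buffer_samples: f64) -> f64 {
    buffer_samples / sample_rate * 1e6
}

fn budget_pct(latency_us: f64, deadline_us: f64) -> f64 {
    latency_us / deadline_us * 100.0
}

fn main() {
    let deadline = deadline_us(44_100.0, 512.0);
    println!("deadline: {:.0} µs", deadline); // ≈ 11,610 µs, i.e. the 11.6 ms budget
    println!("ANE budget: {:.2}%", budget_pct(157.0, deadline)); // ≈ 1.35%
}
```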
┌─────────────────────────────────────────────────────────────────────┐
│ ane-synth pipeline │
│ │
│ MIDI / Keyboard │
│ │ │
│ ▼ │
│ ┌─────────────┐ Rust MIL codegen │
│ │ Voice Pool │ (no Python, no CoreML toolchain) │
│ │ 8 voices │ │
│ │ note, f0, │──────────────────────────────────────────────┐ │
│ │ velocity │ │ │
│ └─────────────┘ │ │
│ │ │
│ ┌─────────────────────────────────────────────────────────┐ │ │
│ │ APPLE NEURAL ENGINE (separate chip) │ │ │
│ │ │ │ │
│ │ Input: [8, 4, 1, 8] FP16 (8 voices × f0/loud/vel/z) │◄─┘ │
│ │ │ │
│ │ HarmonicPredictor (4→64→128→64→66, pure FP16) │ │
│ │ Conv → ReLU → Conv → ReLU → Conv → ReLU → Conv │ │
│ │ │ │
│ │ Output: [8, 66, 1, 8] FP16 (64 harmonics + amp/noise) │ │
│ │ │ │
│ │ ONE dispatch. ALL voices. 157µs. ~2.8W. Zero CPU. │ │
│ └──────────────────────────────┬───────────────────────────┘ │
│ │ │
│ ┌──────────────────────────────▼───────────────────────────┐ │
│ │ CPU: Accelerate / vDSP Additive Synthesis │ │
│ │ │ │
│ │ For each voice × harmonic k (up to 64, Nyquist-limited): │ │
│ │ vDSP_vsmul(phase_ramp, k) → harm_phases[512] │ │
│ │ vvsinf(harm_phases) → sin_buf[512] │ │
│ │ vDSP_vmul(sin_buf, amps) → weighted partials │ │
│ │ vDSP_vadd(voice_buf, ...) → accumulate │ │
│ │ │ │
│ │ 5.6x faster than scalar sin() loops │ │
│ └──────────────────────────────┬───────────────────────────┘ │
│ │ │
│ ▼ │
│ CoreAudio output │
│ 44.1kHz, 512-sample buffer │
└─────────────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────────────┐
│ CoreML "standard" path (what everyone else does) │
│ │
│ MLModel.predict() → XPC → modelcachingd → ANE scheduler → hardware │
│ │
│ ↳ ~720µs (4.6x slower startup, XPC round-trip on every eval) │
└──────────────────────────────────────────────────────────────────────┘
Requirements: Apple Silicon Mac, macOS 15+, Rust toolchain (rustup)
```sh
git clone https://github.com/your-org/ane-synth
cd ane-synth

# Run the benchmark (CoreML vs Direct ANE vs CPU)
cargo run --release --bin ane-bench

# Play the synthesizer (computer keyboard as piano, optional MIDI)
cargo run --release --bin ane-synth
```

The synth binary opens a TUI. Your keyboard maps to two chromatic octaves:
Upper row: Q 2 W 3 E R 5 T 6 Y 7 U → C4 to B4
Lower row: Z S X D C V G B H N J M → C3 to B3
Connect a MIDI keyboard and it will be detected automatically. Press ESC to quit.
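The key-to-note mapping reduces to a lookup from row position to semitone offset. A minimal sketch (a hypothetical helper, not the project's actual TUI code), assuming C4 = MIDI note 60:

```rust
// Map the upper keyboard row (Q 2 W 3 E R 5 T 6 Y 7 U) to MIDI notes C4..B4.
const UPPER_ROW: [char; 12] = ['Q', '2', 'W', '3', 'E', 'R', '5', 'T', '6', 'Y', '7', 'U'];

fn key_to_midi(key: char) -> Option<u8> {
    UPPER_ROW
        .iter()
        .position(|&k| k == key.to_ascii_uppercase())
        .map(|i| 60 + i as u8) // C4 is MIDI note 60; each key is one semitone up
}

fn main() {
    assert_eq!(key_to_midi('q'), Some(60)); // C4
    assert_eq!(key_to_midi('U'), Some(71)); // B4
    assert_eq!(key_to_midi('x'), None);     // not on the upper row
}
```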
```sh
# Generate the equivalent CoreML model for the benchmark
python3 bench/gen_coreml_model.py  # requires coremltools

# Re-run benchmark — now shows all three paths
cargo run --release --bin ane-bench
```

CoreML's MLModel API requires a .mlpackage on disk, compiled by coremltools in Python, then loaded over XPC. We skip all of that.
synth-model generates Machine Learning Intermediate Language (MIL) programs directly as UTF-8 strings in Rust at startup. No Python. No disk I/O. No toolchain dependency.
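Because the network shape is fixed at startup, codegen is plain string assembly. An illustrative sketch of the idea (simplified pseudo-MIL output, not the real MIL grammar and not the project's actual `generate_mil`):

```rust
// Emit a toy, MIL-flavored program description for a fixed conv stack.
// The layer widths mirror the HarmonicPredictor (4→64→128→64→66).
fn generate_program(batch: usize, t_frames: usize) -> String {
    let channels = [4usize, 64, 128, 64, 66];
    let n_layers = channels.len() - 1;
    let mut src = format!("// input: [{batch}, {}, 1, {t_frames}] fp16\n", channels[0]);
    for (i, w) in channels.windows(2).enumerate() {
        src.push_str(&format!("conv(in={}, out={})", w[0], w[1]));
        if i + 1 < n_layers {
            src.push_str(" -> relu"); // no activation after the final conv
        }
        src.push('\n');
    }
    src
}

fn main() {
    println!("{}", generate_program(8, 8));
}
```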
```rust
// Generate a batched FP16 HarmonicPredictor for 8 voices
let mil: String = synth_model::generate_mil(8, T_FRAMES);

// Compile it directly on the ANE via private APIs
let kernel = ane_bridge::AneKernel::compile(mil.as_bytes(), Some(&weights), ...);
```

We dlopen AppleNeuralEngine.framework at runtime and call its private classes directly — the same ones CoreML uses internally, minus the XPC broker:
```objc
// bridge.m (Rust-callable via cc crate)
dlopen("/System/Library/PrivateFrameworks/AppleNeuralEngine.framework/...", RTLD_NOW);

id desc = [_ANEInMemoryModelDescriptor modelWithMILText:milData weights:weights optionsPlist:nil];
id model = [_ANEInMemoryModel inMemoryModelWithDescriptor:desc];
id request = [_ANERequest requestWithModel:model inputs:surfaces outputs:surfaces options:nil];
[aneEngine evaluateWithQoS:request completionHandler:^(NSError *e){ ... }];
```

I/O goes through IOSurfaces — shared memory regions that the ANE DMA-transfers directly, with no kernel copies.
The MIL program, weights, IOSurface buffers, and all intermediate activations are FP16. No FP32 casts anywhere in the neural path. The half crate handles the Rust side; the ANE handles the rest natively.
```rust
kernel.write_input_f16(0, &input_f16);   // [8, 4, 1, 8] — 8 voices batched
kernel.eval();                           // blocks until ANE finishes (~157µs)
kernel.read_output_f16(0, &mut output);  // [8, 66, 1, 8] — 64 harmonics per voice
```

All active voices are packed into a single [B=8, C=4, 1, T=8] FP16 tensor and evaluated in one ANE call. The ANE's matrix engine handles all voices in parallel. Dispatching 8 separate single-voice kernels would be ~8x slower due to dispatch overhead.
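The packing itself is plain NCHW index math. A sketch of how the per-voice conditioning might be laid out before the f32→f16 conversion (a hypothetical helper; the real engine does the conversion with the half crate):

```rust
// Pack 8 voices × 4 conditioning channels × T=8 frames into a flat
// [B=8, C=4, H=1, T=8] buffer in NCHW order, ready for f16 conversion.
const B: usize = 8;
const C: usize = 4;
const T: usize = 8;

fn pack_voices(params: &[[f32; C]; B]) -> Vec<f32> {
    let mut buf = vec![0.0f32; B * C * T];
    for (b, voice) in params.iter().enumerate() {
        for (c, &value) in voice.iter().enumerate() {
            for t in 0..T {
                // Hold each channel value constant across the T temporal frames.
                buf[(b * C + c) * T + t] = value;
            }
        }
    }
    buf
}

fn main() {
    let mut params = [[0.0f32; C]; B];
    params[2][1] = 0.5; // e.g. voice 2, loudness channel
    let buf = pack_voices(&params);
    assert_eq!(buf.len(), 256); // 8 × 4 × 1 × 8
    assert_eq!(buf[(2 * C + 1) * T + 3], 0.5);
}
```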
The ANE outputs 64 harmonic amplitudes per voice per temporal frame. The CPU renders audio using Apple's Accelerate framework:
- `vDSP_vsmul` — scale the phase ramp by each harmonic number
- `vvsinf` — vectorized sin over the entire 512-sample buffer at once
- `vDSP_vmul` + `vDSP_vadd` — weight and accumulate partials
Result: ~50 vvsinf(512) calls replace 512 × 50 scalar sin() calls. 5.6x faster. NEON-backed.
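As a reference for what those vDSP calls compute, here is the equivalent scalar inner loop in plain Rust (a sketch of the math only; the real render path uses Accelerate, not this loop):

```rust
// Scalar additive synthesis: each output sample is the sum of weighted
// sine partials. This is the computation vvsinf/vDSP_vmul/vDSP_vadd vectorize.
fn render_voice(f0: f32, amps: &[f32], sample_rate: f32, out: &mut [f32]) {
    for (n, sample) in out.iter_mut().enumerate() {
        let t = n as f32 / sample_rate;
        let mut acc = 0.0f32;
        for (k, &a) in amps.iter().enumerate() {
            let freq = f0 * (k as f32 + 1.0); // harmonic k+1
            if freq < sample_rate / 2.0 {     // Nyquist-limit the partials
                acc += a * (2.0 * std::f32::consts::PI * freq * t).sin();
            }
        }
        *sample = acc;
    }
}

fn main() {
    let mut buf = [0.0f32; 512];
    render_voice(220.0, &[1.0, 0.5, 0.25], 44_100.0, &mut buf);
    assert!(buf[0].abs() < 1e-6); // sine partials all start at zero phase
    assert!(buf.iter().all(|s| s.is_finite()));
}
```

The vectorized version hoists the inner `sin` into one `vvsinf` call per harmonic over the whole buffer, which is where the 5.6x comes from.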
| Tool | Platform | Neural Inference | Real-Time? | ANE? | Python Free? |
|---|---|---|---|---|---|
| ane-synth | macOS | Direct ANE dispatch | Yes | Yes (direct) | Yes |
| Neutone | macOS/Win | PyTorch + CoreML | Yes | Via CoreML | No |
| RAVE | macOS/Win/Lin | PyTorch | Sometimes | No | No |
| nn~ | macOS/Win/Lin | LibTorch | Yes | No | No |
| NAM | macOS/Win/Lin | RTNeural/LibTorch | Yes | No | No |
| DDSP-VST | macOS/Win | TensorFlow | Partial | No | No |
Every tool in the table uses the CPU for neural inference. All of them compete with the audio thread for CPU time. None of them use the ANE's dedicated compute fabric.
The fundamental issue: all existing approaches treat the ANE as an optimization for the CoreML stack. We treat it as a first-class accelerator with a direct I/O path.
ane-bridge is a standalone crate — a safe Rust wrapper over ANE private APIs. Anything that can be expressed as a MIL program can run on the ANE directly. Some starting points:
- VST/AU plugin — wrap `synth-engine` with a plugin SDK (vst3-sys, nih-plug). The ANE runs independently of the audio thread, so there are no thread-safety issues.
- iOS instruments — the same private APIs exist on iOS. Core Audio + ANE + Swift UI = a native instrument with the battery life of a 2010 calculator.
- Game audio — procedural instrument synthesis for dynamic music, zero CPU budget. The ANE is idle during gameplay.
- Effect processing — noise reduction, reverb diffusion networks, de-essing models. Any small convolutional network runs fast enough to process a buffer in under 200µs.
- SDK layer — build a CoreML-free model runtime that accepts `.mlpackage` weights and dispatches them directly. The MIL text is already in the package.
- Research testbed — direct ANE access means you control scheduling, batch size, and data formats. Measure real ANE throughput without CoreML overhead contaminating your numbers.
ane-synth/
├── crates/
│ ├── ane-bridge/ Safe Rust wrapper over ANE private APIs
│ │ ├── src/lib.rs AneKernel: compile/write/eval/read
│ │ └── objc/bridge.m Obj-C bridge: dlopen + _ANEInMemoryModel
│ │
│ ├── synth-model/ DDSP HarmonicPredictor as MIL codegen
│ │ └── src/lib.rs generate_mil(), build_weight_blob(), compile_*()
│ │
│ ├── synth-engine/ Audio engine: MIDI → ANE → vDSP → CoreAudio
│ │ └── src/lib.rs SynthEngine, SynthState, Voice, batched render
│ │
│ └── ane-synth-app/ Binaries
│ └── src/
│ ├── main.rs TUI synthesizer (ratatui + computer keyboard/MIDI)
│ └── bench.rs Benchmark: CoreML vs Direct ANE vs CPU
│
├── bench/
│ ├── gen_coreml_model.py Generate equivalent CoreML model for comparison
│ ├── bench_coreml.m Obj-C CoreML benchmark harness
│ └── coreml_vs_ane.m Side-by-side comparison prototype
│
├── docs/ Landing page
└── scripts/
└── build-release.sh Universal binary + notarization
- Apple Silicon Mac — M1 or later (ANE not available on Intel)
- macOS 15+ (Sequoia) — MIL `program(1.3)` and `ios18` function signatures
- Rust — stable toolchain via `rustup`
No Python. No PyTorch. No CoreML toolchain. No Xcode (the cc crate invokes clang from the Command Line Tools). A single `cargo build --release`.
This project builds directly on the groundbreaking ANE reverse-engineering work by the open-source community:
- `maderix/ANE` (775+ stars) — the foundational reverse-engineering effort. maderix's comprehensive blog series on M4 ANE internals and the `ane_runtime.h` header exposed the `_ANEInMemoryModel` and `_ANEClient` classes, the MIL format, the IOSurface I/O patterns, and the weight blob format that our `ane-bridge` is built directly upon. Without this work, direct ANE dispatch would remain a black box.
- `eiln/ane` — deep analysis of the ANE microarchitecture, tile formats, and scheduling behavior on M-series hardware, crucial for understanding performance characteristics.
- `atomicbird/CoreMLSpy` — tracing CoreML's internal XPC protocol and ANE dispatch path, providing the blueprint for our in-process dispatch strategy.
- Apple DTS / CoreML team — for shipping MIL (Machine Learning Intermediate Language) as a documented (if private) IR.
- The ANE reverse-engineering community — a growing ecosystem of researchers and developers working to unlock direct access to Apple's neural accelerators.
The ane-bridge crate is a safe, idiomatic Rust wrapper over these discoveries, enabling direct ANE dispatch without CoreML's XPC overhead. This project uses only public Rust crates and documented OS frameworks (Foundation, IOSurface, CoreAudio). The ANE private API surface (AppleNeuralEngine.framework) is accessed via dlopen at runtime — the same mechanism CoreML itself uses.
MIT License
Copyright (c) 2026
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.