thebasedcapital/ane-synth

ane-synth

The first real-time neural synthesizer running directly on Apple Neural Engine — bypassing CoreML entirely.

157 microseconds per audio buffer. 79x real-time headroom. Zero CPU cores consumed during inference. 8-voice polyphony from a single batched ANE dispatch.


Why This Matters

Built on the ANE reverse-engineering work by maderix/ANE.

Your Apple Silicon Mac contains a dedicated 19 TOPS neural engine — a separate chip, separate memory bus, separate power domain. It runs while your CPU sleeps. And almost nobody is using it for audio.

Every mainstream neural audio tool — Neutone, RAVE, nn~, NAM, DDSP-VST — routes inference through Python, PyTorch, or CoreML's XPC broker. Each layer adds overhead. None of them touch the ANE directly.

ane-synth does.

MIDI → [ANE: 157µs, 0 CPU, ~2.8W] → [Accelerate/vDSP: vectorized additive synth] → CoreAudio

The ANE finishes processing all 8 voices before your audio thread has spent a single microsecond on synthesis. The CPU never touches neural inference. It just does math with NEON vectors.


Benchmark Results

Measured on Apple M-series (44.1kHz, 512-sample buffer, 11.6ms deadline):

Inference Latency — 8 Voices, One Buffer

| Path | Latency | Budget Used | CPU Impact |
|------|---------|-------------|------------|
| Direct ANE (ours) | 157 µs | 1.36% | 0 cores |
| CoreML MLModel.predict() | ~720 µs | ~6.2% | 0 cores |
| CPU scalar conv (FP32) | ~3,100 µs | ~26.7% | steals cores |

Startup Time

| Path | Time | Notes |
|------|------|-------|
| Direct ANE (ours) | ~160 ms | MIL compiled in-process, no XPC |
| CoreML standard | ~740 ms | .mlpackage compile + MLModel load over XPC |
| CPU | 0 ms | No compilation step |

Additive Synthesis — 64 Harmonics × 512 Samples

| Path | Latency | Speedup |
|------|---------|---------|
| Scalar sin() loop | ~2,200 µs | 1x |
| vDSP/Accelerate (ours) | ~390 µs | 5.6x |

Full Pipeline — Inference + Synthesis

| Stack | Total | Budget Used | Real-Time Headroom |
|-------|-------|-------------|--------------------|
| ANE + vDSP (ours) | ~547 µs | ~4.7% | ~21x |
| CoreML + scalar | ~2,920 µs | ~25.2% | ~4x |
| CPU conv + scalar | ~5,300 µs | ~45.7% | ~2x |

Numbers from ane-bench on M-series hardware. Your numbers will vary by chip generation.
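The budget and headroom columns are simple arithmetic on the 512-sample deadline. A quick sketch reproducing the full-pipeline row from the numbers above:

```rust
// Buffer deadline in microseconds for a given buffer size and sample rate.
fn deadline_us(buffer: f64, sample_rate: f64) -> f64 {
    buffer / sample_rate * 1e6
}

fn main() {
    let deadline = deadline_us(512.0, 44_100.0); // ≈ 11,610 µs
    // Full-pipeline latency of the ANE + vDSP path, from the table above.
    let pipeline_us = 547.0;
    println!("budget used: {:.1}%", pipeline_us / deadline * 100.0); // ≈ 4.7%
    println!("headroom:    {:.0}x", deadline / pipeline_us);         // ≈ 21x
}
```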


Architecture

┌─────────────────────────────────────────────────────────────────────┐
│                         ane-synth pipeline                          │
│                                                                     │
│  MIDI / Keyboard                                                    │
│       │                                                             │
│       ▼                                                             │
│  ┌─────────────┐    Rust MIL codegen                               │
│  │  Voice Pool │    (no Python, no CoreML toolchain)               │
│  │  8 voices   │                                                    │
│  │  note, f0,  │──────────────────────────────────────────────┐   │
│  │  velocity   │                                               │   │
│  └─────────────┘                                               │   │
│                                                                │   │
│  ┌─────────────────────────────────────────────────────────┐  │   │
│  │              APPLE NEURAL ENGINE (separate chip)         │  │   │
│  │                                                          │  │   │
│  │  Input:  [8, 4, 1, 8] FP16  (8 voices × f0/loud/vel/z) │◄─┘   │
│  │                                                          │       │
│  │  HarmonicPredictor (4→64→128→64→66, pure FP16)          │       │
│  │  Conv → ReLU → Conv → ReLU → Conv → ReLU → Conv         │       │
│  │                                                          │       │
│  │  Output: [8, 66, 1, 8] FP16 (64 harmonics + amp/noise) │       │
│  │                                                          │       │
│  │  ONE dispatch. ALL voices. 157µs. ~2.8W. Zero CPU.      │       │
│  └──────────────────────────────┬───────────────────────────┘       │
│                                 │                                   │
│  ┌──────────────────────────────▼───────────────────────────┐       │
│  │          CPU: Accelerate / vDSP Additive Synthesis        │       │
│  │                                                           │       │
│  │  For each voice × harmonic k (up to 64, Nyquist-limited): │      │
│  │    vDSP_vsmul(phase_ramp, k)  → harm_phases[512]         │       │
│  │    vvsinf(harm_phases)        → sin_buf[512]              │       │
│  │    vDSP_vmul(sin_buf, amps)   → weighted partials         │       │
│  │    vDSP_vadd(voice_buf, ...)  → accumulate                │       │
│  │                                                           │       │
│  │  5.6x faster than scalar sin() loops                     │       │
│  └──────────────────────────────┬───────────────────────────┘       │
│                                 │                                   │
│                                 ▼                                   │
│                          CoreAudio output                           │
│                       44.1kHz, 512-sample buffer                    │
└─────────────────────────────────────────────────────────────────────┘

┌──────────────────────────────────────────────────────────────────────┐
│  CoreML "standard" path (what everyone else does)                    │
│                                                                      │
│  MLModel.predict() → XPC → modelcachingd → ANE scheduler → hardware │
│                                                                      │
│  ↳ ~720µs  (4.6x slower startup, XPC round-trip on every eval)     │
└──────────────────────────────────────────────────────────────────────┘

Quick Start

Requirements: Apple Silicon Mac, macOS 15+, Rust toolchain (rustup)

```sh
git clone https://github.com/thebasedcapital/ane-synth
cd ane-synth

# Run the benchmark (CoreML vs Direct ANE vs CPU)
cargo run --release --bin ane-bench

# Play the synthesizer (computer keyboard as piano, optional MIDI)
cargo run --release --bin ane-synth
```

The synth binary opens a TUI. Your keyboard maps to two chromatic octaves:

```
Upper row:  Q 2 W 3 E R 5 T 6 Y 7 U  →  C4 to B4
Lower row:  Z S X D C V G B H N J M  →  C3 to B3
```

Connect a MIDI keyboard and it will be detected automatically. Press ESC to quit.
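As a sketch, that mapping can be expressed as a chromatic lookup. The `key_to_midi` helper below is hypothetical (assuming C4 = MIDI note 60); the actual TUI key handling may differ:

```rust
// Hypothetical sketch of the QWERTY-to-MIDI mapping described above,
// assuming C4 = MIDI note 60. Not the project's actual key handler.
fn key_to_midi(key: char) -> Option<u8> {
    const UPPER: &str = "Q2W3ER5T6Y7U"; // C4..B4, chromatic
    const LOWER: &str = "ZSXDCVGBHNJM"; // C3..B3, chromatic
    let key = key.to_ascii_uppercase();
    if let Some(i) = UPPER.find(key) {
        Some(60 + i as u8) // C4 = 60
    } else if let Some(i) = LOWER.find(key) {
        Some(48 + i as u8) // C3 = 48
    } else {
        None // key not mapped to a note
    }
}

fn main() {
    assert_eq!(key_to_midi('q'), Some(60)); // C4
    assert_eq!(key_to_midi('m'), Some(59)); // B3
    assert_eq!(key_to_midi('p'), None);     // unmapped
}
```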

Optional: CoreML comparison

```sh
# Generate the equivalent CoreML model for the benchmark
python3 bench/gen_coreml_model.py   # requires coremltools

# Re-run benchmark — now shows all three paths
cargo run --release --bin ane-bench
```

How It Works

1. MIL Programs Generated in Rust

CoreML's MLModel API requires a .mlpackage on disk, compiled by coremltools in Python, then loaded over XPC. We skip all of that.

synth-model generates Machine Learning Intermediate Language (MIL) programs directly as UTF-8 strings in Rust at startup. No Python. No disk I/O. No toolchain dependency.

```rust
// Generate a batched FP16 HarmonicPredictor for 8 voices
let mil: String = synth_model::generate_mil(8, T_FRAMES);
// Compile it directly on the ANE via private APIs
let kernel = ane_bridge::AneKernel::compile(mil.as_bytes(), Some(&weights), ...);
```

2. Direct ANE Dispatch via Private Objective-C APIs

We dlopen AppleNeuralEngine.framework at runtime and call its private classes directly — the same ones CoreML uses internally, minus the XPC broker:

```objc
// bridge.m (Rust-callable via cc crate)
dlopen("/System/Library/PrivateFrameworks/AppleNeuralEngine.framework/...", RTLD_NOW);

id desc = [_ANEInMemoryModelDescriptor modelWithMILText:milData weights:weights optionsPlist:nil];
id model = [_ANEInMemoryModel inMemoryModelWithDescriptor:desc];
id request = [_ANERequest requestWithModel:model inputs:surfaces outputs:surfaces options:nil];
[aneEngine evaluateWithQoS:request completionHandler:^(NSError *e){ ... }];
```

I/O is via IOSurfaces — shared memory regions that the ANE DMA-transfers directly, with no kernel copies.

3. Pure FP16 Throughout

The MIL program, weights, IOSurface buffers, and all intermediate activations are FP16. No FP32 casts anywhere in the neural path. The half crate handles the Rust side; the ANE handles the rest natively.

```rust
kernel.write_input_f16(0, &input_f16);  // [8, 4, 1, 8] — 8 voices batched
kernel.eval();                           // blocks until ANE finishes (~157µs)
kernel.read_output_f16(0, &mut output); // [8, 66, 1, 8] — 64 harmonics per voice
```
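For illustration, here is a simplified f32 → f16 bit conversion covering only normal values, with truncating rounding. It shows what an FP16 write implies at the bit level; the `half` crate the project actually uses handles rounding modes, subnormals, and NaN correctly:

```rust
// Illustrative f32 -> f16 bit conversion (normals only, truncating).
// A simplification of what the `half` crate does; not production code.
fn f32_to_f16_bits(x: f32) -> u16 {
    let bits = x.to_bits();
    let sign = ((bits >> 16) & 0x8000) as u16;              // sign bit
    let exp = ((bits >> 23) & 0xFF) as i32 - 127 + 15;      // rebias 8-bit exp to 5-bit
    let mant = ((bits >> 13) & 0x3FF) as u16;               // keep top 10 mantissa bits
    if exp <= 0 {
        sign // flush subnormals/underflow to signed zero (simplification)
    } else if exp >= 0x1F {
        sign | 0x7C00 // overflow (and NaN, in this sketch) maps to infinity
    } else {
        sign | ((exp as u16) << 10) | mant
    }
}

fn main() {
    assert_eq!(f32_to_f16_bits(1.0), 0x3C00);
    assert_eq!(f32_to_f16_bits(-2.0), 0xC000);
    assert_eq!(f32_to_f16_bits(0.5), 0x3800);
}
```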

4. Batched 8-Voice Inference — One ANE Dispatch

All active voices are packed into a single [B=8, C=4, 1, T=8] FP16 tensor and evaluated in one ANE call. The ANE's matrix engine handles all voices in parallel. Dispatching 8 separate single-voice kernels would be ~8x slower due to dispatch overhead.
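A minimal sketch of that packing, assuming row-major NCHW-style indexing (the `flat_index` helper is hypothetical; the real SynthEngine layout may differ in detail):

```rust
// Sketch of addressing the flat [B=8, C=4, H=1, T=8] input buffer
// described above, assuming row-major NCHW ordering.
const B: usize = 8; // voices
const C: usize = 4; // features: f0, loudness, velocity, z
const T: usize = 8; // temporal frames

fn flat_index(voice: usize, feature: usize, frame: usize) -> usize {
    (voice * C + feature) * T + frame // H = 1 drops out
}

fn main() {
    let mut input = vec![0.0f32; B * C * T]; // FP16 in the real path
    // Write voice 3's f0 contour (feature 0) across all frames.
    for frame in 0..T {
        input[flat_index(3, 0, frame)] = 440.0;
    }
    assert_eq!(flat_index(1, 0, 0), C * T); // each voice occupies C*T slots
    assert_eq!(input[flat_index(3, 0, 7)], 440.0);
}
```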

5. DDSP-Style Additive Synthesis via Accelerate

The ANE outputs 64 harmonic amplitudes per voice per temporal frame. The CPU renders audio using Apple's Accelerate framework:

  • vDSP_vsmul — scale the phase ramp by each harmonic number
  • vvsinf — vectorized sin over the entire 512-sample buffer at once
  • vDSP_vmul + vDSP_vadd — weight and accumulate partials

Result: ~50 vvsinf(512) calls replace 512 × 50 scalar sin() calls. 5.6x faster. NEON-backed.
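As a portable scalar reference for what the vDSP pipeline computes per voice (not the project's code; the real path is vectorized, FP16-fed, and uses a shared phase ramp):

```rust
use std::f32::consts::TAU;

// Scalar reference of the additive step the vDSP calls vectorize:
// out[n] += amp[k] * sin(2*pi * k * f0 * n / sr), Nyquist-limited.
fn render_voice(f0: f32, amps: &[f32], sr: f32, out: &mut [f32]) {
    let nyquist = sr / 2.0;
    for (k, &amp) in amps.iter().enumerate() {
        let harm = (k + 1) as f32; // harmonic number 1..=64
        if harm * f0 >= nyquist {
            break; // drop partials at or above Nyquist
        }
        for (n, sample) in out.iter_mut().enumerate() {
            *sample += amp * (TAU * harm * f0 * n as f32 / sr).sin();
        }
    }
}

fn main() {
    let mut buf = [0.0f32; 512];
    let amps = [1.0; 64]; // flat spectrum, for illustration only
    render_voice(440.0, &amps, 44_100.0, &mut buf);
    // 440 Hz at 44.1 kHz keeps harmonics 1..=50 below Nyquist (22,050 Hz),
    // matching the "~50 vvsinf calls" figure above.
    assert_eq!(buf[0], 0.0); // every partial starts at sin(0)
}
```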


Comparison with Existing Tools

| Tool | Platform | Neural Inference | Real-Time? | ANE? | Python-Free? |
|------|----------|------------------|------------|------|--------------|
| ane-synth | macOS | Direct ANE dispatch | Yes | Yes (direct) | Yes |
| Neutone | macOS/Win | PyTorch + CoreML | Yes | Via CoreML | No |
| RAVE | macOS/Win/Lin | PyTorch | Sometimes | No | No |
| nn~ | macOS/Win/Lin | LibTorch | Yes | No | No |
| NAM | macOS/Win/Lin | RTNeural/LibTorch | Yes | No | No |
| DDSP-VST | macOS/Win | TensorFlow | Partial | No | No |

Every other tool in the table runs neural inference on the CPU or behind CoreML's XPC broker, competing with the audio thread for CPU time. None of them drive the ANE's dedicated compute fabric directly.

The fundamental issue: all existing approaches treat the ANE as an optimization for the CoreML stack. We treat it as a first-class accelerator with a direct I/O path.


What You Can Build With This

ane-bridge is a standalone crate — a safe Rust wrapper over ANE private APIs. Anything that can be expressed as a MIL program can run on the ANE directly. Some starting points:

  • VST/AU plugin — wrap synth-engine with a plugin SDK (vst3-sys, nih-plug). The ANE runs independently of the audio thread so there are no thread-safety issues.
  • iOS instruments — the same private APIs exist on iOS. Core Audio + ANE + Swift UI = a native instrument with the battery life of a 2010 calculator.
  • Game audio — procedural instrument synthesis for dynamic music, zero CPU budget. The ANE is idle during gameplay.
  • Effect processing — noise reduction, reverb diffusion networks, de-essing models. Any small convolutional network runs fast enough to process a buffer in under 200µs.
  • SDK layer — build a CoreML-free model runtime that accepts .mlpackage weights and dispatches them directly. The MIL text is already in the package.
  • Research testbed — direct ANE access means you control scheduling, batch size, and data formats. Measure real ANE throughput without CoreML overhead contaminating your numbers.

Project Structure

```
ane-synth/
├── crates/
│   ├── ane-bridge/          Safe Rust wrapper over ANE private APIs
│   │   ├── src/lib.rs       AneKernel: compile/write/eval/read
│   │   └── objc/bridge.m    Obj-C bridge: dlopen + _ANEInMemoryModel
│   │
│   ├── synth-model/         DDSP HarmonicPredictor as MIL codegen
│   │   └── src/lib.rs       generate_mil(), build_weight_blob(), compile_*()
│   │
│   ├── synth-engine/        Audio engine: MIDI → ANE → vDSP → CoreAudio
│   │   └── src/lib.rs       SynthEngine, SynthState, Voice, batched render
│   │
│   └── ane-synth-app/       Binaries
│       └── src/
│           ├── main.rs      TUI synthesizer (ratatui + computer keyboard/MIDI)
│           └── bench.rs     Benchmark: CoreML vs Direct ANE vs CPU
│
├── bench/
│   ├── gen_coreml_model.py  Generate equivalent CoreML model for comparison
│   ├── bench_coreml.m       Obj-C CoreML benchmark harness
│   └── coreml_vs_ane.m      Side-by-side comparison prototype
│
├── docs/                    Landing page
└── scripts/
    └── build-release.sh     Universal binary + notarization
```

Requirements

  • Apple Silicon Mac — M1 or later (ANE not available on Intel)
  • macOS 15+ (Sequoia) — MIL program(1.3) and ios18 function signatures
  • Rust — stable toolchain via rustup

No Python. No PyTorch. No CoreML toolchain. No Xcode (the cc crate invokes clang from Command Line Tools). Single cargo build --release.


Acknowledgments

This project builds directly on the groundbreaking ANE reverse-engineering work by the open-source community:

  • maderix/ANE (775+ stars) — the foundational reverse-engineering effort. maderix's comprehensive blog series on M4 ANE internals and the header ane_runtime.h exposed the _ANEInMemoryModel, _ANEClient, MIL format, IOSurface I/O patterns, and weight blob format that our ane-bridge is built directly upon. Without this work, direct ANE dispatch would remain a black box.

  • eiln/ane — deep analysis of the ANE microarchitecture, tile formats, and scheduling behavior on M-series hardware, crucial for understanding performance characteristics.

  • atomicbird/CoreMLSpy — tracing CoreML's internal XPC protocol and ANE dispatch path, providing the blueprint for our in-process dispatch strategy.

  • Apple DTS / CoreML team — for shipping MIL (Machine Learning Intermediate Language) as a documented (if private) IR.

  • The ANE reverse-engineering community — a growing ecosystem of researchers and developers working to unlock direct access to Apple's neural accelerators.

The ane-bridge crate is a safe, idiomatic Rust wrapper over these discoveries, enabling direct ANE dispatch without CoreML's XPC overhead. This project uses only public Rust crates and documented OS frameworks (Foundation, IOSurface, CoreAudio). The ANE private API surface (AppleNeuralEngine.framework) is accessed via dlopen at runtime — the same mechanism CoreML itself uses.


License

MIT License

Copyright (c) 2026

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
