Defending the Apple Neural Engine (ANE)

In a discussion about Apple Research’s open-source MLX machine learning framework on Hacker News yesterday, a comment proclaims:

ANE is probably the biggest scam “feature” Apple has ever sold.

Similar rhetoric is common there. Loads of people seem to infer that because MLX doesn’t use the ANE, the ANE must be useless.

As some background, Apple added the ANE subsystem in 2017 with the iPhone X. It is circuitry dedicated to running quantized, forward-pass neural networks for inference. Apple’s intention was to enable enhanced OS functionality, and indeed the chip’s first job was powering Face ID. That first implementation offered up some 0.6 TOPS (trillion operations per second) and was built with power efficiency as a driving requirement.

It was the foundation for the platform to start building out NN-based functionality to power features and OS functions. And notably, many other chip makers started adding similar neural engines to their chips for the same purpose: low-power but performant-enough NN operations for system features. You aren’t going to run ChatGPT on it, but it still holds loads of utility for the platform.

The next year they released a variant with 5 TOPS, roughly an 8x speed improvement. Then 6, 11, 16, 17, and then 35 TOPS (though that last jump is likely just switching the measure from FP16 to INT8). In all cases the ANE is limited to specific model types and architectures. It was never intended to power NN training tasks, massive models, and so on.

And the system heavily uses the ANE now. Every bit of text and subject matter is extracted from images, both in your freshly taken photos and even while just browsing the web, courtesy of the ANE (maybe people don’t even realize this: yes, you can search your photo library for a random snippet of text, even heavily distorted text, and you can highlight and copy text off of images on random websites in Safari on Apple Silicon, at virtually zero power cost. ANE). After you’ve triggered Siri with “Hey Siri”, voice processing and TTS are handled by the ANE. Some of the rather useless genAI stuff is powered by the ANE. Computational photography, and even just things like subject detection for choosing what to focus on, is powered by the ANE hardware.

All of this happens with a negligible impact on battery life, and without impacting or impeding the CPU or GPU cores as they perform other tasks.

It’s pretty clear that Apple fully intended the ANE as hardware for the OS to use; third-party apps just weren’t a concern, nor did Apple make the ANE part of its messaging. In 2018 they did enable CoreML to leverage the ANE for some very limited cases, and even then the OS throttles how much of it you can use, to ensure there’s plenty of headroom left when the system demands it.
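From the third-party side, the way you even ask for the ANE is through CoreML’s compute-unit preference. A minimal sketch using the coremltools Python package (the model file and the “text” input key are placeholders, not a real Apple model):

```python
import coremltools as ct

# Load a compiled Core ML model and express a preference for the Neural Engine,
# with CPU fallback. The file name and input key below are placeholders.
model = ct.models.MLModel(
    "TextClassifier.mlpackage",
    compute_units=ct.ComputeUnit.CPU_AND_NE,
)

# Running a prediction requires macOS; the OS still decides, layer by layer,
# whether anything actually lands on the ANE.
prediction = model.predict({"text": "hello world"})
print(prediction)
```

Even with that preference set, the scheduling decision stays with the OS, which is exactly the throttling behavior described above.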

So why doesn’t MLX use the ANE at all? I mean, the authors very specifically stated why. The only public way of using the ANE subsystem is by creating and running models through CoreML, which is entirely orthogonal to the purpose and mandate of MLX. Obviously Apple Research could just reach into the innards and use it if they wanted, but MLX is an open-source project, so relying on private, undocumented interfaces simply isn’t viable.
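You can see where MLX’s work actually goes with a couple of lines. A minimal sketch, assuming the mlx Python package on an Apple Silicon Mac: the default device is the Metal GPU, and there is no ANE device to select.

```python
import mlx.core as mx

# MLX dispatches to the Metal GPU by default; there is no ANE backend to pick.
print(mx.default_device())   # e.g. Device(gpu, 0)

x = mx.ones((4, 4))
y = (x * 2).sum()
mx.eval(y)                   # MLX is lazy; force the computation
print(y.item())              # 32.0
```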

Apple added some tensor cores to the GPU in their most recent chips (the M5 and A19 Pro), calling them “neural accelerators”. These are fantastic for training and for complex models (including BF16), at the cost of orders of magnitude more power. They also give Apple a path to massively scale up their general-purpose AI bona fides, adding more neural accelerators per GPU core and more GPU cores per device (copy/paste scaling), especially on the desktop side, where power isn’t as much of a concern, active cooling is available, and enormous levels of performance become achievable.
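To make the contrast concrete, here is a toy sketch with the mlx Python package: a BF16 matmul running on the GPU, the kind of general-purpose tensor math the GPU path (and those new accelerators) targets, and which the ANE’s fixed-function inference pipeline isn’t built for.

```python
import mlx.core as mx

# A toy BF16 matmul on the default Metal GPU device.
a = mx.random.normal((1024, 1024)).astype(mx.bfloat16)
b = mx.random.normal((1024, 1024)).astype(mx.bfloat16)
c = a @ b
mx.eval(c)                 # force the lazy computation
print(c.dtype, c.shape)    # bfloat16 (1024, 1024)
```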

In no universe is Apple going to move the existing OS NNs to these new tensor cores. The two blocks have very different driving philosophies, and they serve different roles.

Nor is Apple abandoning CoreML. Apple Research put out MLX to, rightfully, try to capture some of the attention of the PyTorch et al. community, and it has been wildly successful. But in no way does it supplant or replace CoreML, though again that incredibly weird claim constantly recurs from the peanut gallery. If you have a consumer app for Apple devices and you run NNs for inference to enable features, the odds are overwhelming that your best bet is CoreML (which will use the GPU, the GPU’s neural accelerators, the ANE, and the CPU as appropriate and available).
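For that app developer the typical path is to train wherever you like and hand the result to CoreML. A hedged sketch with PyTorch and the coremltools converter (TinyNet and the file name are throwaway placeholders, not a real pipeline):

```python
import torch
import coremltools as ct

# A throwaway stand-in for whatever model you actually trained.
class TinyNet(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(8, 4)

    def forward(self, x):
        return torch.relu(self.linear(x))

example = torch.randn(1, 8)
traced = torch.jit.trace(TinyNet().eval(), example)

# Convert to a Core ML program; at run time the OS schedules the model
# across CPU, GPU, and ANE as appropriate for the device.
mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(name="x", shape=example.shape)],
    convert_to="mlprogram",
    compute_units=ct.ComputeUnit.ALL,
)
mlmodel.save("TinyNet.mlpackage")
```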

People like to turn stuff like this into all-or-nothing contests, thinking it’s all winners and losers and everything is binary. It’s reminiscent of Google unveiling Fuchsia, when every tech board like HN was full of prognosticators declaring that the days of Linux, ChromeOS, Android, and so on were over.

Years later, Fuchsia powers a Nest device and is largely a dead project. So…maybe not?