ANE by maderix

Direct neural network training on Apple Neural Engine

Created 1 month ago
6,628 stars

Top 7.6% on SourcePulse

Project Summary

This research project demonstrates training neural networks directly on Apple's Neural Engine (ANE) by reverse-engineering private APIs. It targets researchers and engineers exploring on-device AI compute beyond standard inference, offering a proof of concept and performance benchmarks for direct ANE access.

How It Works

The core approach uses the reverse-engineered _ANEClient and _ANECompiler private APIs to compile custom Model Intermediate Language (MIL) compute graphs directly for the ANE. Objective-C code generates the MIL at runtime and compiles it in-memory. Data and weights are exchanged via IOSurface buffers in a channel-first layout. Forward and backward passes run as ANE kernels, while weight gradients (dW) are computed on the CPU via cblas. This bypasses Apple's inference-only restriction on the ANE.

Quick Start & Requirements

Build with xcrun clang -O2 -framework Foundation -framework IOSurface -framework CoreML -framework Accelerate -ldl -lobjc -o train_large training/train_large.m and run ./train_large. Requires macOS 15+ on Apple Silicon (tested on an M4). Uses only system frameworks, with the private ANE APIs resolved at runtime.

Highlighted Details

  • Achieves 9.3 ms/step on an M4 for a single transformer layer (dim=768, seq=512), at 11.2% ANE utilization (1.78 TFLOPS sustained).
  • Runs 6 ANE kernels per step, with the CPU handling the weight-gradient (dW) computation via cblas.
Health Check

  • Last Commit: 1 month ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 1
  • Issues (30d): 0
  • Star History: 145 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), Jiaming Song (Chief Scientist at Luma AI), and 23 more.

  • Megatron-LM by NVIDIA — framework for training transformer models at scale. 16k stars; created 7 years ago, updated 6 hours ago.