ANE  by maderix

Direct neural network training on Apple Neural Engine

Created 1 week ago


6,128 stars

Top 8.3% on SourcePulse

View on GitHub
Project Summary

This research project demonstrates training neural networks directly on Apple's Neural Engine (ANE) by reverse-engineering private APIs. It targets researchers and engineers exploring on-device AI compute beyond standard inference, offering a proof of concept and performance benchmarks for direct ANE access.

How It Works

The core approach uses the reverse-engineered _ANEClient and _ANECompiler private APIs to compile custom Model Intermediate Language (MIL) compute graphs directly for the ANE. Objective-C code generates the MIL at runtime and compiles it in memory. Data and weights are passed through IOSurface buffers in a channel-first layout. Forward and backward passes run as ANE kernels, while weight gradients (dW) are computed on the CPU via cblas. This bypasses Apple's inference-only restriction on the ANE.

Quick Start & Requirements

Build with:

xcrun clang -O2 -framework Foundation -framework IOSurface -framework CoreML -framework Accelerate -ldl -lobjc -o train_large training/train_large.m

Then run ./train_large. Requires macOS 15+ on Apple Silicon (tested on M4). Uses only system frameworks; the private ANE APIs are resolved at runtime.

Highlighted Details

  • Achieves 9.3 ms/step on M4 for a single transformer layer (dim=768, seq=512), at 11.2% ANE utilization (1.78 TFLOPS sustained).
  • Employs 6 ANE kernels per step, with the CPU handling weight-gradient (dW) computation via cblas.
Health Check
Last Commit

3 days ago

Responsiveness

Inactive

Pull Requests (30d)
36
Issues (30d)
12
Star History
6,181 stars in the last 13 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), Jiaming Song (Chief Scientist at Luma AI), and 23 more.

Megatron-LM by NVIDIA

Top 0.6% · 16k stars

Framework for training transformer models at scale
Created 7 years ago · Updated 15 hours ago