ANE by maderix

Direct neural network training on Apple Neural Engine

Created 1 month ago
6,628 stars

Top 7.6% on SourcePulse

Project Summary

This research project demonstrates training neural networks directly on Apple's Neural Engine (ANE) by reverse-engineering private APIs. It targets researchers and engineers exploring on-device AI compute beyond standard inference, offering a proof of concept and performance benchmarks for direct ANE access.

How It Works

The core approach uses the reverse-engineered _ANEClient and _ANECompiler private APIs to compile custom Model Intermediate Language (MIL) compute graphs directly for the ANE. Objective-C code generates the MIL at runtime and compiles it in-memory. Data and weights are exchanged via IOSurface buffers in a channel-first layout. Forward and backward passes run as ANE kernels, while weight gradients (dW) are computed on the CPU via cblas. This bypasses Apple's inference-only restriction on the ANE.

Quick Start & Requirements

Build with xcrun clang -O2 -framework Foundation -framework IOSurface -framework CoreML -framework Accelerate -ldl -lobjc -o train_large training/train_large.m and run ./train_large. Requires macOS 15+ on Apple Silicon (tested on an M4). Uses only system frameworks, with the private ANE APIs resolved at runtime.

Highlighted Details

  • Achieves 9.3 ms/step on an M4 for a single transformer layer (dim=768, seq=512), at 11.2% ANE utilization (1.78 TFLOPS sustained).
  • Runs 6 ANE kernels per step, with the CPU handling the weight-gradient (dW) computation via cblas.
Health Check

  • Last Commit: 1 month ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 1
  • Issues (30d): 0
  • Star History: 145 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), Jiaming Song (Chief Scientist at Luma AI), and 23 more.

  • Megatron-LM by NVIDIA — framework for training transformer models at scale. 16k stars; created 7 years ago, updated 6 hours ago.