Anemll by Anemll

Framework for porting LLMs to Apple Neural Engine (ANE)

created 6 months ago
1,110 stars

Top 35.1% on sourcepulse

Project Summary

ANEMLL is an open-source library designed to accelerate the porting and on-device inference of Large Language Models (LLMs) on Apple's Neural Engine (ANE). It targets developers building low-power, privacy-focused applications for edge devices, enabling seamless integration of LLMs into iOS and macOS applications.

How It Works

ANEMLL provides a pipeline for converting Hugging Face models to Apple's CoreML format, optimized for ANE execution. It leverages CoreML Tools for conversion and offers Swift and Python implementations for inference. This approach allows for direct on-device processing, enhancing privacy and reducing reliance on cloud infrastructure.
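
The conversion step builds on Apple's CoreML Tools. As a rough sketch of the kind of coremltools call the pipeline relies on (the real convert_model.sh scripts also handle chunking, KV-cache wiring, and LUT quantization; the module and shapes below are illustrative assumptions, not ANEMLL's actual graph):

    # Illustrative only -- shows the style of coremltools conversion ANEMLL wraps.
    import torch
    import coremltools as ct

    class TinyBlock(torch.nn.Module):  # stand-in for a transformer chunk (assumption)
        def __init__(self):
            super().__init__()
            self.proj = torch.nn.Linear(4096, 4096)

        def forward(self, x):
            return self.proj(x)

    example = torch.zeros(1, 1, 4096)
    traced = torch.jit.trace(TinyBlock().eval(), example)

    mlmodel = ct.convert(
        traced,
        inputs=[ct.TensorType(name="hidden_states", shape=example.shape)],
        convert_to="mlprogram",                   # ML Program format used on the ANE
        compute_precision=ct.precision.FLOAT16,   # FP16 suits the Neural Engine
        compute_units=ct.ComputeUnit.CPU_AND_NE,  # prefer the ANE when the model loads
    )
    mlmodel.save("tiny_block.mlpackage")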

Quick Start & Requirements

  • Install: Clone the repository and install dependencies:
    git clone https://github.com/Anemll/Anemll.git
    cd Anemll
    python -m venv anemll-env
    source anemll-env/bin/activate
    pip install -r requirements.txt
    
  • Prerequisites: macOS Sequoia with Apple Neural Engine, minimum 16GB RAM, Python 3.9+, Xcode Command Line Tools (for coremlcompiler).
  • Conversion (a quick load-check sketch follows this list):
    ./anemll/utils/convert_model.sh --model <path_to_model> --output <output_directory>
    
  • Inference (Python):
    python ./tests/chat.py --meta <output_directory>/meta.yaml
    
  • Resources: official models at huggingface.co/anemll; guides: README and the Swift CLI Guide
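
After conversion, a quick sanity check from Python is to read the generated meta.yaml and load the produced CoreML packages with coremltools, asking Core ML to prefer the Neural Engine. This is a minimal sketch; the directory layout and file names are assumptions, so consult meta.yaml for the real ones (compiled .mlmodelc artifacts would need ct.models.CompiledMLModel instead):

    # Hedged smoke test: confirm converted artifacts load with the ANE preferred.
    from pathlib import Path
    import yaml                     # PyYAML, used here to read meta.yaml
    import coremltools as ct

    out_dir = Path("converted-model")   # your --output directory (placeholder name)
    meta = yaml.safe_load((out_dir / "meta.yaml").read_text())
    print("conversion metadata keys:", sorted(meta))

    for pkg in sorted(out_dir.glob("*.mlpackage")):
        model = ct.models.MLModel(str(pkg), compute_units=ct.ComputeUnit.CPU_AND_NE)
        names = [inp.name for inp in model.get_spec().description.input]
        print(pkg.name, "loaded; inputs:", names)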

Highlighted Details

  • Supports the LLaMA 3.1 architecture, including DeepSeek and DeepHermes distilled models (LLaMA 3.1 1B and 8B variants).
  • Provides sample converted models and ready-to-use iOS/macOS applications (SwiftUI chat interface).
  • Includes benchmarking tools (ANEMLL-BENCH) for performance testing and model optimization metrics (a minimal throughput sketch follows this list).
  • Offers both basic (chat.py) and advanced (chat_full.py) Python chat interfaces with conversation history management.
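
ANEMLL-BENCH is the project's dedicated benchmarking tool; the snippet below is only a minimal sketch of the tokens-per-second figure such tools report, using a hypothetical generate() callable in place of the real model wrapper:

    # Minimal throughput sketch, not ANEMLL-BENCH itself.
    # `generate` is a hypothetical callable returning generated token ids.
    import time

    def tokens_per_second(generate, prompt: str, max_new_tokens: int = 128) -> float:
        start = time.perf_counter()
        tokens = generate(prompt, max_new_tokens=max_new_tokens)  # hypothetical API
        elapsed = time.perf_counter() - start
        return len(tokens) / elapsed

    if __name__ == "__main__":
        # Dummy generator so the sketch runs standalone.
        dummy = lambda prompt, max_new_tokens=128: list(range(max_new_tokens))
        print(f"{tokens_per_second(dummy, 'hello'):.1f} tok/s")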

Maintenance & Community

  • Alpha Release 0.3.0.
  • Active development with updates to conversion scripts and sample applications.
  • Community engagement encouraged via GitHub issues and pull requests.
  • X (Twitter): @anemll

Licensing & Compatibility

  • Licensed under the MIT License, permitting commercial use and closed-source linking.

Limitations & Caveats

  • Currently in Alpha (0.3.0), with quantization quality noted as needing improvement, particularly for LUT4 models (see the sketch after this list).
  • Initial release focuses on LLaMA 3.1 architecture; broader model support is planned.
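
"LUT4" refers to 4-bit lookup-table (palettized) weights. As a rough sketch of what that compression looks like with the coremltools optimization API (ANEMLL's own conversion scripts drive this step; the model path below is a placeholder, not an ANEMLL artifact name):

    # Sketch of 4-bit LUT (palettization) compression via coremltools.
    import coremltools as ct
    import coremltools.optimize.coreml as cto

    mlmodel = ct.models.MLModel("model.mlpackage")  # placeholder path
    config = cto.OptimizationConfig(
        global_config=cto.OpPalettizerConfig(mode="kmeans", nbits=4)  # 4-bit lookup table
    )
    compressed = cto.palettize_weights(mlmodel, config)
    compressed.save("model_lut4.mlpackage")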

Health Check

  • Last commit: 2 weeks ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 1
  • Issues (30d): 4

Star History

  • 371 stars in the last 90 days
