Anemll by Anemll

Framework for porting LLMs to Apple Neural Engine (ANE)

created 6 months ago
1,110 stars

Top 35.1% on sourcepulse

Project Summary

ANEMLL is an open-source library designed to accelerate the porting and on-device inference of Large Language Models (LLMs) on Apple's Neural Engine (ANE). It targets developers building low-power, privacy-focused applications for edge devices, enabling seamless integration of LLMs into iOS and macOS applications.

How It Works

ANEMLL provides a pipeline for converting Hugging Face models to Apple's CoreML format, optimized for ANE execution. It leverages CoreML Tools for conversion and offers Swift and Python implementations for inference. This approach allows for direct on-device processing, enhancing privacy and reducing reliance on cloud infrastructure.
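
The conversion step builds on Apple's CoreML Tools. As a rough sketch of the kind of coremltools call the pipeline relies on (the real convert_model.sh scripts also handle chunking, KV-cache wiring, and LUT quantization; the module and shapes below are illustrative assumptions, not ANEMLL's actual graph):

    # Illustrative only -- shows the style of coremltools conversion ANEMLL wraps.
    import torch
    import coremltools as ct

    class TinyBlock(torch.nn.Module):  # stand-in for a transformer chunk (assumption)
        def __init__(self):
            super().__init__()
            self.proj = torch.nn.Linear(4096, 4096)

        def forward(self, x):
            return self.proj(x)

    example = torch.zeros(1, 1, 4096)
    traced = torch.jit.trace(TinyBlock().eval(), example)

    mlmodel = ct.convert(
        traced,
        inputs=[ct.TensorType(name="hidden_states", shape=example.shape)],
        convert_to="mlprogram",                   # ML Program format used on the ANE
        compute_precision=ct.precision.FLOAT16,   # FP16 suits the Neural Engine
        compute_units=ct.ComputeUnit.CPU_AND_NE,  # prefer the ANE when the model loads
    )
    mlmodel.save("tiny_block.mlpackage")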

Quick Start & Requirements

  • Install: Clone the repository and install dependencies:
    git clone https://github.com/Anemll/Anemll.git
    cd Anemll
    python -m venv anemll-env
    source anemll-env/bin/activate
    pip install -r requirements.txt
    
  • Prerequisites: macOS Sequoia with Apple Neural Engine, minimum 16GB RAM, Python 3.9+, Xcode Command Line Tools (for coremlcompiler).
  • Conversion (a quick load-check sketch follows this list):
    ./anemll/utils/convert_model.sh --model <path_to_model> --output <output_directory>
    
  • Inference (Python):
    python ./tests/chat.py --meta <output_directory>/meta.yaml
    
  • Resources: official models at huggingface.co/anemll; guides: README and the Swift CLI Guide
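
After conversion, a quick sanity check from Python is to read the generated meta.yaml and load the produced CoreML packages with coremltools, asking Core ML to prefer the Neural Engine. This is a minimal sketch; the directory layout and file names are assumptions, so consult meta.yaml for the real ones (compiled .mlmodelc artifacts would need ct.models.CompiledMLModel instead):

    # Hedged smoke test: confirm converted artifacts load with the ANE preferred.
    from pathlib import Path
    import yaml                     # PyYAML, used here to read meta.yaml
    import coremltools as ct

    out_dir = Path("converted-model")   # your --output directory (placeholder name)
    meta = yaml.safe_load((out_dir / "meta.yaml").read_text())
    print("conversion metadata keys:", sorted(meta))

    for pkg in sorted(out_dir.glob("*.mlpackage")):
        model = ct.models.MLModel(str(pkg), compute_units=ct.ComputeUnit.CPU_AND_NE)
        names = [inp.name for inp in model.get_spec().description.input]
        print(pkg.name, "loaded; inputs:", names)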

Highlighted Details

  • Supports the LLaMA 3.1 architecture, including DeepSeek and DeepHermes distilled models (LLaMA 3.1 1B and 8B variants).
  • Provides sample converted models and ready-to-use iOS/macOS applications (SwiftUI chat interface).
  • Includes benchmarking tools (ANEMLL-BENCH) for performance testing and model optimization metrics (a minimal throughput sketch follows this list).
  • Offers both basic (chat.py) and advanced (chat_full.py) Python chat interfaces with conversation history management.
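
ANEMLL-BENCH is the project's dedicated benchmarking tool; the snippet below is only a minimal sketch of the tokens-per-second figure such tools report, using a hypothetical generate() callable in place of the real model wrapper:

    # Minimal throughput sketch, not ANEMLL-BENCH itself.
    # `generate` is a hypothetical callable returning generated token ids.
    import time

    def tokens_per_second(generate, prompt: str, max_new_tokens: int = 128) -> float:
        start = time.perf_counter()
        tokens = generate(prompt, max_new_tokens=max_new_tokens)  # hypothetical API
        elapsed = time.perf_counter() - start
        return len(tokens) / elapsed

    if __name__ == "__main__":
        # Dummy generator so the sketch runs standalone.
        dummy = lambda prompt, max_new_tokens=128: list(range(max_new_tokens))
        print(f"{tokens_per_second(dummy, 'hello'):.1f} tok/s")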

Maintenance & Community

  • Alpha Release 0.3.0.
  • Active development with updates to conversion scripts and sample applications.
  • Community engagement encouraged via GitHub issues and pull requests.
  • X (Twitter): @anemll

Licensing & Compatibility

  • Licensed under the MIT License, permitting commercial use and closed-source linking.

Limitations & Caveats

  • Currently in Alpha (0.3.0), with quantization quality noted as needing improvement, particularly for LUT4 models (see the sketch after this list).
  • Initial release focuses on LLaMA 3.1 architecture; broader model support is planned.
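
"LUT4" refers to 4-bit lookup-table (palettized) weights. As a rough sketch of what that compression looks like with the coremltools optimization API (ANEMLL's own conversion scripts drive this step; the model path below is a placeholder, not an ANEMLL artifact name):

    # Sketch of 4-bit LUT (palettization) compression via coremltools.
    import coremltools as ct
    import coremltools.optimize.coreml as cto

    mlmodel = ct.models.MLModel("model.mlpackage")  # placeholder path
    config = cto.OptimizationConfig(
        global_config=cto.OpPalettizerConfig(mode="kmeans", nbits=4)  # 4-bit lookup table
    )
    compressed = cto.palettize_weights(mlmodel, config)
    compressed.save("model_lut4.mlpackage")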

Health Check

  • Last commit: 2 weeks ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 1
  • Issues (30d): 4

Star History

  • 371 stars in the last 90 days
