pmetal by Epistates

High-performance LLM framework for Apple Silicon

Created 3 months ago
255 stars

Top 98.8% on SourcePulse

Project Summary

PMetal is a comprehensive machine learning SDK, framework, and application suite engineered specifically for Apple Silicon. It lets developers and researchers perform LLM fine-tuning, inference, and model operations entirely within the Apple ecosystem, offering significant performance gains through native Metal GPU kernels and Apple Neural Engine (ANE) integration.

How It Works

Written in Rust, PMetal leverages low-level Metal GPU kernels and direct ANE integration to achieve high performance and efficiency on Apple Silicon. It provides a unified platform from custom kernel development to high-level training APIs, accessible via Rust and Python SDKs, a terminal TUI, and a full desktop GUI. This approach allows for optimized ML workflows without requiring external cloud infrastructure or non-Apple hardware.

Quick Start & Requirements

Prebuilt, signed binaries are available on the project's Releases page. Alternatively, build from source with cargo build --release. The optional desktop GUI additionally requires bun install and bun tauri build inside crates/pmetal-gui. Apple Silicon hardware (M1-M5 chip families) is required.
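The source-build steps above can be sketched as the following command sequence (the build commands come from the README summary; the checkout step and directory layout are assumptions):

```shell
# Build the core PMetal toolchain from source.
# Requires a Rust toolchain and an Apple Silicon Mac (M1-M5).
cargo build --release

# Optional: build the Tauri-based desktop GUI.
# Requires Bun; the crate path is taken from the summary above.
cd crates/pmetal-gui
bun install
bun tauri build
```

If you only need the CLI/TUI, the single cargo build step is sufficient; the Bun/Tauri steps are needed solely for the desktop GUI.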

Highlighted Details

  • Metal GPU Optimizations: Features custom Metal shaders for significant speedups in operations like FlashAttention, RoPE, SwiGLU, and LoRA, with tier-based kernel tuning.
  • ANE Integration: Offers native ANE pipeline support for power-efficient training and inference, including dynamic weight pipelines and hybrid CPU-ANE execution.
  • Comprehensive Model Operations: Supports extensive model merging (16 strategies), GGUF quantization (13 formats), and LoRA adapter fusing.
  • Advanced Training Infrastructure: Includes sequence packing for throughput gains, gradient checkpointing, adaptive learning rates, a robust callback system, and native tool/function calling support.
  • Broad Model & Data Support: Handles numerous LLM architectures for inference and LoRA/QLoRA training, with auto-detection for various dataset formats (ShareGPT, Alpaca, JSONL, Parquet, etc.).
  • User Interfaces: Provides both a feature-rich terminal TUI and a Tauri-based desktop GUI for visual model management and training.

Maintenance & Community

No specific details regarding maintainers, sponsorships, or community channels (like Discord/Slack) are present in the provided README.

Licensing & Compatibility

PMetal is dual-licensed under MIT or Apache-2.0; these permissive licenses allow commercial use and integration into closed-source projects.

Limitations & Caveats

LoRA training integration is not yet available for all listed architectures, including Llama 4, Qwen 3 MoE, DeepSeek, Cohere, Granite, NemotronH, Phi 4, StarCoder2, RecurrentGemma, and Jamba. Additionally, several architecture implementations reside in pmetal-models but are not yet integrated into the main DynamicModel dispatcher for CLI or SDK use. Features like distributed training and the vocoder require explicit feature flag enablement.

Health Check
Last Commit

1 week ago

Responsiveness

Inactive

Pull Requests (30d)
0
Issues (30d)
2
Star History
58 stars in the last 30 days

Explore Similar Projects

Starred by Tobi Lutke (Cofounder of Shopify), Chip Huyen (Author of "AI Engineering" and "Designing Machine Learning Systems"), and 6 more.

xTuring by stochasticai

0.0%
3k
SDK for fine-tuning and customizing open-source LLMs
Created 3 years ago
Updated 1 month ago