pmetal by Epistates

High-performance LLM framework for Apple Silicon

Created 3 months ago
255 stars

Top 98.8% on SourcePulse

Project Summary

PMetal is a comprehensive machine learning SDK, framework, and application suite engineered specifically for Apple Silicon. It lets developers and researchers perform LLM fine-tuning, inference, and model operations entirely within the Apple ecosystem, offering significant performance gains through native Metal GPU kernels and Apple Neural Engine (ANE) integration.

How It Works

Written in Rust, PMetal leverages low-level Metal GPU kernels and direct ANE integration to achieve high performance and efficiency on Apple Silicon. It provides a unified platform from custom kernel development to high-level training APIs, accessible via Rust and Python SDKs, a terminal TUI, and a full desktop GUI. This approach allows for optimized ML workflows without requiring external cloud infrastructure or non-Apple hardware.

Quick Start & Requirements

Prebuilt, signed binaries are available on the project's Releases page. Alternatively, build from source with cargo build --release. The optional desktop GUI additionally requires bun install and bun tauri build inside crates/pmetal-gui. Apple Silicon hardware (M1-M5 chip families) is required.
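The source-build steps above can be sketched as the following command sequence (the build commands come from the README summary; the checkout step and directory layout are assumptions):

```shell
# Build the core PMetal toolchain from source.
# Requires a Rust toolchain and an Apple Silicon Mac (M1-M5).
cargo build --release

# Optional: build the Tauri-based desktop GUI.
# Requires Bun; the crate path is taken from the summary above.
cd crates/pmetal-gui
bun install
bun tauri build
```

If you only need the CLI/TUI, the single cargo build step is sufficient; the Bun/Tauri steps are needed solely for the desktop GUI.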

Highlighted Details

  • Metal GPU Optimizations: Features custom Metal shaders for significant speedups in operations like FlashAttention, RoPE, SwiGLU, and LoRA, with tier-based kernel tuning.
  • ANE Integration: Offers native ANE pipeline support for power-efficient training and inference, including dynamic weight pipelines and hybrid CPU-ANE execution.
  • Comprehensive Model Operations: Supports extensive model merging (16 strategies), GGUF quantization (13 formats), and LoRA adapter fusing.
  • Advanced Training Infrastructure: Includes sequence packing for throughput gains, gradient checkpointing, adaptive learning rates, a robust callback system, and native tool/function calling support.
  • Broad Model & Data Support: Handles numerous LLM architectures for inference and LoRA/QLoRA training, with auto-detection for various dataset formats (ShareGPT, Alpaca, JSONL, Parquet, etc.).
  • User Interfaces: Provides both a feature-rich terminal TUI and a Tauri-based desktop GUI for visual model management and training.

Maintenance & Community

No specific details regarding maintainers, sponsorships, or community channels (like Discord/Slack) are present in the provided README.

Licensing & Compatibility

PMetal is dual-licensed under MIT or Apache-2.0; these permissive licenses allow commercial use and integration into closed-source projects.

Limitations & Caveats

LoRA training integration is not yet available for all listed architectures, including Llama 4, Qwen 3 MoE, DeepSeek, Cohere, Granite, NemotronH, Phi 4, StarCoder2, RecurrentGemma, and Jamba. Additionally, several architecture implementations reside in pmetal-models but are not yet integrated into the main DynamicModel dispatcher for CLI or SDK use. Features like distributed training and the vocoder require explicit feature flag enablement.

Health Check
Last Commit

1 week ago

Responsiveness

Inactive

Pull Requests (30d)
0
Issues (30d)
2
Star History
58 stars in the last 30 days

Explore Similar Projects

Starred by Tobi Lutke (Cofounder of Shopify), Chip Huyen (Author of "AI Engineering" and "Designing Machine Learning Systems"), and 6 more.

xTuring by stochasticai

0.0%
3k
SDK for fine-tuning and customizing open-source LLMs
Created 3 years ago
Updated 1 month ago