crabml by crabml

llama.cpp-compatible inference engine in Rust

Created 2 years ago
462 stars

Top 65.6% on SourcePulse

Project Summary

This project provides a fast, cross-platform AI inference engine written in Rust, leveraging WebGPU for broad compatibility across browsers, desktops, and servers. It targets developers and researchers who need efficient, low-memory inference for a range of LLMs, offering SIMD acceleration and multiple quantization methods.

How It Works

crabml is built in Rust, emphasizing performance and embeddability. It uses WebGPU for hardware-accelerated inference, aiming for broad compatibility without requiring vendor-specific GPU stacks such as CUDA or ROCm. The engine reads the GGUF model format and employs memory mapping (mmap) and multiple quantization levels (e.g., Q8_0, Q4_0) to minimize memory footprint and maximize speed on diverse hardware, including Apple Silicon and x86 CPUs with SIMD extensions.
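As an illustration of the mmap-based loading described above, here is a minimal Rust sketch that maps a GGUF file and reads its fixed header (per the public GGUF spec). It assumes the memmap2 crate; crabml's actual loader types and APIs differ.

    use std::fs::File;
    use memmap2::Mmap; // assumed dependency: memmap2 = "0.9"

    fn main() -> std::io::Result<()> {
        // Map the model file instead of copying it into a buffer; pages are
        // faulted in on demand, which keeps resident memory low.
        let file = File::open("model.gguf")?;
        let mmap = unsafe { Mmap::map(&file)? };

        // GGUF header, little-endian: magic "GGUF" (u32), version (u32),
        // tensor_count (u64), metadata_kv_count (u64).
        let magic = u32::from_le_bytes(mmap[0..4].try_into().unwrap());
        assert_eq!(magic, 0x4655_4747, "not a GGUF file");
        let version = u32::from_le_bytes(mmap[4..8].try_into().unwrap());
        let tensor_count = u64::from_le_bytes(mmap[8..16].try_into().unwrap());
        println!("GGUF v{version}, {tensor_count} tensors");
        Ok(())
    }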

Quick Start & Requirements

  • Install/Run: Build with cargo build --release, then run examples with ./target/release/crabml-cli (a hedged build-and-run sequence follows this list).
  • Prerequisites: Rust toolchain. Optional: enable SIMD acceleration (NEON on ARM, AVX2 on x86) via RUSTFLAGS.
  • Resources: Requires GGUF model files.
  • Links: How to Get GGUF Models
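A build-and-run sequence following the steps above. The CLI flags and model path are illustrative assumptions, not the documented interface; consult the project's README for the exact invocation.

    # Build in release mode; optionally enable SIMD via RUSTFLAGS
    # (NEON is on by default on aarch64; AVX2 shown here for x86-64).
    RUSTFLAGS="-C target-feature=+avx2" cargo build --release

    # Run the CLI against a local GGUF model (flags are hypothetical).
    ./target/release/crabml-cli -m ./models/tinyllama-q8_0.gguf "Once upon a time"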

Highlighted Details

  • Claims a llama.cpp-compatible API with comparable performance.
  • Cross-platform inference via WebGPU.
  • SIMD acceleration for ARM (NEON) and x86 (AVX2).
  • Supports GGUF format for Llama, CodeLlama, Gemma, Mistral, and more.
  • Multiple quantization methods (Q8_0, Q4_0, Q4_1 recommended on CPU); the Q4_0 block layout is sketched after this list.
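For context on the quantization names above: in the ggml/GGUF formats, Q4_0 stores weights in blocks of 32 with a single f16 scale, i.e. 18 bytes per 32 weights (4.5 bits each). A sketch of the layout and dequantization rule follows; the type and function names are illustrative, not crabml's actual API, and the half crate is an assumed dependency.

    use half::f16; // assumed dependency: half = "2"

    /// One ggml Q4_0 block: an f16 scale plus 32 4-bit quantized weights.
    #[repr(C)]
    struct BlockQ4_0 {
        d: u16,       // scale factor, stored as IEEE 754 half-precision bits
        qs: [u8; 16], // 32 quantized values, two 4-bit nibbles per byte
    }

    /// Dequantize one block: each nibble q in 0..=15 maps to d * (q - 8).
    /// Low nibbles hold values 0..16, high nibbles hold values 16..32.
    fn dequantize_q4_0(block: &BlockQ4_0, out: &mut [f32; 32]) {
        let d = f16::from_bits(block.d).to_f32();
        for (i, &byte) in block.qs.iter().enumerate() {
            out[i] = d * ((byte & 0x0F) as i32 - 8) as f32;
            out[i + 16] = d * ((byte >> 4) as i32 - 8) as f32;
        }
    }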

Maintenance & Community

The project is maintained by the crabml organization, though recent activity has slowed (see Health Check below). Community channels are not explicitly mentioned in the README.

Licensing & Compatibility

Licensed under Apache License, Version 2.0. This license is permissive and generally compatible with commercial and closed-source applications.

Limitations & Caveats

WebGPU acceleration for many quantization methods is still under development ("WIP"). CPU-based inference with specific SIMD features (NEON, AVX2) is more mature.

Health Check

  • Last Commit: 8 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 3
  • Star History: 6 stars in the last 30 days

Explore Similar Projects

Starred by Lysandre Debut (Chief Open-Source Officer at Hugging Face), Omar Sanseviero (DevRel at Google DeepMind), and 2 more.

local-gemma by huggingface

0.3%
376 stars
CLI tool for local Gemma-2 inference
Created 1 year ago
Updated 1 year ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems") and Ying Sheng (Coauthor of SGLang).

fastllm by ztxz16

0.4%
4k stars
High-performance C++ LLM inference library
Created 2 years ago
Updated 1 week ago
Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Tim J. Baek (Founder of Open WebUI), and 7 more.

gemma.cpp by google

0.1%
7k stars
C++ inference engine for Google's Gemma models
Created 1 year ago
Updated 1 day ago