crabml by crabml

Llama.cpp compatible inference engine in Rust

created 1 year ago
455 stars

Top 67.4% on sourcepulse

Project Summary

This project provides a fast, cross-platform AI inference engine written in Rust that uses WebGPU for broad compatibility across browsers, desktops, and servers. It targets developers and researchers who need efficient, low-memory inference for a range of LLMs, offering SIMD acceleration and multiple quantization methods.

How It Works

CrabML is built in Rust, emphasizing performance and embeddability. It uses WebGPU for hardware-accelerated inference, aiming for broad compatibility without requiring native GPU drivers. The engine supports the GGUF model format and employs techniques such as memory mapping (mmap) and multiple quantization levels (e.g., Q8_0, Q4_0) to minimize memory footprint and maximize speed on diverse hardware, including Apple Silicon and x86 CPUs with SIMD extensions.

Quick Start & Requirements

  • Install/Run: Build with cargo build --release. Run examples using ./target/release/crabml-cli.
  • Prerequisites: Rust toolchain. Optional: NEON/AVX2 features enabled via RUSTFLAGS.
  • Resources: Requires GGUF model files.
  • Links: How to Get GGUF Models
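Since the CLI consumes GGUF model files, a quick sanity check before loading is to verify a file's four-byte magic header, which the GGUF format defines as the ASCII bytes "GGUF". The helper below is an illustrative sketch, not part of crabml:

```rust
use std::fs::File;
use std::io::Read;
use std::path::Path;

// Return true if the file begins with the GGUF magic bytes ("GGUF").
fn is_gguf(path: &Path) -> std::io::Result<bool> {
    let mut magic = [0u8; 4];
    File::open(path)?.read_exact(&mut magic)?;
    Ok(&magic == b"GGUF")
}
```

A check like this fails fast on truncated downloads or models still in older GGML containers, before any tensors are mapped into memory.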

Highlighted Details

  • Claims a llama.cpp-compatible API with comparable performance.
  • Cross-platform inference via WebGPU.
  • SIMD acceleration for ARM (NEON) and x86 (AVX2).
  • Supports GGUF format for Llama, CodeLlama, Gemma, Mistral, and more.
  • Multiple quantization methods (Q8_0, Q4_0, Q4_1 recommended on CPU).
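To illustrate how block quantization such as Q8_0 keeps the memory footprint low, the sketch below quantizes a block of 32 weights down to one int8 each plus a single per-block scale. This is a simplified illustration, not crabml's implementation; the on-disk GGUF format stores the scale as f16, while f32 is used here for clarity:

```rust
// Simplified Q8_0-style block quantization: 32 weights -> 32 x i8 + 1 scale.
const QK8_0: usize = 32;

struct BlockQ8_0 {
    d: f32,          // per-block scale (f16 in the real on-disk format)
    qs: [i8; QK8_0], // quantized weights
}

fn quantize_q8_0(x: &[f32; QK8_0]) -> BlockQ8_0 {
    // Scale so the largest-magnitude weight maps to +/-127.
    let amax = x.iter().fold(0f32, |m, v| m.max(v.abs()));
    let d = amax / 127.0;
    let id = if d > 0.0 { 1.0 / d } else { 0.0 };
    let mut qs = [0i8; QK8_0];
    for (q, &v) in qs.iter_mut().zip(x.iter()) {
        *q = (v * id).round() as i8;
    }
    BlockQ8_0 { d, qs }
}

fn dequantize_q8_0(b: &BlockQ8_0) -> [f32; QK8_0] {
    let mut out = [0f32; QK8_0];
    for (o, &q) in out.iter_mut().zip(b.qs.iter()) {
        *o = b.d * q as f32;
    }
    out
}
```

Each block shrinks from 128 bytes of f32 to roughly 34 bytes, which is where the low-memory inference numbers come from; the lower-bit variants (Q4_0, Q4_1) trade more precision for further savings.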

Maintenance & Community

The project is actively developed by crabml. Community channels are not explicitly mentioned in the README.

Licensing & Compatibility

Licensed under Apache License, Version 2.0. This license is permissive and generally compatible with commercial and closed-source applications.

Limitations & Caveats

WebGPU acceleration is still a work in progress ("WIP") for many quantization methods. CPU-based inference with SIMD features (NEON, AVX2) is more mature.

Health Check
  • Last commit: 7 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 15 stars in the last 90 days

Explore Similar Projects

Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Tim J. Baek (Founder of Open WebUI), and 5 more.

gemma.cpp by google
C++ inference engine for Google's Gemma models
Top 0.1% on sourcepulse, 7k stars, created 1 year ago, updated 1 day ago

Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Nat Friedman (Former CEO of GitHub), and 32 more.

llama.cpp by ggml-org
C/C++ library for local LLM inference
Top 0.4% on sourcepulse, 84k stars, created 2 years ago, updated 16 hours ago