crabml by crabml

llama.cpp-compatible inference engine in Rust

Created 2 years ago
462 stars

Top 65.6% on SourcePulse

Project Summary

This project provides a fast, cross-platform AI inference engine written in Rust, leveraging WebGPU for broad compatibility across browsers, desktops, and servers. It targets developers and researchers who need efficient, low-memory inference for a range of LLMs, offering SIMD acceleration and multiple quantization methods.

How It Works

crabml is built in Rust, emphasizing performance and embeddability. It uses WebGPU for hardware-accelerated inference, aiming for broad compatibility without requiring vendor-specific GPU stacks such as CUDA or ROCm. The engine reads the GGUF model format and employs memory mapping (mmap) and multiple quantization levels (e.g., Q8_0, Q4_0) to minimize memory footprint and maximize speed on diverse hardware, including Apple Silicon and x86 CPUs with SIMD extensions.
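As an illustration of the mmap-based loading described above, here is a minimal Rust sketch that maps a GGUF file and reads its fixed header (per the public GGUF spec). It assumes the memmap2 crate; crabml's actual loader types and APIs differ.

    use std::fs::File;
    use memmap2::Mmap; // assumed dependency: memmap2 = "0.9"

    fn main() -> std::io::Result<()> {
        // Map the model file instead of copying it into a buffer; pages are
        // faulted in on demand, which keeps resident memory low.
        let file = File::open("model.gguf")?;
        let mmap = unsafe { Mmap::map(&file)? };

        // GGUF header, little-endian: magic "GGUF" (u32), version (u32),
        // tensor_count (u64), metadata_kv_count (u64).
        let magic = u32::from_le_bytes(mmap[0..4].try_into().unwrap());
        assert_eq!(magic, 0x4655_4747, "not a GGUF file");
        let version = u32::from_le_bytes(mmap[4..8].try_into().unwrap());
        let tensor_count = u64::from_le_bytes(mmap[8..16].try_into().unwrap());
        println!("GGUF v{version}, {tensor_count} tensors");
        Ok(())
    }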

Quick Start & Requirements

  • Install/Run: Build with cargo build --release, then run examples with ./target/release/crabml-cli (a hedged build-and-run sequence follows this list).
  • Prerequisites: Rust toolchain. Optional: enable SIMD acceleration (NEON on ARM, AVX2 on x86) via RUSTFLAGS.
  • Resources: Requires GGUF model files.
  • Links: How to Get GGUF Models
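A build-and-run sequence following the steps above. The CLI flags and model path are illustrative assumptions, not the documented interface; consult the project's README for the exact invocation.

    # Build in release mode; optionally enable SIMD via RUSTFLAGS
    # (NEON is on by default on aarch64; AVX2 shown here for x86-64).
    RUSTFLAGS="-C target-feature=+avx2" cargo build --release

    # Run the CLI against a local GGUF model (flags are hypothetical).
    ./target/release/crabml-cli -m ./models/tinyllama-q8_0.gguf "Once upon a time"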

Highlighted Details

  • Claims a llama.cpp-compatible API with comparable performance.
  • Cross-platform inference via WebGPU.
  • SIMD acceleration for ARM (NEON) and x86 (AVX2).
  • Supports GGUF format for Llama, CodeLlama, Gemma, Mistral, and more.
  • Multiple quantization methods (Q8_0, Q4_0, Q4_1 recommended on CPU); the Q4_0 block layout is sketched after this list.
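For context on the quantization names above: in the ggml/GGUF formats, Q4_0 stores weights in blocks of 32 with a single f16 scale, i.e. 18 bytes per 32 weights (4.5 bits each). A sketch of the layout and dequantization rule follows; the type and function names are illustrative, not crabml's actual API, and the half crate is an assumed dependency.

    use half::f16; // assumed dependency: half = "2"

    /// One ggml Q4_0 block: an f16 scale plus 32 4-bit quantized weights.
    #[repr(C)]
    struct BlockQ4_0 {
        d: u16,       // scale factor, stored as IEEE 754 half-precision bits
        qs: [u8; 16], // 32 quantized values, two 4-bit nibbles per byte
    }

    /// Dequantize one block: each nibble q in 0..=15 maps to d * (q - 8).
    /// Low nibbles hold values 0..16, high nibbles hold values 16..32.
    fn dequantize_q4_0(block: &BlockQ4_0, out: &mut [f32; 32]) {
        let d = f16::from_bits(block.d).to_f32();
        for (i, &byte) in block.qs.iter().enumerate() {
            out[i] = d * ((byte & 0x0F) as i32 - 8) as f32;
            out[i + 16] = d * ((byte >> 4) as i32 - 8) as f32;
        }
    }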

Maintenance & Community

The project is maintained by the crabml organization, though recent activity has slowed (see Health Check below). Community channels are not explicitly mentioned in the README.

Licensing & Compatibility

Licensed under Apache License, Version 2.0. This license is permissive and generally compatible with commercial and closed-source applications.

Limitations & Caveats

WebGPU acceleration for many quantization methods is still under development ("WIP"). CPU-based inference with specific SIMD features (NEON, AVX2) is more mature.

Health Check

  • Last Commit: 8 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 3
  • Star History: 6 stars in the last 30 days

Explore Similar Projects

Starred by Lysandre Debut (Chief Open-Source Officer at Hugging Face), Omar Sanseviero (DevRel at Google DeepMind), and 2 more.

local-gemma by huggingface

0.3%
376 stars
CLI tool for local Gemma-2 inference
Created 1 year ago
Updated 1 year ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems") and Ying Sheng (Coauthor of SGLang).

fastllm by ztxz16

0.4%
4k stars
High-performance C++ LLM inference library
Created 2 years ago
Updated 1 week ago
Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Tim J. Baek (Founder of Open WebUI), and 7 more.

gemma.cpp by google

0.1%
7k stars
C++ inference engine for Google's Gemma models
Created 1 year ago
Updated 1 day ago