Llama.cpp compatible inference engine in Rust
This project provides a fast, cross-platform AI inference engine written in Rust, leveraging WebGPU for broad compatibility across browsers, desktops, and servers. It targets developers and researchers who need efficient, low-memory inference for a range of LLMs, offering SIMD acceleration and multiple quantization methods.
How It Works
crabml is built in Rust, emphasizing performance and embeddability. It uses WebGPU for hardware-accelerated inference, aiming for broad compatibility without requiring native GPU drivers. The engine supports the GGUF model format and employs techniques like memory mapping (mmap) and various quantization levels (e.g., Q8_0, Q4_0) to minimize memory footprint and maximize speed on diverse hardware, including Apple Silicon and x86 CPUs with SIMD extensions.
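To make the mmap-based loading concrete, here is a minimal, illustrative sketch of the general technique; it is not crabml's actual API. It assumes the memmap2 crate, and it only maps a model file read-only and checks the 4-byte GGUF magic, letting the OS page data in lazily instead of reading the whole file up front.

```rust
// Illustrative sketch of mmap-based GGUF loading (not crabml's API).
// Assumes the `memmap2` crate, e.g. memmap2 = "0.9" in Cargo.toml.
use std::fs::File;

use memmap2::Mmap;

fn main() -> std::io::Result<()> {
    // Hypothetical model path passed on the command line.
    let path = std::env::args().nth(1).expect("usage: check_gguf <model.gguf>");
    let file = File::open(&path)?;

    // Map the file read-only into the address space. Pages are faulted in
    // on demand, so a multi-gigabyte model does not need an upfront copy.
    let mmap = unsafe { Mmap::map(&file)? };
    let bytes: &[u8] = &mmap[..];

    // GGUF files begin with the 4-byte magic "GGUF".
    if bytes.len() >= 4 && &bytes[..4] == b"GGUF" {
        println!("{path}: GGUF model, {} bytes mapped", bytes.len());
    } else {
        println!("{path}: not a GGUF file");
    }
    Ok(())
}
```

An engine built this way can parse GGUF metadata and tensor data directly out of the mapping, which is what keeps resident memory close to the size of the quantized weights.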
Quick Start & Requirements
Build with cargo build --release.
Run examples using ./target/release/crabml-cli.
RUSTFLAGS may need to be set to enable CPU SIMD features (e.g., NEON or AVX2) at build time.
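One common way to enable host-specific SIMD features such as AVX2 or NEON is Rust's standard RUSTFLAGS mechanism; this is a generic Cargo/rustc invocation, not a crabml-specific flag:

RUSTFLAGS="-C target-cpu=native" cargo build --release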
Highlighted Details
Maintenance & Community
The project is actively developed under the crabml organization. Community channels are not explicitly mentioned in the README.
Licensing & Compatibility
Licensed under Apache License, Version 2.0. This license is permissive and generally compatible with commercial and closed-source applications.
Limitations & Caveats
WebGPU acceleration for many quantization methods is still under development ("WIP"). CPU-based inference with specific SIMD features (NEON, AVX2) is more mature.