candle by huggingface

Minimalist ML framework for Rust, emphasizing performance and ease of use

created 2 years ago
17,746 stars

Top 2.5% on sourcepulse

Project Summary

Candle is a minimalist machine learning framework for Rust, designed for high performance and ease of use, particularly for serverless inference and production workloads. It targets Rust developers seeking to deploy ML models without Python's overhead, offering GPU acceleration and broad model support.

How It Works

Candle provides a PyTorch-like API in Rust, enabling developers to define, train, and run ML models. It leverages Rust's performance and memory safety, with optional CUDA and cuDNN backends for GPU acceleration. The framework supports custom kernel integration, such as FlashAttention v2, and offers a range of pre-implemented models and utilities.
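For a sense of the API, here is a minimal sketch assuming candle-core as a dependency (it mirrors the crate's documented matmul example; error handling is kept minimal):

    use candle_core::{Device, Tensor};

    fn main() -> Result<(), Box<dyn std::error::Error>> {
        // CPU device; a CUDA build could use Device::new_cuda(0)? instead.
        let device = Device::Cpu;

        // Two random matrices, created PyTorch-style.
        let a = Tensor::randn(0f32, 1.0, (2, 3), &device)?;
        let b = Tensor::randn(0f32, 1.0, (3, 4), &device)?;

        // Matrix multiplication yields a 2x4 tensor.
        let c = a.matmul(&b)?;
        println!("{c}");
        Ok(())
    }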

Quick Start & Requirements

  • Install via cargo install candle-cli or add candle-core to Cargo.toml.
  • GPU support requires CUDA Toolkit and cuDNN.
  • Examples can be run with cargo run --example <example_name> --release.
  • For CUDA support, compile with --features cuda (a device-selection sketch follows this list).
  • See Installation and Examples.
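As an illustration of the CUDA fallback, a minimal sketch assuming candle-core built with the cuda feature (Device::cuda_if_available is the candle-core helper for this):

    use candle_core::{DType, Device, Tensor};

    fn main() -> Result<(), Box<dyn std::error::Error>> {
        // Uses GPU 0 when the build has CUDA support and a device is
        // present; otherwise falls back to the CPU.
        let device = Device::cuda_if_available(0)?;

        let x = Tensor::ones((2, 2), DType::F32, &device)?;
        println!("device: {:?}\n{x}", device);
        Ok(())
    }

The same binary then runs unchanged on machines with and without a GPU, which suits the serverless deployments the project targets.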

Highlighted Details

  • Minimalist design with PyTorch-like syntax.
  • Supports CPU (with MKL/Accelerate) and CUDA (with NCCL) backends.
  • Includes implementations for numerous LLMs (LLaMA, Mistral, Phi, etc.), vision models (Stable Diffusion, YOLO), and more.
  • Supports loading models from various formats: safetensors, npz, ggml, PyTorch (a loading sketch follows this list).
  • Offers WASM support for browser-based inference.
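As a sketch of format loading, the following assumes a local model.safetensors file (a placeholder path) and reads every tensor in it via candle-core's safetensors module:

    use candle_core::{safetensors, Device};

    fn main() -> Result<(), Box<dyn std::error::Error>> {
        let device = Device::Cpu;

        // Reads every tensor in the file into a HashMap<String, Tensor>.
        // "model.safetensors" is a placeholder path for illustration.
        let tensors = safetensors::load("model.safetensors", &device)?;

        for (name, tensor) in &tensors {
            println!("{name}: {:?}", tensor.shape());
        }
        Ok(())
    }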

Maintenance & Community

  • Developed by Hugging Face.
  • Active development with contributions from the Rust ML community.
  • Links to demos, tutorials, and related crates are provided.

Licensing & Compatibility

  • Apache 2.0 License.
  • Permissive license suitable for commercial use and integration into closed-source projects.

Limitations & Caveats

  • Building with CUDA features may require specific CUDA versions and compatible GCC versions.
  • Accessing certain models (e.g., LLaMA-2) requires Hugging Face authentication and model acceptance.
  • Some advanced features like FlashAttention v2 might require manual setup of dependencies (e.g., CUTLASS).
Health Check

  • Last commit: 1 day ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 17
  • Issues (30d): 16

Star History

723 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Shawn Wang (Editor of Latent Space), and 8 more.

llm by rustformers

0% · 6k stars
Rust ecosystem for LLM inference (unmaintained)
created 2 years ago · updated 1 year ago
Starred by Jeff Hammerbacher (Cofounder of Cloudera), Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), and 2 more.

serve by pytorch

0.1% · 4k stars
Serve, optimize, and scale PyTorch models in production
created 5 years ago · updated 3 weeks ago
Starred by Patrick von Platen (Core Contributor to Hugging Face Transformers and Diffusers), Michael Han (Cofounder of Unsloth), and 1 more.

ktransformers by kvcache-ai

0.4% · 15k stars
Framework for LLM inference optimization experimentation
created 1 year ago · updated 2 days ago
Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Nat Friedman (Former CEO of GitHub), and 32 more.

llama.cpp by ggml-org

0.4% · 84k stars
C/C++ library for local LLM inference
created 2 years ago · updated 14 hours ago