candle by huggingface

Minimalist ML framework for Rust, emphasizing performance and ease of use

created 2 years ago
17,746 stars

Top 2.5% on sourcepulse

Project Summary

Candle is a minimalist machine learning framework for Rust, designed for high performance and ease of use, particularly for serverless inference and production workloads. It targets Rust developers seeking to deploy ML models without Python's overhead, offering GPU acceleration and broad model support.

How It Works

Candle provides a PyTorch-like API in Rust, enabling developers to define, train, and run ML models. It leverages Rust's performance and memory safety, with optional CUDA and cuDNN backends for GPU acceleration. The framework supports custom kernel integration, such as FlashAttention v2, and offers a range of pre-implemented models and utilities.
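For a sense of the API, here is a minimal sketch assuming candle-core as a dependency (it mirrors the crate's documented matmul example; error handling is kept minimal):

    use candle_core::{Device, Tensor};

    fn main() -> Result<(), Box<dyn std::error::Error>> {
        // CPU device; a CUDA build could use Device::new_cuda(0)? instead.
        let device = Device::Cpu;

        // Two random matrices, created PyTorch-style.
        let a = Tensor::randn(0f32, 1.0, (2, 3), &device)?;
        let b = Tensor::randn(0f32, 1.0, (3, 4), &device)?;

        // Matrix multiplication yields a 2x4 tensor.
        let c = a.matmul(&b)?;
        println!("{c}");
        Ok(())
    }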

Quick Start & Requirements

  • Install via cargo install candle-cli or add candle-core to Cargo.toml.
  • GPU support requires CUDA Toolkit and cuDNN.
  • Examples can be run with cargo run --example <example_name> --release.
  • For CUDA support, compile with --features cuda (a device-selection sketch follows this list).
  • See Installation and Examples.
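As an illustration of the CUDA fallback, a minimal sketch assuming candle-core built with the cuda feature (Device::cuda_if_available is the candle-core helper for this):

    use candle_core::{DType, Device, Tensor};

    fn main() -> Result<(), Box<dyn std::error::Error>> {
        // Uses GPU 0 when the build has CUDA support and a device is
        // present; otherwise falls back to the CPU.
        let device = Device::cuda_if_available(0)?;

        let x = Tensor::ones((2, 2), DType::F32, &device)?;
        println!("device: {:?}\n{x}", device);
        Ok(())
    }

The same binary then runs unchanged on machines with and without a GPU, which suits the serverless deployments the project targets.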

Highlighted Details

  • Minimalist design with PyTorch-like syntax.
  • Supports CPU (with MKL/Accelerate) and CUDA (with NCCL) backends.
  • Includes implementations for numerous LLMs (LLaMA, Mistral, Phi, etc.), vision models (Stable Diffusion, YOLO), and more.
  • Supports loading models from various formats: safetensors, npz, ggml, PyTorch (a loading sketch follows this list).
  • Offers WASM support for browser-based inference.
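As a sketch of format loading, the following assumes a local model.safetensors file (a placeholder path) and reads every tensor in it via candle-core's safetensors module:

    use candle_core::{safetensors, Device};

    fn main() -> Result<(), Box<dyn std::error::Error>> {
        let device = Device::Cpu;

        // Reads every tensor in the file into a HashMap<String, Tensor>.
        // "model.safetensors" is a placeholder path for illustration.
        let tensors = safetensors::load("model.safetensors", &device)?;

        for (name, tensor) in &tensors {
            println!("{name}: {:?}", tensor.shape());
        }
        Ok(())
    }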

Maintenance & Community

  • Developed by Hugging Face.
  • Active development with contributions from the Rust ML community.
  • Links to demos, tutorials, and related crates are provided.

Licensing & Compatibility

  • Apache 2.0 License.
  • Permissive license suitable for commercial use and integration into closed-source projects.

Limitations & Caveats

  • Building with CUDA features may require specific CUDA versions and compatible GCC versions.
  • Accessing certain models (e.g., LLaMA-2) requires Hugging Face authentication and model acceptance.
  • Some advanced features like FlashAttention v2 might require manual setup of dependencies (e.g., CUTLASS).
Health Check

  • Last commit: 1 day ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 17
  • Issues (30d): 16

Star History

723 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Shawn Wang (Editor of Latent Space), and 8 more.

llm by rustformers

0% · 6k stars
Rust ecosystem for LLM inference (unmaintained)
created 2 years ago · updated 1 year ago
Starred by Jeff Hammerbacher (Cofounder of Cloudera), Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), and 2 more.

serve by pytorch

0.1% · 4k stars
Serve, optimize, and scale PyTorch models in production
created 5 years ago · updated 3 weeks ago
Starred by Patrick von Platen (Core Contributor to Hugging Face Transformers and Diffusers), Michael Han (Cofounder of Unsloth), and 1 more.

ktransformers by kvcache-ai

0.4% · 15k stars
Framework for LLM inference optimization experimentation
created 1 year ago · updated 2 days ago
Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Nat Friedman (Former CEO of GitHub), and 32 more.

llama.cpp by ggml-org

0.4% · 84k stars
C/C++ library for local LLM inference
created 2 years ago · updated 14 hours ago