Crane by lucasjinreal

High-performance Rust inference engine

Created 11 months ago
280 stars

Top 93.1% on SourcePulse

Project Summary

Crane is a high-performance, pure Rust inference engine for LLMs, VLMs, TTS, and OCR, built on the Candle framework. It targets developers seeking a simpler, faster alternative to C++-based solutions like llama.cpp, offering significant speedups and ease of deployment, especially on Apple Silicon.

How It Works

Crane leverages Candle, a minimalist ML framework for Rust, to achieve fast inference on both CPUs and GPUs. Its core design eliminates C++ toolchain complexity while maintaining native performance, enabling hardware-agnostic execution across CPU, CUDA, and Metal (Apple Silicon) backends. This approach also simplifies model integration, allowing new models to be added with minimal code.
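The hardware-agnostic execution described above can be sketched with Candle's device API. This is an illustrative snippet, not code from the Crane repository; it assumes the candle_core crate is a dependency (with the cuda or metal feature enabled where relevant):

```rust
// Sketch only: backend selection in Candle, the framework Crane builds on.
use candle_core::{Device, Tensor};

fn main() -> candle_core::Result<()> {
    // Prefers a CUDA device if one is available and the `cuda` feature is on,
    // otherwise falls back to CPU. On Apple Silicon builds with the `metal`
    // feature, Device::new_metal(0) targets the GPU instead.
    let device = Device::cuda_if_available(0)?;

    // The same tensor code runs unchanged on whichever backend was chosen.
    let x = Tensor::randn(0f32, 1.0, (2, 3), &device)?;
    let y = x.matmul(&x.t()?)?; // (2, 3) x (3, 2) -> (2, 2)
    println!("{y}");
    Ok(())
}
```

Because the `Device` is passed explicitly at tensor creation, model code stays identical across CPU, CUDA, and Metal, which is what makes adding new models with minimal code practical.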

Quick Start & Requirements

  • Primary run command: cargo run --bin qwenchat --release (after downloading models).
  • Prerequisites: latest Rust toolchain.
  • Dependencies: the Candle framework. Optimized for Apple Silicon (Metal); no GGUF conversion required on macOS.
  • Links: GitHub repository; model download examples are provided.
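Putting the steps above together, a typical session might look like the following. The repository URL is inferred from the author and project name, and the model-download step is summarized rather than spelled out; follow the repo's own examples for exact commands:

```shell
# Sketch of the quick-start flow; URL and ordering are assumptions.
git clone https://github.com/lucasjinreal/Crane
cd Crane

# Download the model weights first (see the repo's model download examples),
# then build and run the chat binary in release mode:
cargo run --bin qwenchat --release
```

Running with --release matters here: Candle relies on compiler optimizations for its performance, and a debug build will be dramatically slower.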

Highlighted Details

  • Performance: achieves up to a 6x speedup over vanilla transformers on M1 Macs without quantization; reports 17.5 t/s for Qwen2.5-500M on M1 Metal (f32) and 35 t/s (f16).
  • Supported Models: Qwen3/Qwen2.5 (LLM/VLM), Moonshine ASR, Silero VAD, PaddleOCR-VL. TTS support
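As a quick sanity check on the throughput figures above, the reported f16 number is exactly double the f32 number for the same model and hardware:

```python
# Reported throughput for Qwen2.5-500M on M1 Metal (from the summary above).
f32_tps = 17.5  # tokens/s at f32
f16_tps = 35.0  # tokens/s at f16
print(f16_tps / f32_tps)  # → 2.0: f16 doubles throughput on this benchmark
```

A clean 2x from halving precision is plausible on a memory-bandwidth-bound workload, since f16 weights halve the bytes moved per token.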
Health Check

  • Last Commit: 2 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 6
  • Issues (30d): 0

Star History

  • 46 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), Luis Capelo (cofounder of Lightning AI), and 8 more.

TransformerEngine by NVIDIA (0.3%, 3k stars)

Library for Transformer model acceleration on NVIDIA GPUs. Created 3 years ago; updated 21 hours ago.