Crane by lucasjinreal

High-performance Rust inference engine

Created 11 months ago
280 stars

Top 93.1% on SourcePulse

Project Summary

Crane is a high-performance, pure Rust inference engine for LLMs, VLMs, TTS, and OCR, built on the Candle framework. It targets developers seeking a simpler, faster alternative to C++-based solutions like llama.cpp, offering significant speedups and ease of deployment, especially on Apple Silicon.

How It Works

Crane leverages Candle, a minimalist ML framework for Rust, to achieve fast inference on both CPUs and GPUs. Its core design eliminates C++ toolchain complexity while maintaining native performance, enabling hardware-agnostic execution across CPU, CUDA, and Metal (Apple Silicon) backends. This approach also simplifies model integration, allowing new models to be added with minimal code.
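The hardware-agnostic execution described above can be sketched with Candle's device API. This is an illustrative snippet, not code from the Crane repository; it assumes the candle_core crate is a dependency (with the cuda or metal feature enabled where relevant):

```rust
// Sketch only: backend selection in Candle, the framework Crane builds on.
use candle_core::{Device, Tensor};

fn main() -> candle_core::Result<()> {
    // Prefers a CUDA device if one is available and the `cuda` feature is on,
    // otherwise falls back to CPU. On Apple Silicon builds with the `metal`
    // feature, Device::new_metal(0) targets the GPU instead.
    let device = Device::cuda_if_available(0)?;

    // The same tensor code runs unchanged on whichever backend was chosen.
    let x = Tensor::randn(0f32, 1.0, (2, 3), &device)?;
    let y = x.matmul(&x.t()?)?; // (2, 3) x (3, 2) -> (2, 2)
    println!("{y}");
    Ok(())
}
```

Because the `Device` is passed explicitly at tensor creation, model code stays identical across CPU, CUDA, and Metal, which is what makes adding new models with minimal code practical.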

Quick Start & Requirements

  • Primary run command: cargo run --bin qwenchat --release (after downloading models).
  • Prerequisites: latest Rust toolchain.
  • Dependencies: the Candle framework. Optimized for Apple Silicon (Metal); no GGUF conversion required on macOS.
  • Links: GitHub repository; model download examples are provided.
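Putting the steps above together, a typical session might look like the following. The repository URL is inferred from the author and project name, and the model-download step is summarized rather than spelled out; follow the repo's own examples for exact commands:

```shell
# Sketch of the quick-start flow; URL and ordering are assumptions.
git clone https://github.com/lucasjinreal/Crane
cd Crane

# Download the model weights first (see the repo's model download examples),
# then build and run the chat binary in release mode:
cargo run --bin qwenchat --release
```

Running with --release matters here: Candle relies on compiler optimizations for its performance, and a debug build will be dramatically slower.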

Highlighted Details

  • Performance: achieves up to a 6x speedup over vanilla transformers on M1 Macs without quantization; reports 17.5 t/s for Qwen2.5-500M on M1 Metal (f32) and 35 t/s (f16).
  • Supported Models: Qwen3/Qwen2.5 (LLM/VLM), Moonshine ASR, Silero VAD, PaddleOCR-VL. TTS support
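As a quick sanity check on the throughput figures above, the reported f16 number is exactly double the f32 number for the same model and hardware:

```python
# Reported throughput for Qwen2.5-500M on M1 Metal (from the summary above).
f32_tps = 17.5  # tokens/s at f32
f16_tps = 35.0  # tokens/s at f16
print(f16_tps / f32_tps)  # → 2.0: f16 doubles throughput on this benchmark
```

A clean 2x from halving precision is plausible on a memory-bandwidth-bound workload, since f16 weights halve the bytes moved per token.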
Health Check

  • Last Commit: 2 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 6
  • Issues (30d): 0

Star History

  • 46 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), Luis Capelo (cofounder of Lightning AI), and 8 more.

TransformerEngine by NVIDIA (0.3%, 3k stars)

Library for Transformer model acceleration on NVIDIA GPUs. Created 3 years ago; updated 21 hours ago.