guoqingbao: Pure Rust LLM inference engine
Top 97.9% on SourcePulse
Summary
xInfer is an LLM inference engine written entirely in Rust, with no Python or PyTorch dependencies. It targets engineers and power users who want efficient, portable, production-ready LLM serving, and it aims to combine fast inference with a small footprint and broad hardware compatibility.
How It Works
Its core is a pure Rust backend. Dropping Python and PyTorch is intended to improve performance and reduce complexity. xInfer uses native optimizations such as Flash Attention, FlashInfer, CUDA Graphs, continuous batching, and prefix caching. Aggressive KV-cache compression (TurboQuant, 2-4 bit) extends the usable context length by up to 4.3x with minimal quality loss, which makes it possible to run large models on consumer GPUs. The result is a small footprint and cross-platform support for CUDA (Linux/Windows) and Metal (macOS) through a single binary and API.
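To make the memory argument concrete: the KV cache stores one key vector and one value vector per layer and per KV head for every token, so its size scales roughly linearly with the bit width of each stored element. The Rust sketch below estimates KV-cache bytes per token and how many tokens fit in a fixed memory budget. The model shape (32 layers, 8 KV heads, head dimension 128), the 8 GiB budget, and the "one fp16 scale per group of 64 elements" layout are all illustrative assumptions; this is generic grouped-quantization arithmetic, not TurboQuant's actual storage format.

```rust
/// Hypothetical model shape, chosen only for illustration
/// (not taken from xInfer or any specific model config).
struct KvShape {
    layers: u64,
    kv_heads: u64,
    head_dim: u64,
}

/// KV-cache bytes needed per token.
///
/// `bits` is the storage width of each cached element (16 for fp16).
/// `group` is the number of elements sharing one fp16 scale factor.
/// Per-group scales are an assumed, common low-bit layout, not
/// TurboQuant's documented format. Pass `None` for unquantized storage.
fn kv_bytes_per_token(s: &KvShape, bits: u64, group: Option<u64>) -> u64 {
    // One key vector and one value vector per layer per KV head.
    let vectors = 2 * s.layers * s.kv_heads;
    let data_bits = vectors * s.head_dim * bits;
    let scale_bits = match group {
        // Each head vector splits into ceil(head_dim / g) groups,
        // and each group carries one 16-bit scale.
        Some(g) => vectors * ((s.head_dim + g - 1) / g) * 16,
        None => 0,
    };
    // Round the total up to whole bytes.
    (data_bits + scale_bits + 7) / 8
}

/// How many tokens of KV cache fit in a fixed memory budget.
fn max_tokens(budget_bytes: u64, bytes_per_token: u64) -> u64 {
    budget_bytes / bytes_per_token
}

fn main() {
    let shape = KvShape { layers: 32, kv_heads: 8, head_dim: 128 };
    // Assumed amount of GPU memory left over for the KV cache: 8 GiB.
    let budget: u64 = 8 * 1024 * 1024 * 1024;

    let fp16 = kv_bytes_per_token(&shape, 16, None);
    let q4 = kv_bytes_per_token(&shape, 4, Some(64));

    println!("fp16 : {} bytes/token, {} tokens", fp16, max_tokens(budget, fp16));
    println!("4-bit: {} bytes/token, {} tokens", q4, max_tokens(budget, q4));
    println!("ratio: {:.2}x", fp16 as f64 / q4 as f64);
}
```

With these assumptions, the program prints 131072 bytes/token (65536 tokens) for fp16 and 34816 bytes/token (246723 tokens) at 4 bits, a ratio of about 3.76x. The sketch ignores any extra metadata or full-precision regions a real scheme might keep, so it is not expected to reproduce the 4.3x figure exactly; it only shows why lowering the bit width multiplies the context that fits in the same memory.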
Quick Start & Requirements
Install with the shell script (curl -sSL https://guoqingbao.github.io/xinfer/install.sh | bash) or via npm (npm install -g xinfer-ai). The xinfer CLI runs models from Hugging Face IDs or local paths, and the --ui-server option starts a built-in ChatGPT-style Web UI. Python usage is also supported (python3 -m xinfer.server). The key requirement is a compatible GPU: NVIDIA (CUDA) or Apple Silicon (Metal). Building from source requires a Rust toolchain and, depending on the target platform, the CUDA Toolkit or the Xcode command-line tools.