ericcurtin/inferrs: LLM inference server optimized for resource efficiency and broad API compatibility
A high-performance LLM inference server built in Rust, inferrs targets developers and users who need a flexible, lightweight, and fast solution for serving large language models. It aims to provide a rich feature set, including broad API compatibility and efficient resource usage, making it suitable for deployment scenarios where memory and binary footprint are critical considerations.
How It Works
inferrs is implemented in Rust and ships as a single, lightweight binary. It uses TurboQuant for KV cache management and allocates cache memory per context, distinguishing it from solutions that can consume significant GPU memory. The architecture pairs an axum-based HTTP server, which streams responses via Server-Sent Events (SSE), with a backend engine composed of a scheduler, transformer, KV cache, and sampler. This design prioritizes memory efficiency and performance.
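To make the scheduler → transformer → KV cache → sampler pipeline concrete, here is a heavily simplified, illustrative sketch (not the actual inferrs internals; all type and function names are hypothetical, and the "forward pass" is a stub):

```rust
/// Hypothetical sketch of the engine loop described above: each request owns
/// its own KV cache (per-context allocation), a transformer step reads/writes
/// that cache, and a sampler picks the next token from the logits.

#[derive(Default)]
struct KvCache {
    // One cache per request context, so memory is released when the context ends.
    keys: Vec<f32>,
}

struct Request {
    prompt: Vec<u32>,
    max_new_tokens: usize,
}

fn transformer_step(cache: &mut KvCache, token: u32) -> Vec<f32> {
    // Stand-in for a real forward pass: append to the cache, return fixed logits.
    cache.keys.push(token as f32);
    vec![0.1, 0.7, 0.2]
}

fn sample(logits: &[f32]) -> u32 {
    // Greedy sampling: index of the largest logit.
    logits
        .iter()
        .enumerate()
        .max_by(|a, b| a.1.partial_cmp(b.1).unwrap())
        .map(|(i, _)| i as u32)
        .unwrap()
}

fn run(req: &Request) -> Vec<u32> {
    let mut cache = KvCache::default();
    let mut out = Vec::new();
    let mut token = *req.prompt.last().unwrap();
    for _ in 0..req.max_new_tokens {
        let logits = transformer_step(&mut cache, token);
        token = sample(&logits);
        out.push(token);
    }
    out
}

fn main() {
    let req = Request { prompt: vec![1, 2, 3], max_new_tokens: 4 };
    println!("{:?}", run(&req)); // prints [1, 1, 1, 1] with the stub logits
}
```

In a real server, each generated token would be pushed to the client as an SSE event rather than collected into a vector, and the scheduler would interleave steps across concurrent requests.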
Quick Start & Requirements
Installation is available via package managers: brew install inferrs on macOS/Linux, or scoop install inferrs on Windows after adding the ericcurtin/scoop-inferrs bucket. Models can be served using commands like inferrs run <model_path> or inferrs serve <model_path>. The project supports a wide array of hardware backends including CUDA, ROCm, Metal, Hexagon, OpenVino, MUSA, CANN, Vulkan, and CPU.
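The installation and serving commands above can be collected into a short session. The `<model_path>` placeholder stands for a local model file, as in the source; exact Scoop bucket syntax and command flags may vary by version, so check the README:

```shell
# Install via Homebrew (macOS/Linux)
brew install inferrs

# Or via Scoop (Windows), after adding the ericcurtin/scoop-inferrs
# bucket (see the README for the exact bucket-add command)
scoop install inferrs

# Run a model interactively, or serve it over HTTP (SSE responses)
inferrs run <model_path>
inferrs serve <model_path>
```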
Maintenance & Community
The provided README does not contain information regarding maintainers, community channels (like Discord or Slack), sponsorships, or roadmap details.
Licensing & Compatibility
The README does not explicitly state the project's license. Therefore, compatibility for commercial use or closed-source linking cannot be determined from the provided text.
Limitations & Caveats
The README focuses on the project's strengths and does not detail specific limitations, known bugs, or unsupported platforms. The absence of explicit licensing information presents a significant caveat for potential adopters.