aphrodite-engine by aphrodite-engine

LLM inference engine for serving HuggingFace models at scale

Created 2 years ago
1,615 stars

Top 25.8% on SourcePulse

View on GitHub
Project Summary

Aphrodite Engine is a high-performance inference engine designed for large-scale deployment of HuggingFace-compatible Large Language Models. It targets developers and platforms requiring efficient, concurrent LLM serving, offering significant throughput improvements and broad quantization support.

How It Works

Aphrodite Engine leverages vLLM's Paged Attention for efficient KV cache management and continuous batching, enabling high throughput for multiple concurrent users. It incorporates optimized CUDA kernels and supports a wide array of quantization formats (AQLM, AWQ, Bitsandbytes, GGUF, GPTQ, etc.) and distributed inference techniques like tensor and pipeline parallelism. This combination allows for substantial memory savings and increased inference speed.
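
As a rough illustration of how this surfaces in code, the sketch below pushes several prompts through one engine instance so they are batched together rather than run sequentially. It assumes the vLLM-style `LLM`/`SamplingParams` Python interface that Aphrodite inherits; the constructor arguments shown (`tensor_parallel_size`, `gpu_memory_utilization`, `quantization`) should be checked against the current documentation.

```python
# Minimal offline-inference sketch (assumes Aphrodite's vLLM-style Python API;
# verify argument names against the current docs before relying on them).
from aphrodite import LLM, SamplingParams

# One engine instance serves many requests; Paged Attention manages the KV cache
# so the prompts below are continuously batched rather than processed one by one.
llm = LLM(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",
    tensor_parallel_size=2,      # split weights across 2 GPUs (assumes 2 GPUs are available)
    gpu_memory_utilization=0.8,  # cap VRAM usage below the ~90% default
    # quantization="gptq",       # optional: set when loading a quantized checkpoint
)

sampling = SamplingParams(temperature=0.7, max_tokens=128)

prompts = [
    "Explain paged attention in one sentence.",
    "Write a haiku about GPUs.",
]

for output in llm.generate(prompts, sampling):
    print(output.outputs[0].text)
```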

Quick Start & Requirements

  • Install: pip install -U aphrodite-engine --extra-index-url https://downloads.pygmalion.chat/whl
  • Run: aphrodite run <model_name> (e.g., aphrodite run meta-llama/Meta-Llama-3.1-8B-Instruct); see the client sketch after this list
  • Prerequisites: Python 3.9-3.12, CUDA >= 12. Supports Linux and Windows (Windows requires building from source). Broad hardware support including NVIDIA, AMD, and Intel GPUs, TPUs, and Inferentia.
  • Resources: By default, uses ~90% of GPU VRAM. Can be limited with --gpu-memory-utilization or --single-user-mode.
  • Docs: Exhaustive documentation available.
  • Demo: Interactive demo available.
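
Once `aphrodite run <model_name>` is up, the server exposes an OpenAI-compatible endpoint (see Highlighted Details below), so any OpenAI client can talk to it. A minimal client sketch follows; the port used here (2242) and the placeholder API key are assumptions, so check the server's startup log and launch flags for the actual values on your machine.

```python
# Minimal client sketch using the official `openai` package (pip install openai).
# The base URL/port is an assumption -- check the aphrodite server's startup log.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:2242/v1",  # assumed address; adjust to your server
    api_key="placeholder",                # set a real key only if the server was started with one
)

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",  # must match the model the server was started with
    messages=[{"role": "user", "content": "Say hello in five words."}],
    max_tokens=32,
)
print(response.choices[0].message.content)
```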

Highlighted Details

  • Supports over a dozen quantization formats for memory and speed optimization.
  • Features advanced sampling methods like DRY and XTC (see the sketch after this list).
  • Offers OpenAI-compatible API server for easy integration with existing UIs.
  • Achieves "extremely high throughput" with FP16 models in FP2-FP7 quant formats (v0.6.1).

Maintenance & Community

Developed collaboratively by PygmalionAI and Ruliad AI. Sponsors include Arc Compute, Prime Intellect, PygmalionAI, and Ruliad AI. Contributions are welcome.

Licensing & Compatibility

The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

Windows installation requires building from source. The project is not associated with any cryptocurrencies, as noted by the developers.

Health Check

  • Last Commit: 4 days ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 1
  • Issues (30d): 1
  • Star History: 13 stars in the last 30 days

Explore Similar Projects

Starred by Jason Knight (Director AI Compilers at NVIDIA; Cofounder of OctoML), Omar Sanseviero (DevRel at Google DeepMind), and 12 more.

mistral.rs by EricLBuehler
0.4% · 6k stars
LLM inference engine for blazing fast performance
Created 1 year ago · Updated 2 days ago
Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Clement Delangue (Cofounder of Hugging Face), and 60 more.

vllm by vllm-project
0.7% · 67k stars
LLM serving engine for high-throughput, memory-efficient inference
Created 2 years ago · Updated 9 hours ago