aphrodite-engine by aphrodite-engine

LLM inference engine for serving HuggingFace models at scale

created 2 years ago
1,489 stars

Top 28.2% on sourcepulse

Project Summary

Aphrodite Engine is a high-performance inference engine designed for large-scale deployment of HuggingFace-compatible Large Language Models. It targets developers and platforms requiring efficient, concurrent LLM serving, offering significant throughput improvements and broad quantization support.

How It Works

Aphrodite Engine leverages vLLM's Paged Attention for efficient KV cache management and continuous batching, enabling high throughput for multiple concurrent users. It incorporates optimized CUDA kernels and supports a wide array of quantization formats (AQLM, AWQ, Bitsandbytes, GGUF, GPTQ, etc.) and distributed inference techniques like tensor and pipeline parallelism. This combination allows for substantial memory savings and increased inference speed.
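
The paging idea can be illustrated with a toy allocator (a minimal sketch, not Aphrodite's or vLLM's actual implementation; the block and pool sizes are arbitrary): each sequence receives fixed-size KV-cache blocks on demand, so many concurrent requests share one memory pool instead of each reserving a max-length buffer up front.

```python
class PagedKVCache:
    """Toy bookkeeping for a paged KV cache (illustrative only)."""

    def __init__(self, num_blocks, block_size=16):
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))  # pool of physical blocks
        self.tables = {}   # sequence id -> list of physical block ids
        self.lengths = {}  # sequence id -> tokens cached so far

    def append_token(self, seq_id):
        """Reserve cache space for one new token of a sequence."""
        n = self.lengths.get(seq_id, 0)
        if n % self.block_size == 0:  # current block full (or first token)
            if not self.free_blocks:
                raise MemoryError("KV cache exhausted; a sequence must be preempted")
            self.tables.setdefault(seq_id, []).append(self.free_blocks.pop())
        self.lengths[seq_id] = n + 1

    def release(self, seq_id):
        """Return a finished sequence's blocks to the shared pool."""
        self.free_blocks.extend(self.tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)
```

Because blocks are allocated per page rather than per maximum sequence length, short requests leave memory free for others, which is what makes continuous batching of many concurrent users practical.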

Quick Start & Requirements

  • Install: pip install -U aphrodite-engine --extra-index-url https://downloads.pygmalion.chat/whl
  • Run: aphrodite run <model_name> (e.g., aphrodite run meta-llama/Meta-Llama-3.1-8B-Instruct)
  • Prerequisites: Python 3.9-3.12, CUDA >= 12. Runs on Linux and Windows (Windows requires building from source). Broad hardware support, including NVIDIA, AMD, and Intel GPUs, TPUs, and Inferentia.
  • Resources: Uses ~90% of GPU VRAM by default; this can be limited with --gpu-memory-utilization or --single-user-mode.
  • Docs: Exhaustive documentation available.
  • Demo: Interactive demo available.

Highlighted Details

  • Supports over a dozen quantization formats for memory and speed optimization.
  • Features advanced sampling methods like DRY and XTC.
  • Offers OpenAI-compatible API server for easy integration with existing UIs.
  • v0.6.1 added support for loading FP16 models in FP2-FP7 quant formats, which the release notes describe as achieving "extremely high throughput".
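
Because the server exposes an OpenAI-compatible API, any OpenAI-style client can talk to it. A minimal standard-library sketch (the port 2242, model name, and max_tokens value are assumptions; check your server's startup output for the actual address and served model):

```python
import json
import urllib.request


def build_chat_payload(prompt, model="meta-llama/Meta-Llama-3.1-8B-Instruct"):
    """Build an OpenAI-style chat completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }


def chat(prompt, base_url="http://localhost:2242/v1"):
    """POST the payload to a running Aphrodite server and return the reply text."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_chat_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```

The same endpoint shape means existing UIs and SDKs built for the OpenAI API can usually be pointed at the local server by changing only the base URL.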

Maintenance & Community

Developed collaboratively by PygmalionAI and Ruliad AI. Sponsors include Arc Compute, Prime Intellect, PygmalionAI, and Ruliad AI. Contributions are welcome.

Licensing & Compatibility

The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

Windows installation requires building from source. The project is not associated with any cryptocurrencies, as noted by the developers.

Health Check

  • Last commit: 2 days ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 10
  • Issues (30d): 0

Star History

90 stars in the last 90 days.

Explore Similar Projects

Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla and OpenAI; author of CS 231n), Nat Friedman (former CEO of GitHub), and 32 more.

  • llama.cpp by ggml-org: C/C++ library for local LLM inference. 84k stars (top 0.4%); created 2 years ago; updated 14 hours ago.