LLM inference benchmark for comparing frameworks
This repository provides a benchmark for Large Language Model (LLM) inference frameworks, aimed at developers and researchers evaluating serving performance and features. It compares serving backends and their capabilities to help readers choose an inference solution that fits their workload.
How It Works
The benchmark evaluates LLM inference frameworks based on their ability to serve models, support for different backends, quantization methods, batching, and distributed inference. It presents detailed performance metrics like Tokens Per Second (TPS), Queries Per Second (QPS), and First Token Latency (FTL) under various configurations, including different batch sizes and quantization levels (e.g., 8-bit, 4-bit AWQ, GPTQ, GGUF).
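The metrics above can be derived from per-request timing data. The sketch below is illustrative only; the function and field names are assumptions, not taken from the benchmark's code.

```python
# Hypothetical sketch of computing TPS, QPS, and first-token latency (FTL)
# from raw request timings. Names are illustrative, not from the benchmark.

def inference_metrics(requests, wall_clock_s):
    """Compute aggregate serving metrics.

    Each request is a dict with:
      start        - time the request was issued (seconds)
      first_token  - time the first token arrived (seconds)
      tokens       - number of tokens generated
    wall_clock_s is the total benchmark duration in seconds.
    """
    total_tokens = sum(r["tokens"] for r in requests)
    tps = total_tokens / wall_clock_s               # Tokens Per Second
    qps = len(requests) / wall_clock_s              # Queries Per Second
    # Mean time-to-first-token across all requests
    ftl = sum(r["first_token"] - r["start"] for r in requests) / len(requests)
    return {"tps": tps, "qps": qps, "ftl_s": ftl}
```

Larger batch sizes typically raise TPS and QPS at the cost of higher FTL, which is why the benchmark reports all three under each configuration.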
Quick Start & Requirements
The benchmark runs the 01-ai/Yi-6B-Chat model in BFloat16, 8-bit, 4-bit AWQ, and GGUF formats.
Highlighted Details
Maintenance & Community
No specific information on maintainers, community channels, or roadmap is present in the provided README.
Licensing & Compatibility
The repository's license is not specified in the provided README.
Limitations & Caveats
The repository was last updated about a year ago and is marked inactive, so results may not reflect current framework versions.