vllm-project
Efficient safetensors extension for compressed LLM tensor storage
Top 98.7% on SourcePulse
compressed-tensors tackles the fragmentation in LLM model compression by extending the safetensors format into a unified, extensible solution. It enables efficient storage and management of diverse quantized and sparse tensor data, supporting popular techniques like GPTQ, AWQ, SmoothQuant, INT8, FP8, and various sparsity patterns. This library benefits developers and researchers by simplifying the integration of multiple compression methods, streamlining deployment pipelines, and reducing the overhead associated with managing disparate storage formats.
How It Works
The core innovation lies in extending safetensors to create a single, consistent format capable of representing a wide array of compression schemes. It supports granular quantization options, including weight-only (e.g., W4A16), activation (e.g., W8A8), KV cache, and non-uniform quantization across different layers. Additionally, it handles both unstructured and semi-structured sparsity patterns. This unified approach simplifies experimentation and deployment by abstracting away the complexities of individual compression techniques.
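To make the schemes above concrete, here is a minimal plain-Python sketch of two of them: symmetric per-tensor INT8 weight quantization (the "W8" in W8A8) and 2:4 semi-structured sparsity. This is an illustration of the underlying techniques only, written under simple assumptions; it is not the compressed-tensors API.

```python
# Illustrative sketches of two compression schemes the format can describe.
# Plain Python for clarity; this is NOT the compressed-tensors API.

def quantize_int8(weights):
    """Symmetric per-tensor INT8 quantization: map floats to
    [-128, 127] using a single shared scale factor."""
    scale = max(abs(w) for w in weights) / 127.0 or 1.0  # avoid zero scale
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float weights from int8 values."""
    return [v * scale for v in q]

def prune_2_4(weights):
    """2:4 semi-structured sparsity: in every block of 4 weights,
    keep the 2 largest-magnitude values and zero the rest."""
    out = []
    for i in range(0, len(weights), 4):
        block = weights[i:i + 4]
        keep = sorted(range(len(block)), key=lambda j: abs(block[j]),
                      reverse=True)[:2]
        out.extend(v if j in keep else 0.0 for j, v in enumerate(block))
    return out

q, s = quantize_int8([0.5, -1.27, 0.0, 1.27])
# q == [50, -127, 0, 127], s ≈ 0.01
sparse = prune_2_4([0.1, -0.9, 0.05, 0.4])
# sparse == [0.0, -0.9, 0.0, 0.4]
```

In practice the library's value is that the quantized values, scales, and sparsity metadata produced by steps like these are stored together in one consistent safetensors-based layout, rather than in per-technique ad hoc formats.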
Quick Start & Requirements
Install the stable release from PyPI:

pip install compressed-tensors

Install the nightly pre-release:

pip install --pre compressed-tensors

Install from source:

git clone https://github.com/vllm-project/compressed-tensors && cd compressed-tensors && pip install -e .

Highlighted Details
Maintenance & Community
The provided README does not contain specific details regarding maintainers, community channels (e.g., Discord, Slack), sponsorships, or a public roadmap.
Licensing & Compatibility
The README does not explicitly state the project's license type or provide compatibility notes for commercial use or integration with closed-source projects.
Limitations & Caveats
The README focuses on the library's capabilities and does not explicitly detail limitations, alpha status, or known bugs. The advanced quantization examples require a CUDA-enabled environment.