zipnn by zipnn

Lossless compression library for AI pipelines

Created 1 year ago

290 stars

Top 91.0% on SourcePulse

1 Expert Loves This Project

xiaofan-luan (VP Engineering at Zilliz)

Project Summary

ZipNN is a lossless compression library designed to reduce the storage footprint and improve the loading speed of AI models, particularly large language models. It targets AI researchers, developers, and users who work with large model files and need efficient storage and faster deployment. The library offers substantial compression ratios and high-speed decompression, especially for BF16 models, which typically shrink by about 33%.

How It Works

ZipNN employs a data-aware compression strategy: it automatically analyzes tensor data types (e.g., FP32, BF16, FP8) and applies compression techniques optimized for each. Core operations are implemented in C, and the library supports several compression algorithms, including ZSTD, LZ4, and Huffman coding, plus an 'auto' mode that selects the best method. It also includes plugins for Hugging Face Transformers and vLLM, so compressed models can be loaded directly from the filesystem and decompressed on the fly on the CPU.
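To make the data-aware idea concrete, here is a minimal sketch of one plausible technique, byte grouping, assumed here for illustration and not taken from zipnn's source. A BF16 value keeps its sign and the top seven exponent bits in its high byte, which tends to be highly repetitive across model weights, while the low byte holds the last exponent bit plus seven mantissa bits and looks nearly random. The sketch splits the two byte positions into separate streams and, mimicking an 'auto' mode, tries each candidate codec per stream and keeps the smallest result. Python's standard-library zlib and lzma stand in for ZSTD, LZ4, and Huffman coding; every function name is hypothetical and none belongs to the zipnn API.

```python
import lzma
import random
import struct
import zlib

# Candidate codecs for the "auto" choice (hypothetical stand-ins: stdlib
# zlib/lzma instead of the ZSTD, LZ4, and Huffman coders ZipNN uses).
CODECS = {
    "raw": (lambda b: bytes(b), lambda b: bytes(b)),
    "zlib": (lambda b: zlib.compress(b, 9), zlib.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}


def to_bf16_bytes(values):
    """Truncate float32 values to BF16 (upper 16 bits), 2 little-endian bytes each.

    Byte 0 = exponent LSB + 7 mantissa bits; byte 1 = sign + 7 exponent MSBs.
    """
    out = bytearray()
    for v in values:
        bits = struct.unpack("<I", struct.pack("<f", v))[0]
        out += struct.pack("<H", bits >> 16)
    return bytes(out)


def group_bytes(buf, width):
    """Split interleaved words into one stream per byte position."""
    return [buf[i::width] for i in range(width)]


def ungroup_bytes(streams):
    """Inverse of group_bytes: re-interleave the per-position streams."""
    width = len(streams)
    out = bytearray(sum(len(s) for s in streams))
    for i, s in enumerate(streams):
        out[i::width] = s
    return bytes(out)


def compress_auto(stream):
    """Try every codec and keep the smallest output, recording which one won."""
    candidates = {name: enc(stream) for name, (enc, _) in CODECS.items()}
    best = min(candidates, key=lambda name: len(candidates[name]))
    return best, candidates[best]


def compress_grouped(buf, width=2):
    """Byte-group the buffer, then pick a codec independently for each stream."""
    return [compress_auto(s) for s in group_bytes(buf, width)]


def decompress_grouped(packed):
    """Decode each stream with its recorded codec and re-interleave."""
    return ungroup_bytes([CODECS[name][1](payload) for name, payload in packed])


if __name__ == "__main__":
    random.seed(0)
    weights = [random.gauss(0.0, 0.02) for _ in range(50_000)]  # synthetic weights
    raw = to_bf16_bytes(weights)
    packed = compress_grouped(raw)
    assert decompress_grouped(packed) == raw  # lossless round trip
    total = sum(len(p) for _, p in packed)
    print("codec per byte stream:", [name for name, _ in packed])
    print(f"{len(raw)} -> {total} bytes ({total / len(raw):.1%})")
```

Because each stream records the codec that won, decompression needs no guessing. On Gaussian-like synthetic weights the high-byte stream usually compresses well, while the near-random low-byte stream may simply fall back to "raw".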

Quick Start & Requirements

Install via pip: pip install zipnn
Requires: numpy, zstandard, torch
Official Docs: https://github.com/zipnn/zipnn
Examples: https://github.com/zipnn/zipnn/tree/main/examples

Highlighted Details

Achieves up to 80GB/s decompression and 13GB/s compression on multi-NUMA CPUs.
Supports FP8 (e4m3fn, e5m2) models.
Integrates with vLLM and Hugging Face through safetensors and HF Transformers plugins.
Offers command-line scripts for batch compression/decompression.
BF16 models typically see a 33% size reduction.
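The size and speed figures above translate into simple back-of-the-envelope estimates. This sketch assumes a hypothetical 70-billion-parameter BF16 checkpoint (2 bytes per parameter, decimal gigabytes), applies the quoted typical 33% reduction, and divides by the peak 80 GB/s decompression rate. It additionally assumes that the throughput is measured on decompressed output bytes, which the figures above do not state; the function names are illustrative only.

```python
BYTES_PER_BF16 = 2  # BF16 stores each parameter in 16 bits


def bf16_size_gb(num_params):
    """Uncompressed BF16 checkpoint size in decimal gigabytes."""
    return num_params * BYTES_PER_BF16 / 1e9


def compressed_size_gb(size_gb, reduction=0.33):
    """Size after the quoted typical BF16 reduction (33% by default)."""
    return size_gb * (1.0 - reduction)


def decompress_seconds(size_gb, throughput_gbps=80.0):
    """Time to decompress at the quoted peak rate.

    Assumption: throughput is measured on decompressed (output) bytes.
    """
    return size_gb / throughput_gbps


if __name__ == "__main__":
    raw = bf16_size_gb(70e9)  # hypothetical 70B-parameter model
    packed = compressed_size_gb(raw)
    secs = decompress_seconds(raw)
    print(f"raw {raw:.1f} GB -> compressed {packed:.1f} GB, ~{secs:.2f} s to decompress")
```

For this hypothetical model, the estimate is 140 GB shrinking to about 93.8 GB, with roughly 1.75 s of decompression at the peak rate. Actual load times also depend on storage bandwidth and on how many CPU cores and NUMA nodes are available.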

Maintenance & Community

Latest release: v0.5.3 (adds FP8 support).
Active development with regular updates noted in the changelog.
Contact: zipnn_compression@gmail.com

Licensing & Compatibility

License: Not explicitly stated in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

GPU implementations are noted as "on the way."
The license is not specified, which may impact commercial adoption.
Some integrations (like vLLM in containers) require building custom images or using pre-built ones.

Health Check

Last Commit

6 months ago

Responsiveness

1 day

Pull Requests (30d)

1

Issues (30d)

1

Star History

2 stars in the last 30 days

Explore Similar Projects

C3-Context-Cascade-Compression by liufanfanlff

Advanced text compression model

Created 1 month ago

Updated 1 month ago

Toolkit-for-Prompt-Compression by 3DAgentWorld

Prompt compression toolkit for LLM inference efficiency

Created 1 year ago

Updated 11 months ago

roadroller by lifthrasiir

JS packer for large demos, targeting js13kGames

Created 4 years ago

Updated 3 years ago

Starred by

Georgi Gerganov (Author of llama.cpp, whisper.cpp).

llama-zip by AlexBuz

LLM-powered lossless compression tool

Created 1 year ago

Updated 1 week ago

Dataset_Quantization by magic-research

Research paper for dataset quantization, targeting lossless model training with compressed datasets

Created 2 years ago

Updated 2 years ago

SVD-LLM by AIoT-MLSys-Lab

Compressing LLMs with Singular Value Decomposition

Created 1 year ago

Updated 4 months ago

Starred by

Artidoro Pagnoni (Coauthor of QLoRA; Research Scientist at Meta),

Jeff Hammerbacher (Cofounder of Cloudera), and

2 more.

SpQR by Vahe1994

Weight compression research paper for near-lossless LLM quantization

Created 2 years ago

Updated 1 year ago

TransformerCompression by microsoft

Transformer compression via SliceGPT (ICLR'24)

Created 2 years ago

Updated 1 year ago

Starred by

Michael Han (Cofounder of Unsloth) and

Omar Sanseviero (DevRel at Google DeepMind).

pruna by PrunaAI

Model optimization framework for faster, smaller, cheaper, greener AI

Created 10 months ago

Updated 2 days ago

Starred by

Luis Capelo (Cofounder of Lightning AI) and

Cody Yu (Coauthor of vLLM; MTS at OpenAI).

kvpress by NVIDIA

LLM KV cache compression made easy

Created 1 year ago

Updated 3 weeks ago

Starred by

Wing Lian (Founder of Axolotl AI).

LLM-Pruner by horseee

LLM structural pruner for model compression

Created 2 years ago

Updated 1 year ago

Starred by

Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"),

Jeff Hammerbacher (Cofounder of Cloudera), and

7 more.

LLMLingua by microsoft

Prompt compression for accelerated LLM inference

Created 2 years ago

Updated 2 months ago
