llama-zip by AlexBuz

LLM-powered lossless compression tool

created 1 year ago
285 stars

Top 92.8% on sourcepulse

View on GitHub
Project Summary

llama-zip is a lossless compression utility that leverages a user-provided Large Language Model (LLM) as the probabilistic model for an arithmetic coder. It aims to achieve higher compression ratios on text data than traditional methods by utilizing the predictive capabilities of LLMs. The tool is designed for users who need advanced compression for text-heavy datasets and are willing to trade speed for compression efficiency.

How It Works

llama-zip uses an LLM to predict the next token in a sequence. These predictions, expressed as probabilities, drive an arithmetic coder that produces the compressed output: tokens the LLM predicts with high confidence are encoded in fewer bits, which is where the compression gains come from. A sliding context window mechanism lets it process inputs longer than the LLM's maximum context length, with configurable overlap to preserve context across window boundaries. Arbitrary binary data is handled by mapping invalid UTF-8 bytes to Unicode private use area code points, though compression ratios tend to be lower for non-textual data.
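
As a rough sketch of the idea (not llama-zip's actual code), the snippet below narrows an arithmetic coder's interval using a made-up next-token distribution: a token the model rates as highly likely leaves a wide interval and costs a fraction of a bit, while an unlikely token leaves a narrow interval and costs several bits. In llama-zip the distribution would come from the LLM's next-token probabilities at each step.

    import math

    def narrow(low, high, dist, token):
        """Shrink the coder's [low, high) interval to the sub-range that the
        model's probability distribution assigns to the chosen token."""
        cum = 0.0
        for tok, p in dist:
            if tok == token:
                return low + cum * (high - low), low + (cum + p) * (high - low)
            cum += p
        raise ValueError("token not in distribution")

    # Toy next-token distribution an LLM might produce.
    dist = [("dog", 0.90), ("cat", 0.06), ("fox", 0.04)]

    lo, hi = narrow(0.0, 1.0, dist, "dog")
    print(round(-math.log2(hi - lo), 2))  # ~0.15 bits for a confident prediction

    lo, hi = narrow(0.0, 1.0, dist, "fox")
    print(round(-math.log2(hi - lo), 2))  # ~4.64 bits for a surprising token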

Quick Start & Requirements

  • Installation: git clone https://github.com/alexbuz/llama-zip.git && cd llama-zip && pip3 install .
  • Prerequisites: A compatible LLM in GGUF format (e.g., a quantized Llama 3.1 8B) that fits in system memory.
  • Usage: llama-zip <llm_path> [options] <mode> [input]
  • Documentation: CLI Usage, API Usage
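
The CLI Usage and API Usage docs linked above are the authoritative reference. Purely as a hedged illustration, the Python-level flow might look roughly like the sketch below; the import name LlamaZip, its model_path argument, the compress/decompress methods, and the model path are all assumptions to verify against the project's actual API documentation.

    # Hypothetical sketch only: class and method names are assumptions,
    # and the model path is a placeholder. Check the API Usage docs.
    from llama_zip import LlamaZip  # assumed import

    compressor = LlamaZip(model_path="models/Llama-3.1-8B-Q4.gguf")  # placeholder path
    compressed = compressor.compress(b"llama-zip compresses predictable text well")
    assert compressor.decompress(compressed) == b"llama-zip compresses predictable text well"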

Highlighted Details

  • Achieves significantly higher compression ratios than traditional utilities like zstd, bzip2, and xz on text corpora (e.g., Calgary Corpus, source code).
  • Supports arbitrary input length via a sliding context window with configurable overlap.
  • Can compress binary data, though with potentially reduced efficiency compared to text (see the byte-mapping sketch after this list).
  • Offers GPU offloading (--n-gpu-layers) for performance tuning.
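
To illustrate the binary-data handling mentioned above: bytes that do not form valid UTF-8 can be escaped into Unicode private use area code points so the model always sees a decodable string. The U+E000 base offset and the prefix-decoding strategy below are assumptions chosen for the example, not necessarily what llama-zip does internally.

    # Sketch: escape invalid UTF-8 bytes into private use area code points.
    # The U+E000 base is an assumption for illustration.
    PUA_BASE = 0xE000

    def bytes_to_text(data: bytes) -> str:
        out, i = [], 0
        while i < len(data):
            # Decode the longest valid UTF-8 sequence starting at i (max 4 bytes).
            for end in range(min(len(data), i + 4), i, -1):
                try:
                    out.append(data[i:end].decode("utf-8"))
                    i = end
                    break
                except UnicodeDecodeError:
                    continue
            else:
                # No valid sequence starts here: escape the raw byte.
                out.append(chr(PUA_BASE + data[i]))
                i += 1
        return "".join(out)

    def text_to_bytes(text: str) -> bytes:
        out = bytearray()
        for ch in text:
            if PUA_BASE <= ord(ch) < PUA_BASE + 256:
                out.append(ord(ch) - PUA_BASE)  # un-escape a raw byte
            else:
                out.extend(ch.encode("utf-8"))
        return bytes(out)

    # Round-trips mixed text and non-UTF-8 bytes (a real implementation must
    # also handle inputs that already contain PUA characters).
    data = b"\xff\xfe header bytes, then text: \xf0\x9f\xa6\x99"
    assert text_to_bytes(bytes_to_text(data)) == data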

Maintenance & Community

The project is maintained by AlexBuz. Further community or roadmap information is not detailed in the README.

Licensing & Compatibility

The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

Compression and decompression are significantly slower than with traditional utilities due to LLM inference. The llama.cpp backend does not guarantee deterministic behavior, which may limit the portability of compressed files across different systems or configurations. Identical LLM parameters (context length, GPU layers, window overlap) must be used for compression and decompression.

Health Check

  • Last commit: 11 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

12 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering, Designing Machine Learning Systems), Jeff Hammerbacher (cofounder of Cloudera), and 1 more.

yarn by jquesnelle
1.0% · 2k stars
Context window extension method for LLMs (research paper, models)
created 2 years ago, updated 1 year ago