llama-zip by AlexBuz

LLM-powered lossless compression tool

created 1 year ago
285 stars

Top 92.8% on sourcepulse

View on GitHub
Project Summary

llama-zip is a lossless compression utility that leverages a user-provided Large Language Model (LLM) as the probabilistic model for an arithmetic coder. It aims to achieve higher compression ratios on text data than traditional methods by utilizing the predictive capabilities of LLMs. The tool is designed for users who need advanced compression for text-heavy datasets and are willing to trade speed for compression efficiency.

How It Works

llama-zip uses an LLM to predict the next token in a sequence. These predictions, expressed as probabilities, drive an arithmetic coder that produces the compressed output: tokens the LLM predicts with high confidence are encoded in fewer bits, which is where the compression gains come from. A sliding context window mechanism lets it process inputs longer than the LLM's maximum context length, with configurable overlap to preserve context across window boundaries. Arbitrary binary data is handled by mapping invalid UTF-8 bytes to Unicode private use area code points, though compression ratios tend to be lower for non-textual data.
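
As a rough sketch of the idea (not llama-zip's actual code), the snippet below narrows an arithmetic coder's interval using a made-up next-token distribution: a token the model rates as highly likely leaves a wide interval and costs a fraction of a bit, while an unlikely token leaves a narrow interval and costs several bits. In llama-zip the distribution would come from the LLM's next-token probabilities at each step.

    import math

    def narrow(low, high, dist, token):
        """Shrink the coder's [low, high) interval to the sub-range that the
        model's probability distribution assigns to the chosen token."""
        cum = 0.0
        for tok, p in dist:
            if tok == token:
                return low + cum * (high - low), low + (cum + p) * (high - low)
            cum += p
        raise ValueError("token not in distribution")

    # Toy next-token distribution an LLM might produce.
    dist = [("dog", 0.90), ("cat", 0.06), ("fox", 0.04)]

    lo, hi = narrow(0.0, 1.0, dist, "dog")
    print(round(-math.log2(hi - lo), 2))  # ~0.15 bits for a confident prediction

    lo, hi = narrow(0.0, 1.0, dist, "fox")
    print(round(-math.log2(hi - lo), 2))  # ~4.64 bits for a surprising token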

Quick Start & Requirements

  • Installation: git clone https://github.com/alexbuz/llama-zip.git && cd llama-zip && pip3 install .
  • Prerequisites: A compatible LLM in GGUF format (e.g., a quantized Llama 3.1 8B) that fits in system memory.
  • Usage: llama-zip <llm_path> [options] <mode> [input]
  • Documentation: CLI Usage, API Usage
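
The CLI Usage and API Usage docs linked above are the authoritative reference. Purely as a hedged illustration, the Python-level flow might look roughly like the sketch below; the import name LlamaZip, its model_path argument, the compress/decompress methods, and the model path are all assumptions to verify against the project's actual API documentation.

    # Hypothetical sketch only: class and method names are assumptions,
    # and the model path is a placeholder. Check the API Usage docs.
    from llama_zip import LlamaZip  # assumed import

    compressor = LlamaZip(model_path="models/Llama-3.1-8B-Q4.gguf")  # placeholder path
    compressed = compressor.compress(b"llama-zip compresses predictable text well")
    assert compressor.decompress(compressed) == b"llama-zip compresses predictable text well"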

Highlighted Details

  • Achieves significantly higher compression ratios than traditional utilities like zstd, bzip2, and xz on text corpora (e.g., Calgary Corpus, source code).
  • Supports arbitrary input length via a sliding context window with configurable overlap.
  • Can compress binary data, though with potentially reduced efficiency compared to text (see the byte-mapping sketch after this list).
  • Offers GPU offloading (--n-gpu-layers) for performance tuning.
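
To illustrate the binary-data handling mentioned above: bytes that do not form valid UTF-8 can be escaped into Unicode private use area code points so the model always sees a decodable string. The U+E000 base offset and the prefix-decoding strategy below are assumptions chosen for the example, not necessarily what llama-zip does internally.

    # Sketch: escape invalid UTF-8 bytes into private use area code points.
    # The U+E000 base is an assumption for illustration.
    PUA_BASE = 0xE000

    def bytes_to_text(data: bytes) -> str:
        out, i = [], 0
        while i < len(data):
            # Decode the longest valid UTF-8 sequence starting at i (max 4 bytes).
            for end in range(min(len(data), i + 4), i, -1):
                try:
                    out.append(data[i:end].decode("utf-8"))
                    i = end
                    break
                except UnicodeDecodeError:
                    continue
            else:
                # No valid sequence starts here: escape the raw byte.
                out.append(chr(PUA_BASE + data[i]))
                i += 1
        return "".join(out)

    def text_to_bytes(text: str) -> bytes:
        out = bytearray()
        for ch in text:
            if PUA_BASE <= ord(ch) < PUA_BASE + 256:
                out.append(ord(ch) - PUA_BASE)  # un-escape a raw byte
            else:
                out.extend(ch.encode("utf-8"))
        return bytes(out)

    # Round-trips mixed text and non-UTF-8 bytes (a real implementation must
    # also handle inputs that already contain PUA characters).
    data = b"\xff\xfe header bytes, then text: \xf0\x9f\xa6\x99"
    assert text_to_bytes(bytes_to_text(data)) == data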

Maintenance & Community

The project is maintained by AlexBuz. Further community or roadmap information is not detailed in the README.

Licensing & Compatibility

The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

Compression and decompression are significantly slower than with traditional utilities due to LLM inference. The llama.cpp backend does not guarantee deterministic behavior, which may limit the portability of compressed files across different systems or configurations. Identical LLM parameters (context length, GPU layers, window overlap) must be used for compression and decompression.

Health Check

  • Last commit: 11 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

12 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering, Designing Machine Learning Systems), Jeff Hammerbacher (cofounder of Cloudera), and 1 more.

yarn by jquesnelle
1.0% · 2k stars
Context window extension method for LLMs (research paper, models)
created 2 years ago, updated 1 year ago