LLM-powered lossless compression tool
llama-zip is a lossless compression utility that leverages a user-provided Large Language Model (LLM) as the probabilistic model for an arithmetic coder. It aims to achieve higher compression ratios on text data than traditional methods by utilizing the predictive capabilities of LLMs. The tool is designed for users who need advanced compression for text-heavy datasets and are willing to trade speed for compression efficiency.
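As a rough intuition for why a strong predictive model improves compression (a toy calculation, not llama-zip's actual coder): an arithmetic coder spends about -log2(p) bits on a symbol it assigns probability p, so a token predicted with 95% confidence costs well under one bit, while a token the model finds surprising costs many bits.

```python
import math

def ideal_bits(p: float) -> float:
    """Ideal arithmetic-coder cost, in bits, of a symbol with probability p."""
    return -math.log2(p)

# A token the model is confident about is nearly free to encode,
# while an unpredictable token is expensive.
confident = ideal_bits(0.95)       # roughly 0.074 bits
coin_flip = ideal_bits(0.5)        # exactly 1.0 bit
surprising = ideal_bits(1 / 50000) # roughly 15.6 bits, e.g. a token drawn
                                   # uniformly from a 50k-token vocabulary
```

The better the LLM models the input text, the higher the probabilities it assigns to the tokens that actually occur, and the fewer bits the coder emits overall.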
How It Works
llama-zip uses an LLM to predict the next token in a sequence. The model's per-token probabilities are fed into an arithmetic coder, which produces the compressed output; tokens the LLM predicts with high confidence are encoded in fewer bits, so predictable text compresses well. A sliding context window lets it process inputs longer than the LLM's maximum context length, with configurable overlap to preserve context across window boundaries. It handles arbitrary binary data by mapping bytes that are invalid UTF-8 to Unicode private use area code points, though compression ratios are typically lower for non-textual data.
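A minimal sketch of the byte-escaping idea for binary data (the private-use-area base 0xE000 and the greedy decoding strategy are illustrative assumptions, not llama-zip's actual implementation):

```python
# Sketch: map raw bytes to a string an LLM tokenizer can accept, and back.
# NOTE: the PUA base (0xE000) and the greedy "longest valid UTF-8 chunk"
# strategy are assumptions for illustration; llama-zip's scheme may differ.
# Inputs that already contain code points U+E000-U+E0FF would need extra
# escaping, which this sketch ignores.

PUA_BASE = 0xE000  # start of the BMP private use area

def bytes_to_text(data: bytes) -> str:
    out = []
    i = 0
    while i < len(data):
        # try to decode a valid UTF-8 sequence (up to 4 bytes) starting at i
        for length in (4, 3, 2, 1):
            try:
                out.append(data[i:i + length].decode("utf-8"))
                i += length
                break
            except UnicodeDecodeError:
                continue
        else:
            # invalid byte: escape it into the private use area
            out.append(chr(PUA_BASE + data[i]))
            i += 1
    return "".join(out)

def text_to_bytes(text: str) -> bytes:
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if PUA_BASE <= cp < PUA_BASE + 256:
            out.append(cp - PUA_BASE)  # un-escape a remapped byte
        else:
            out.extend(ch.encode("utf-8"))
    return bytes(out)
```

Valid UTF-8 passes through unchanged, so ordinary text stays in the LLM's natural input distribution, while stray bytes survive a lossless round trip.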
Quick Start & Requirements
git clone https://github.com/alexbuz/llama-zip.git && cd llama-zip && pip3 install .
llama-zip <llm_path> [options] <mode> [input]
Highlighted Details
Supports GPU offloading (--n-gpu-layers) for performance tuning.
Maintenance & Community
The project is maintained by AlexBuz. Further community or roadmap information is not detailed in the README.
Licensing & Compatibility
The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
Compression and decompression are significantly slower than with traditional utilities due to LLM inference. The llama.cpp backend does not guarantee deterministic behavior, potentially limiting the portability of compressed files across different systems or configurations. Identical LLM parameters (context length, GPU layers, window overlap) must be used for compression and decompression.