caveman-compression by wilpel

Semantic compression for LLM contexts

Created 4 months ago
345 stars

Top 80.4% on SourcePulse

Project Summary

Caveman Compression offers lossless semantic compression for Large Language Model (LLM) contexts, targeting engineers and researchers. It significantly reduces token counts by stripping predictable grammar while preserving factual content, enabling more information to fit within LLM context windows.

How It Works

The core approach leverages LLMs' ability to reconstruct predictable linguistic elements. By removing grammar words ("a", "the", "is"), connectives ("therefore", "however"), and filler words, the method retains the unpredictable parts: facts, numbers, names, and technical terms. This achieves substantial token reduction (up to 58%) by eliminating only what LLMs can reliably infer, preserving meaning while packing denser context. Three methods are offered: LLM-based for maximum savings, MLM-based for offline predictability-aware compression, and NLP-based for free, offline, multilingual rule-based compression.
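As an illustration, the rule-based (NLP) variant can be sketched in a few lines of Python. This toy version uses a hand-picked stopword list purely for illustration; the actual project uses a spaCy pipeline, and the function name here is hypothetical:

```python
# Toy sketch of the rule-based idea: drop predictable grammar and
# connective words, keep facts, numbers, names, and technical terms.
# The real project uses spaCy; this stopword list is illustrative only.
STOPWORDS = {
    "a", "an", "the", "is", "are", "was", "were", "be",
    "therefore", "however", "thus", "very", "really", "just",
}

def caveman_compress(text: str) -> str:
    """Remove stopwords while preserving everything unpredictable."""
    kept = [
        word for word in text.split()
        if word.lower().strip(".,;:") not in STOPWORDS
    ]
    return " ".join(kept)

original = "The server is handling 4500 requests per second, however the cache is cold."
print(caveman_compress(original))
# server handling 4500 requests per second, cache cold.
```

Even this crude version shows the trade-off named in the summary: the output is no longer natural prose, but the facts ("4500 requests per second") survive intact.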

Quick Start & Requirements

Installation involves creating a Python virtual environment and installing dependencies via pip install -r requirements.txt (LLM), requirements-nlp.txt (NLP), or requirements-mlm.txt (MLM). The LLM-based method requires an OpenAI API key configured in a .env file. NLP and MLM methods require spaCy language models (e.g., en_core_web_sm). Python 3.8+ is a prerequisite. Links to Quick Start, Examples, Benchmarks, and Spec are provided.
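For the `.env` requirement, a minimal stdlib-only loader might look like the following. The variable name `OPENAI_API_KEY` and the file layout are assumptions, since the summary only says the key is configured in a `.env` file (the project itself may use python-dotenv or similar):

```python
import os
from pathlib import Path

def load_env(path: str = ".env") -> None:
    """Minimal .env loader: put KEY=VALUE lines into os.environ.
    Sketch only; skips comments and blank lines, keeps existing vars."""
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())

# Hypothetical usage: .env contains a line like
#   OPENAI_API_KEY=<your key>
if Path(".env").exists():
    load_env()
    api_key = os.environ.get("OPENAI_API_KEY")
```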

Highlighted Details

  • Achieves 40-58% reduction with LLM-based compression, 20-30% with MLM-based, and 15-30% with NLP-based methods.
  • Automated benchmarks demonstrate 100% factual preservation with 12-25% compression ratios.
  • Effective for RAG knowledge bases (fitting 2-3x more context) and agent internal reasoning (reducing chain-of-thought tokens by 50%).
  • Core principles include stripping connectives, using concise sentences, action verbs, concrete language, and active voice.
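As a quick sanity check on the headline numbers: a token reduction of r lets 1/(1 − r) times as much text fit in the same context window, which is roughly where the "2-3x more context" figure comes from:

```python
def context_multiplier(reduction: float) -> float:
    """How many times more text fits after removing `reduction` of tokens."""
    return 1.0 / (1.0 - reduction)

print(round(context_multiplier(0.40), 2))  # 1.67 -> low end of the LLM-based range
print(round(context_multiplier(0.58), 2))  # 2.38 -> consistent with the 2-3x claim
```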

Maintenance & Community

The project is authored by William Peltomäki. No specific details regarding active maintenance, community channels (like Discord/Slack), or notable contributors are present in the README.

Licensing & Compatibility

The project is released under the MIT license. This permissive license allows for broad compatibility, including commercial use and integration into closed-source projects without significant restrictions.

Limitations & Caveats

Caveman Compression is not suitable for user-facing content, marketing copy, legal documents, or emotional communication. The LLM-based method incurs API costs, while the MLM-based method requires downloading a ~500MB model. The NLP-based method offers lower compression rates compared to the other two.

Health Check

  • Last Commit: 4 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 134 stars in the last 30 days

Explore Similar Projects

Starred by Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake), and 3 more.

prompt-lookup-decoding by apoorvumang

Top 0% on SourcePulse
602 stars
Decoding method for faster LLM generation
Created 2 years ago
Updated 1 year ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Philipp Schmid (DevRel at Google DeepMind), and 1 more.

text-splitter by benbrandt

Top 0.3% on SourcePulse
584 stars
Rust crate for splitting text into semantic chunks
Created 3 years ago
Updated 2 days ago
Starred by Luis Capelo (Cofounder of Lightning AI), Carol Willing (Core Contributor to CPython, Jupyter), and 2 more.

chonkie by chonkie-inc

Top 0.3% on SourcePulse
4k stars
Chunking library for RAG applications
Created 1 year ago
Updated 1 day ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Jeff Hammerbacher (Cofounder of Cloudera), and 7 more.

LLMLingua by microsoft

Top 0.4% on SourcePulse
6k stars
Prompt compression for accelerated LLM inference
Created 2 years ago
Updated 3 days ago