leanctx by jia-gao

Reduce LLM token costs with intelligent prompt compression

Created 2 months ago

312 stars

Top 86.2% on SourcePulse

Project Summary


Leanctx is a Python SDK that aims to cut LLM input token costs by 40-60% in production applications by compressing prompts without requiring code modifications. It targets developers building RAG systems, conversational agents, and document processing pipelines, where large contexts lead to high token bills. It compresses dynamic content while preserving critical elements such as code and tool calls, and it reports improved accuracy over naive truncation on long-context benchmarks. Compression runs locally by default, which keeps prompt content on the user's machine until the request is sent to the provider.

How It Works

Leanctx acts as a drop-in wrapper around existing LLM SDKs (OpenAI, Anthropic, Gemini). It intercepts requests and applies a configurable compression pipeline before sending them to the LLM provider. The pipeline includes middleware for mode (on/off) and triggers (e.g., a minimum token threshold). A content classifier routes message parts (code, errors, prose, etc.) to specific compressors: verbatim preservation for critical data, the local LLMLingua-2 model (Lingua) for general prose, or a configured LLM (SelfLLM) for higher-quality summarization. This keeps essential information intact while removing redundant tokens, and lets users trade off compression ratio, cost, and quality.
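The routing idea described above can be illustrated with a minimal sketch. Everything in it is hypothetical and is not the leanctx API: the names `classify`, `squeeze_prose`, `ROUTES`, and `compress_parts` are invented for illustration, the classifier is a crude heuristic, and the filler-word filter merely stands in for a learned compressor such as LLMLingua-2.

```python
import re

# Hypothetical filler list; a real prose compressor would be a learned model.
FILLER = {"very", "really", "basically", "actually", "just", "quite"}


def estimate_tokens(text: str) -> int:
    # Crude stand-in for a tokenizer: one token per whitespace-separated word.
    return len(text.split())


def classify(text: str) -> str:
    # Hypothetical heuristic classifier: error traces first, then code, else prose.
    if text.lstrip().startswith("Traceback"):
        return "error"
    if re.search(r"^\s*(def |class |import |from \w+ import )", text, re.MULTILINE):
        return "code"
    return "prose"


def keep_verbatim(text: str) -> str:
    # Critical content passes through unchanged.
    return text


def squeeze_prose(text: str) -> str:
    # Placeholder compressor: drops filler words and collapses whitespace.
    words = [w for w in text.split() if w.lower().strip(".,") not in FILLER]
    return " ".join(words)


# Each content kind is routed to exactly one compressor.
ROUTES = {"error": keep_verbatim, "code": keep_verbatim, "prose": squeeze_prose}


def compress_parts(texts, enabled=True, min_tokens=20):
    # Mode and trigger checks: skip compression when it is switched off
    # or when the request is below the minimum token threshold.
    total = sum(estimate_tokens(t) for t in texts)
    if not enabled or total < min_tokens:
        return list(texts)
    return [ROUTES[classify(t)](t) for t in texts]
```

In this sketch a code snippet comes back byte-for-byte identical while a wordy prose part loses its filler words, and a request under `min_tokens` is returned untouched. The dictionary-based routing keeps the "which compressor handles which content" decision in one place, which is one plausible way to make such a pipeline configurable.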

Quick Start & Requirements

Install the core package and provider SDKs with:

    pip install 'leanctx[openai,anthropic,gemini]'

To enable local LLMLingua-2 compression, add:

    pip install 'leanctx[lingua]'

The lingua extra downloads about 1.2 GB of model weights to ~/.cache/huggingface/ on first use. The project also includes a CLI for benchmarking (leanctx bench run).

Highlighted Details

Achieves 40% accuracy on the LongBench v2 short subset while removing 57% of tokens, double the accuracy of a naive-truncation baseline.
Preserves code, tool calls, and error traces verbatim, compressing only less critical prose and log data.
Complements provider-side prompt caching by compressing dynamic, per-query content, allowing savings to stack.
Offers optional OpenTelemetry (OTel) integration for detailed observability of compression performance and cost.
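To make the stacking claim concrete, the arithmetic can be sketched as below. This is an illustrative cost model, not something taken from the project: the function `input_cost`, the price of $3 per million input tokens, the 90% cache discount, and the 2,000/8,000 token split are all assumptions, and the 57% removal figure is borrowed from the benchmark result above even though real workloads will differ. The model also assumes the static prefix is left uncompressed so that it can still hit the provider cache.

```python
def input_cost(static_tokens, dynamic_tokens, price_per_mtok,
               cache_discount=0.0, removed_fraction=0.0):
    """Estimated input cost in dollars for one request (illustrative model)."""
    # Static prefix: billed at a discount when served from the provider cache.
    static_billed = static_tokens * (1.0 - cache_discount)
    # Dynamic, per-query content: shrunk by compression before it is sent.
    dynamic_billed = dynamic_tokens * (1.0 - removed_fraction)
    return (static_billed + dynamic_billed) * price_per_mtok / 1_000_000


if __name__ == "__main__":
    # All numbers below are hypothetical inputs chosen for the example.
    common = dict(static_tokens=2_000, dynamic_tokens=8_000, price_per_mtok=3.0)
    print(f"baseline:      ${input_cost(**common):.4f}")
    print(f"cache only:    ${input_cost(**common, cache_discount=0.9):.4f}")
    print(f"compress only: ${input_cost(**common, removed_fraction=0.57):.4f}")
    print(f"both:          ${input_cost(**common, cache_discount=0.9, removed_fraction=0.57):.4f}")
```

Under these assumed inputs the per-request cost is about $0.0300 at baseline, $0.0246 with caching alone, $0.0163 with compression alone, and $0.0109 with both. The two savings act on different parts of the prompt, which is why they add up rather than overlap.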

Maintenance & Community

The project is actively maintained, with version 0.3.1 released on April 26, 2026. The roadmap lists planned features including a full LongBench v2 sweep, Docker Hub publishing, multimodal compression, and a TypeScript SDK port. The README does not mention community channels such as Discord or Slack.

Licensing & Compatibility

Leanctx is released under the MIT License, permitting commercial use and integration into closed-source applications.

Limitations & Caveats

As of v0.3.1, Gemini's multimodal requests and function calls automatically fall back to passthrough mode ("opaque-bailout") as compression is not yet supported for these specific types. Compression for these scenarios is targeted for v0.3.x.
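The fallback behaviour can be pictured with a small sketch. The request shape here is an assumption, loosely modeled on a contents/parts layout, and the names `needs_passthrough`, `prepare`, and `collapse_whitespace` are hypothetical; none of this is taken from the leanctx source.

```python
def needs_passthrough(request: dict) -> bool:
    # Hypothetical check: bail out if the request declares tools or carries
    # any part that is not plain text (image data, function calls, responses).
    if request.get("tools"):
        return True
    for message in request.get("contents", []):
        for part in message.get("parts", []):
            if any(key in part for key in
                   ("inline_data", "function_call", "function_response")):
                return True
    return False


def collapse_whitespace(request: dict) -> dict:
    # Toy compressor for text-only requests: normalizes runs of whitespace.
    return {
        "contents": [
            {"parts": [{"text": " ".join(p.get("text", "").split())}
                       for p in m.get("parts", [])]}
            for m in request.get("contents", [])
        ]
    }


def prepare(request: dict, compress=collapse_whitespace):
    # Unsupported requests are forwarded untouched and labeled as a bailout.
    if needs_passthrough(request):
        return request, "opaque-bailout"
    return compress(request), "compressed"
```

The point of the design is that an unsupported request is never partially compressed: it is either handled fully or forwarded exactly as the caller built it, so a fallback cannot corrupt image data or function-call payloads.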
