leanctx  by jia-gao

Reduce LLM token costs with intelligent prompt compression

Created 1 month ago
308 stars

Top 87.0% on SourcePulse

Summary

Leanctx is a Python SDK that aims to cut LLM input token costs by 40-60% in production applications by compressing prompts, without requiring changes to application code. It targets developers building RAG systems, conversational agents, and document processing pipelines, where large contexts drive high token bills. It compresses dynamic content while preserving critical elements such as code and tool calls, reports improved accuracy on long-context benchmarks, and runs locally by default for privacy.

How It Works

Leanctx acts as a drop-in wrapper around existing LLM SDKs (OpenAI, Anthropic, Gemini). It intercepts each request and applies a configurable compression pipeline before the request is sent to the LLM provider. The pipeline includes middleware for mode (on/off) and triggers (e.g., a minimum token threshold). A content classifier routes message parts (code, errors, prose, and so on) to specific compressors: verbatim preservation for critical data, the local LLMLingua-2 model (Lingua) for general prose, or a configured LLM (SelfLLM) for higher-quality summarization. Essential information stays intact while redundant tokens are removed, and the choice of compressor trades off compression ratio, cost, and quality.
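The routing idea described above can be sketched in a few lines of Python. This is a minimal illustration, not leanctx's actual code: the names classify, squeeze_prose, ROUTES, estimate_tokens, and compress_parts are hypothetical, the classifier is a crude regex heuristic, and the prose "compressor" is a filler-word filter standing in for a learned model such as LLMLingua-2.

```python
import re
from typing import Callable, Dict, List

# Hypothetical content kinds; the real classifier's labels may differ.
CODE, ERROR, PROSE = "code", "error", "prose"


def classify(part: str) -> str:
    """Crude heuristic classifier, for illustration only."""
    if "Traceback (most recent call last)" in part or re.search(
        r"^\w*(Error|Exception):", part, re.M
    ):
        return ERROR
    if re.search(r"^\s*(def |class |import |from \w+ import )", part, re.M):
        return CODE
    return PROSE


def verbatim(part: str) -> str:
    """Critical content passes through unchanged."""
    return part


def squeeze_prose(part: str) -> str:
    """Stand-in for a learned compressor: drop filler words, collapse whitespace."""
    filler = {"very", "really", "basically", "just", "actually"}
    return " ".join(w for w in part.split() if w.lower() not in filler)


# Each content kind maps to exactly one compressor.
ROUTES: Dict[str, Callable[[str], str]] = {
    CODE: verbatim,
    ERROR: verbatim,
    PROSE: squeeze_prose,
}


def estimate_tokens(text: str) -> int:
    """Rough estimate (about 4 characters per token); an assumption, not a tokenizer."""
    return max(1, len(text) // 4)


def compress_parts(parts: List[str], enabled: bool = True, min_tokens: int = 0) -> List[str]:
    """Apply the mode switch and token-threshold trigger, then route each part."""
    total = sum(estimate_tokens(p) for p in parts)
    if not enabled or total < min_tokens:
        return list(parts)  # passthrough: nothing is modified
    return [ROUTES[classify(p)](p) for p in parts]
```

In this sketch the mode and trigger checks run before any classification, so small or opted-out requests pay no compression overhead, and code and error parts are never rewritten regardless of settings.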

Quick Start & Requirements

Install core functionality and provider SDKs with:

  pip install 'leanctx[openai,anthropic,gemini]'

To enable local LLMLingua-2 compression, add:

  pip install 'leanctx[lingua]'

This requires a ~1.2 GB download of model weights to ~/.cache/huggingface/ on first use. The project includes a CLI for benchmarking (leanctx bench run).

Highlighted Details

  • Achieves 40% accuracy on the LongBench v2 short subset while removing 57% of tokens, about double the accuracy of the naive-truncation baseline.
  • Preserves code, tool calls, and error traces verbatim, compressing only less critical prose and log data.
  • Complements provider-side prompt caching by compressing dynamic, per-query content, allowing savings to stack.
  • Offers optional OpenTelemetry (OTel) integration for detailed observability of compression performance and cost.
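
To make the "savings stack" point concrete, here is a small worked calculation. All numbers are hypothetical: the price, the 90% discount on cached tokens, and the 50% compression ratio are assumptions chosen for illustration (the compression figure sits inside the 40-60% range quoted above). The input_cost helper is not part of leanctx, and it ignores any cache-write surcharge a provider may apply.

```python
def input_cost(
    static_tokens: int,
    dynamic_tokens: int,
    price_per_mtok: float,
    cache_discount: float = 0.0,
    compression_ratio: float = 0.0,
) -> float:
    """Input cost in dollars for one request.

    cache_discount:    fraction saved on the static, cacheable prefix.
    compression_ratio: fraction of dynamic tokens removed by compression.
    """
    billed_static = static_tokens * (1 - cache_discount)
    billed_dynamic = dynamic_tokens * (1 - compression_ratio)
    return (billed_static + billed_dynamic) * price_per_mtok / 1_000_000


# Hypothetical request: 4,000-token static system prompt, 6,000 tokens of
# per-query retrieved context, priced at $3 per million input tokens.
baseline = input_cost(4000, 6000, 3.0)
cache_only = input_cost(4000, 6000, 3.0, cache_discount=0.9)
compress_only = input_cost(4000, 6000, 3.0, compression_ratio=0.5)
both = input_cost(4000, 6000, 3.0, cache_discount=0.9, compression_ratio=0.5)

print(f"baseline      ${baseline:.4f}")
print(f"cache only    ${cache_only:.4f}")
print(f"compress only ${compress_only:.4f}")
print(f"both          ${both:.4f}")
```

Under these assumptions caching only reduces the static prefix and compression only reduces the dynamic part, so the two discounts apply to disjoint tokens and combine rather than overlap.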

Maintenance & Community

The project is actively maintained, with version 0.3.1 released on April 26, 2026. The roadmap lists a full LongBench v2 sweep, Docker Hub publishing, multimodal compression, and a TypeScript SDK port. The README does not mention community links (Discord/Slack).

Licensing & Compatibility

Leanctx is released under the MIT License, permitting commercial use and integration into closed-source applications.

Limitations & Caveats

As of v0.3.1, Gemini multimodal requests and function calls automatically fall back to passthrough mode ("opaque-bailout") because compression is not yet supported for these request types. Compression support for them is targeted for v0.3.x.
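
A passthrough fallback of this kind might look like the sketch below. It is an assumption about the general pattern, not leanctx's implementation: the request is modeled as a simplified list of part dictionaries, and is_opaque and maybe_compress are hypothetical names.

```python
from typing import Any, Callable, Dict, List, Tuple

Part = Dict[str, Any]  # simplified, hypothetical request part


def is_opaque(parts: List[Part]) -> bool:
    """True if any part is something other than plain text (image, function call, ...)."""
    return any(set(p) != {"text"} for p in parts)


def maybe_compress(
    parts: List[Part], compress_text: Callable[[str], str]
) -> Tuple[List[Part], str]:
    """Compress text-only requests; hand anything else back untouched."""
    if is_opaque(parts):
        # Bail out: return the very same object so nothing can be corrupted.
        return parts, "passthrough"
    return [{"text": compress_text(p["text"])} for p in parts], "compressed"
```

Bailing out on the whole request, rather than compressing only its text parts, is the conservative choice in this sketch: the request reaches the provider exactly as the caller built it.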

Health Check

Last Commit: 2 weeks ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 0
Star History: 307 stars in the last 30 days
