headroom by chopratejas

LLM context optimization layer

Created 1 month ago
606 stars

Top 54.0% on SourcePulse

View on GitHub
Project Summary

This project tackles token redundancy in the context sent to LLMs, particularly verbose tool outputs and agent intermediate steps. By compressing context before it reaches LLM providers, it offers substantial cost and efficiency benefits for developers building LLM-powered applications.

How It Works

Headroom acts as a transparent proxy, intercepting and optimizing LLM context without altering application logic. It employs a pipeline featuring a "Cache Aligner" to stabilize dynamic tokens, a "Smart Crusher" to remove redundant data, and a "Context Manager" to fit token budgets. The "Compress-Cache-Retrieve" (CCR) mechanism preserves original data separately, retrieving it only when the LLM explicitly requests it, thereby enabling effective provider caching.
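The CCR mechanism described above can be illustrated with a minimal sketch. This is not Headroom's implementation; the names (`ccr_store`, `compress`, `retrieve`) and the truncation strategy are made up to show the contract: the LLM sees a compressed preview plus a retrieval key, while the original payload is preserved separately and returned only on explicit request.

```python
# Minimal sketch of the Compress-Cache-Retrieve (CCR) idea; all names here
# are illustrative assumptions, not Headroom's actual API.
import hashlib

ccr_store = {}  # cache of original payloads, keyed by content hash


def compress(payload: str, keep: int = 80) -> str:
    """Replace a large payload with a short preview plus a retrieval key."""
    key = hashlib.sha256(payload.encode()).hexdigest()[:12]
    ccr_store[key] = payload  # original preserved separately, not lost
    preview = payload[:keep]
    return f"{preview}... [truncated, retrieve with key {key}]"


def retrieve(key: str) -> str:
    """Invoked only when the LLM explicitly asks for the full content."""
    return ccr_store[key]
```

In the real system, compression is content-aware (code, logs, JSON) and retrieval happens through the LLM's tool-calling path; this sketch only captures the preserve-and-retrieve contract that makes the compression reversible.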

Quick Start & Requirements

  • Primary Install:
      pip install headroom-ai (core SDK)
      pip install "headroom-ai[proxy]" (proxy)
      pip install "headroom-ai[langchain]" (LangChain integration)
      pip install "headroom-ai[agno]" (Agno integration)
  • Prerequisites: Python 3.10+.
  • Documentation: Links to Architecture Documentation, LangChain Integration Guide, Agno Integration Guide, Proxy Guide, Memory Guide, Compression Guide, CCR Guide, Metrics, and Troubleshooting are available.

Highlighted Details

  • Demonstrates 47-92% token savings across various scenarios like code search, SRE debugging, and GitHub issue triage.
  • Offers zero code changes when deployed as a proxy.
  • Features reversible compression via CCR, ensuring original data is retrievable.
  • Provides framework-native integrations for LangChain, Agno, and MCP, alongside proxy support for any OpenAI-compatible client.
  • Includes content-aware compression for code, logs, and JSON, and optimizes for provider caching.

Maintenance & Community

Community links such as Discord or Slack are not explicitly provided. A CONTRIBUTING.md file is referenced for those interested in contributing. The project invites users to add their projects to a "Who's Using Headroom?" list.

Licensing & Compatibility

The project is licensed under the Apache License 2.0. No specific restrictions for commercial use or closed-source linking are mentioned.

Limitations & Caveats

The system introduces a minor overhead of approximately 1-5ms for compression latency. Savings are most pronounced in tool-heavy workloads and less significant in conversation-heavy applications with minimal tool interaction. Automatic model support relies on naming pattern detection, which may not cover all future or non-standard models.

Health Check
Last Commit

1 day ago

Responsiveness

Inactive

Pull Requests (30d)
12
Issues (30d)
11
Star History
264 stars in the last 30 days

Explore Similar Projects

Starred by Yaowei Zheng (author of LLaMA-Factory), Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), and 11 more.

GPTCache by zilliztech

Top 0.1% on SourcePulse
8k
Semantic cache for LLM queries, integrated with LangChain and LlamaIndex
Created 2 years ago
Updated 7 months ago