headroom by chopratejas

LLM context optimization layer

Created 1 month ago
606 stars

Top 54.0% on SourcePulse

View on GitHub
Project Summary

This project tackles token redundancy in the context sent to LLMs, particularly verbose tool outputs and agent intermediate steps. By compressing context before it reaches LLM providers, it offers substantial cost and efficiency benefits for developers building LLM-powered applications.

How It Works

Headroom acts as a transparent proxy, intercepting and optimizing LLM context without altering application logic. It employs a pipeline featuring a "Cache Aligner" to stabilize dynamic tokens, a "Smart Crusher" to remove redundant data, and a "Context Manager" to fit token budgets. The "Compress-Cache-Retrieve" (CCR) mechanism preserves original data separately, retrieving it only when the LLM explicitly requests it, thereby enabling effective provider caching.
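The CCR mechanism described above can be illustrated with a minimal sketch. This is not Headroom's implementation; the names (`ccr_store`, `compress`, `retrieve`) and the truncation strategy are made up to show the contract: the LLM sees a compressed preview plus a retrieval key, while the original payload is preserved separately and returned only on explicit request.

```python
# Minimal sketch of the Compress-Cache-Retrieve (CCR) idea; all names here
# are illustrative assumptions, not Headroom's actual API.
import hashlib

ccr_store = {}  # cache of original payloads, keyed by content hash


def compress(payload: str, keep: int = 80) -> str:
    """Replace a large payload with a short preview plus a retrieval key."""
    key = hashlib.sha256(payload.encode()).hexdigest()[:12]
    ccr_store[key] = payload  # original preserved separately, not lost
    preview = payload[:keep]
    return f"{preview}... [truncated, retrieve with key {key}]"


def retrieve(key: str) -> str:
    """Invoked only when the LLM explicitly asks for the full content."""
    return ccr_store[key]
```

In the real system, compression is content-aware (code, logs, JSON) and retrieval happens through the LLM's tool-calling path; this sketch only captures the preserve-and-retrieve contract that makes the compression reversible.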

Quick Start & Requirements

  • Primary Install:
      pip install headroom-ai (core SDK)
      pip install "headroom-ai[proxy]" (proxy)
      pip install "headroom-ai[langchain]" (LangChain integration)
      pip install "headroom-ai[agno]" (Agno integration)
  • Prerequisites: Python 3.10+.
  • Documentation: Links to Architecture Documentation, LangChain Integration Guide, Agno Integration Guide, Proxy Guide, Memory Guide, Compression Guide, CCR Guide, Metrics, and Troubleshooting are available.

Highlighted Details

  • Demonstrates 47-92% token savings across various scenarios like code search, SRE debugging, and GitHub issue triage.
  • Offers zero code changes when deployed as a proxy.
  • Features reversible compression via CCR, ensuring original data is retrievable.
  • Provides framework-native integrations for LangChain, Agno, and MCP, alongside proxy support for any OpenAI-compatible client.
  • Includes content-aware compression for code, logs, and JSON, and optimizes for provider caching.

Maintenance & Community

Community links such as Discord or Slack are not explicitly provided. A CONTRIBUTING.md file is referenced for those interested in contributing. The project invites users to add their projects to a "Who's Using Headroom?" list.

Licensing & Compatibility

The project is licensed under the Apache License 2.0. No specific restrictions for commercial use or closed-source linking are mentioned.

Limitations & Caveats

The system introduces a minor overhead of approximately 1-5ms for compression latency. Savings are most pronounced in tool-heavy workloads and less significant in conversation-heavy applications with minimal tool interaction. Automatic model support relies on naming pattern detection, which may not cover all future or non-standard models.

Health Check
Last Commit

1 day ago

Responsiveness

Inactive

Pull Requests (30d)
12
Issues (30d)
11
Star History
264 stars in the last 30 days

Explore Similar Projects

Starred by Yaowei Zheng (author of LLaMA-Factory), Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), and 11 more.

GPTCache by zilliztech

Top 0.1% on SourcePulse
8k
Semantic cache for LLM queries, integrated with LangChain and LlamaIndex
Created 2 years ago
Updated 7 months ago