HazyResearch: Lightweight long-context representation for LLMs
Top 98.0% on SourcePulse
Summary
Cartridges addresses the high cost of processing long contexts in Large Language Models (LLMs) by introducing a novel method for creating compact Key-Value (KV) caches. Targeting researchers and engineers working with LLMs, it enables significant throughput gains (up to 26x) while preserving generation quality, making long-context applications more efficient and cost-effective.
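To make the memory savings concrete, the sketch below estimates the KV cache footprint of a long context using the standard formula (2 tensors, K and V, per layer). The model shape (32 layers, 8 KV heads, head dim 128, fp16) is an illustrative assumption, not from the README, and treating the 26x throughput figure as a proportional cache-size reduction is likewise an assumption.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, dtype_bytes=2):
    # K and V caches per layer: 2 * heads * head_dim * seq_len * bytes-per-element
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * dtype_bytes

# Illustrative Llama-style model shape with a 128k-token context, fp16.
full = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, seq_len=128_000)
cart = full / 26  # hypothetical cartridge 26x smaller than the full cache
print(f"{full / 2**30:.1f} GiB -> {cart / 2**30:.2f} GiB")  # → 15.6 GiB -> 0.60 GiB
```

A cache this much smaller means many more concurrent sequences fit in GPU memory, which is where the throughput gain comes from.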
How It Works
The core innovation is "self-study," a test-time training recipe that distills a large corpus into a small, efficient KV cache, termed a "cartridge." This process involves generating synthetic conversational data about the corpus using AI agents (one asking questions, another answering) and then training the cartridge via context distillation. This approach drastically reduces KV cache size, directly translating to higher throughput during inference.
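The data-generation half of self-study can be sketched as a loop between two agents over corpus chunks. This is a minimal illustration, not the Cartridges API: the agent functions below are hypothetical stand-ins for real LLM calls, and the resulting pairs would feed the context-distillation step that trains the cartridge.

```python
import random

# Hypothetical stand-ins for LLM calls (illustrative names, not the real API).
def ask_question(chunk: str) -> str:
    # "Questioner" agent: poses a question about a corpus chunk.
    return f"What does this passage say about {chunk.split()[0]}?"

def answer_question(chunk: str, question: str) -> str:
    # "Answerer" agent: answers with the chunk in context.
    return f"It states: {chunk}"

def self_study(corpus: str, n_rounds: int = 3, seed: int = 0):
    """Generate synthetic conversations about a corpus; the (question,
    answer) pairs become training data for distilling the cartridge."""
    rng = random.Random(seed)
    chunks = corpus.split(". ")
    pairs = []
    for _ in range(n_rounds):
        chunk = rng.choice(chunks)
        q = ask_question(chunk)
        a = answer_question(chunk, q)
        pairs.append((q, a))
    return pairs

pairs = self_study("Cartridges compress long contexts. Self-study distills them.")
```

In the real recipe, the cartridge's KV cache is then trained so the model reproduces these answers without the full corpus in context.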
Quick Start & Requirements
Install: install uv, then run `uv pip install -e .`.
Configuration: set the environment variables CARTRIDGES_DIR, CARTRIDGES_OUTPUT_DIR, CARTRIDGES_WANDB_PROJECT, and CARTRIDGES_WANDB_ENTITY.
Requirements: uv, wandb, an inference server (Tokasaurus or SGLang), and GPU access are necessary. Modal is recommended for scalable inference workloads.
Links: Paper (arXiv:2506.06266), Synthesis example (examples/arxiv/arxiv_synthesize.py), Training example (examples/arxiv/arxiv_train.py), Tokasaurus (https://github.com/ScalingIntelligence/tokasaurus), SGLang (https://docs.sglang.ai/start/install.html).
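A minimal setup sketch following the steps above. The repository URL, output directory, and W&B values are placeholder assumptions, not from the README.

```shell
# Assumed repository location; substitute the actual clone URL if it differs.
git clone https://github.com/HazyResearch/cartridges && cd cartridges

# Install uv, then install the package in editable mode.
pip install uv
uv pip install -e .

# Required environment variables (placeholder values).
export CARTRIDGES_DIR="$PWD"
export CARTRIDGES_OUTPUT_DIR="$PWD/outputs"
export CARTRIDGES_WANDB_PROJECT="cartridges"
export CARTRIDGES_WANDB_ENTITY="my-team"
```

An inference server (Tokasaurus or SGLang) and GPU access are still needed separately; see the linked install docs.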
Maintenance & Community
Compute resources for this project were provided by Modal, Together, Prime Intellect, Voltage Park, and Azure. No explicit community channels (e.g., Discord, Slack) are listed in the README. The roadmap and known issues are detailed in the "TODOs" section.
Licensing & Compatibility
The license type is indicated by a GitHub badge but not explicitly stated in the README text. No specific compatibility notes for commercial use or closed-source linking are provided.
Limitations & Caveats
Occasional NCCL collective operation timeouts during data parallel training may require setting distributed_backend="gloo". Trained cartridges are not yet uploadable to HuggingFace. Local chat functionality currently requires downloading cartridges from WandB, not directly from local files.