SWE-Pruner (Ayanami1314): Context pruning for coding agents
Top 98.2% on SourcePulse
SWE-Pruner addresses the high token costs and latency of Large Language Model (LLM) agents for software development by introducing a task-aware context pruning mechanism. It targets engineers and researchers building or using coding agents, offering significant token savings and reduced latency while preserving critical code details. The primary benefit is more efficient, cost-effective LLM-driven software engineering workflows.
How It Works
SWE-Pruner uses a two-step, task-aware approach to context compression. First, it formulates explicit, task-specific goals to guide pruning, rather than relying on generic metrics such as perplexity. Second, a lightweight 0.6B-parameter neural skimmer dynamically selects and preserves semantically critical lines of code. This retains essential implementation details and logical structure, mimicking how human programmers selectively focus on relevant code sections.
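The selection step above can be sketched as follows. This is a minimal, hypothetical illustration: it uses a toy keyword-overlap scorer in place of the project's trained 0.6B neural skimmer, and the function names (`score_line`, `prune_context`) are invented for the example, not part of SWE-Pruner's API.

```python
# Sketch of task-aware line pruning: score each line of code for relevance
# to an explicit task goal, then keep the top-scoring lines in original order.
# The real project learns this scoring with a 0.6B skimmer model; the keyword
# heuristic below is a stand-in for illustration only.

def score_line(line: str, task: str) -> float:
    """Toy relevance score: word overlap between the task and the line."""
    task_words = set(task.lower().split())
    line_words = set(line.lower().replace("(", " ").replace(")", " ").split())
    return float(len(task_words & line_words))

def prune_context(code: str, task: str, keep_ratio: float = 0.5) -> str:
    """Keep the lines most relevant to the task, preserving source order."""
    lines = code.splitlines()
    budget = max(1, int(len(lines) * keep_ratio))
    # Rank line indices by relevance (stable sort keeps source order on ties).
    ranked = sorted(range(len(lines)),
                    key=lambda i: score_line(lines[i], task),
                    reverse=True)
    keep = set(ranked[:budget])
    return "\n".join(line for i, line in enumerate(lines) if i in keep)

code = (
    "def parse_config(path):\n"
    "    data = read(path)\n"
    "    return data\n"
    "\n"
    "def unrelated():\n"
    "    pass"
)
pruned = prune_context(code, task="fix parse_config bug", keep_ratio=0.4)
```

A learned skimmer replaces the heuristic score with a model trained to predict which lines matter for the stated task, which is what lets pruning stay aggressive without dropping implementation details.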
Quick Start & Requirements
The project uses uv for dependency management; detailed installation instructions live in the subfolder READMEs. Models are tracked with git lfs and can be downloaded directly from Hugging Face (https://huggingface.co/ayanami-kitasan/code-pruner) or via the provided tutorial. The training scripts assume a Slurm cluster with at least 4 GPUs. An inference tutorial is available for local setup.
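A hedged setup sketch based on the notes above; the repository URL is not given here (placeholder kept), and exact paths may differ from the subfolder READMEs, which remain authoritative.

```shell
# Install uv (the project's dependency manager) if not already present.
curl -LsSf https://astral.sh/uv/install.sh | sh

# Clone without fetching large git-lfs model files up front.
GIT_LFS_SKIP_SMUDGE=1 git clone <repo-url> swe-pruner  # <repo-url>: see project page
cd swe-pruner
uv sync

# Fetch the skimmer weights from Hugging Face instead of via git lfs
# (local directory name is an assumption, not from the README).
uv run huggingface-cli download ayanami-kitasan/code-pruner --local-dir models/code-pruner
```

Skipping the lfs smudge on clone avoids pulling multi-gigabyte model blobs over git, matching the README's recommendation to download weights from Hugging Face instead.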
Maintenance & Community
The project shows recent activity with updates in March 2026. Acknowledgements include the Bytedance Douyin Team and Alibaba Qwen Team. A Notion blog detailing technical approaches is available. No direct community channels like Discord or Slack were specified in the README.
Licensing & Compatibility
The project is licensed under the MIT license. This permissive license generally allows for commercial use and integration into closed-source projects without significant restrictions.
Limitations & Caveats
Training code and certain evaluation benchmarks (SWEQA) are marked as "coming soon," indicating ongoing development. Cloning directly with git may be slow because large model files are managed by git lfs; downloading models from Hugging Face is recommended instead. The project appears to be based on recent research (a 2026 paper).