R2E-Gym by R2E-Gym

Scaling open-weight SWE agents with procedural environments and hybrid verifiers

Created 11 months ago
253 stars

Top 99.3% on SourcePulse

Project Summary

R2E-Gym addresses the challenge of scaling open-weight Software Engineering (SWE) agents by providing a large-scale, procedurally generated environment and novel verification strategies. It targets researchers and developers aiming to enhance open-weight SWE agent performance, enabling state-of-the-art results competitive with proprietary models.

How It Works

The framework introduces SWE-GEN, a synthetic data curation recipe generating executable training environments from commits, enhancing scalability. Hybrid Test-time Scaling combines execution-based and execution-free verifiers to optimize inference-time compute for superior performance.
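
The idea of combining the two verifier types can be illustrated with a minimal sketch. It is not the project's implementation: the `Candidate` fields, the `hybrid_select` function, and the shortlist-then-rerank rule are all assumptions chosen for illustration. Here an execution-free score (for example, a learned model's estimate that a patch is correct) shortlists candidate patches, and an execution-based score (for example, the fraction of tests a patch passes) picks the winner among them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One candidate patch produced by an agent rollout (hypothetical structure)."""
    patch_id: str
    exec_score: float  # assumed: fraction of tests passed, in [0, 1]
    free_score: float  # assumed: execution-free verifier score, in [0, 1]


def hybrid_select(candidates, top_n=2):
    """Shortlist by execution-free score, then rerank by execution-based score.

    Ties on the execution-based score are broken by the execution-free score.
    This combination rule is an illustrative assumption.
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    shortlist = sorted(candidates, key=lambda c: c.free_score, reverse=True)[:top_n]
    return max(shortlist, key=lambda c: (c.exec_score, c.free_score))


candidates = [
    Candidate("a", exec_score=1.0, free_score=0.2),
    Candidate("b", exec_score=0.5, free_score=0.9),
    Candidate("c", exec_score=1.0, free_score=0.8),
]
best = hybrid_select(candidates, top_n=2)
print(best.patch_id)  # prints "c"
```

In this toy run, patch "a" passes every test but is dropped by the execution-free shortlist, and patch "b" is shortlisted but loses to "c" on the execution-based score. Each signal can therefore veto a candidate the other would have accepted, which is the motivation for using both.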

Quick Start & Requirements

  • Installation: Requires uv (install via curl -LsSf https://astral.sh/uv/install.sh | sh), Python virtual environment setup (uv venv, source .venv/bin/activate), and dependency installation (uv sync && uv pip install -e .).
  • Prerequisites: Python, Docker (for gym instances), and access to LLMs (e.g., claude-3-5-sonnet-20241022, gpt-4o).
  • Links: Paper, Data & Models, Project Page, and reproduction guides are available.

Highlighted Details

  • Achieves 51% pass@1 on SWE-Bench Verified, a new state-of-the-art for open-weight SWE agents, competitive with proprietary models.
  • Features over 8.1K procedurally curated problems across 13 repositories.
  • SWE-GEN alone yields 34.4% pass@1 on SWE-Bench Verified by curating data from commits.
  • DeepSWE models (agentica-org/DeepSWE-Preview) have been released.

Maintenance & Community

Associated with UC Berkeley and ANU researchers (Naman Jain, Jaskirat Singh, Manish Shetty, Liang Zheng, Koushik Sen, Ion Stoica). No specific community channels (Discord, Slack), roadmap links, or active maintenance signals beyond the research publication are detailed.

Licensing & Compatibility

The provided README content does not specify a software license. Clarification is needed regarding terms of use, distribution, and compatibility for commercial or closed-source applications.

Limitations & Caveats

Executable gym instances require substantial disk space (300MB-500MB each). Setup involves multiple command-line steps. Explicit details on alpha/beta status, unsupported platforms, or known bugs are absent. Access to specific LLMs or API keys is implicitly required for agent functionality.
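
To put the per-instance figure in perspective, the following back-of-the-envelope sketch estimates total disk usage. It assumes one gym instance per curated problem, uses the "over 8.1K problems" count as a lower bound, and uses decimal units (1 GB = 1000 MB); the function name `disk_range_gb` is hypothetical, and real usage may differ, for example if Docker image layers are shared between instances.

```python
def disk_range_gb(num_instances, min_mb=300, max_mb=500):
    """Return (low, high) total disk usage in GB for the given instance count.

    Assumes every instance falls within the 300 MB to 500 MB range and
    that no storage is shared between instances.
    """
    return (num_instances * min_mb / 1000, num_instances * max_mb / 1000)


low, high = disk_range_gb(8100)
print(f"{low:.0f} GB to {high:.0f} GB")  # prints "2430 GB to 4050 GB"
```

Under these assumptions, materializing every instance would take roughly 2.4 TB to 4 TB, so pulling only the subset needed for a given experiment is likely the practical choice.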

Health Check

  • Last commit: 8 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 1
  • Issues (30d): 1
  • Star history: 17 stars in the last 30 days
