R2E-Gym by R2E-Gym

Scaling open-weight SWE agents with procedural environments and hybrid verifiers

Created 1 year ago

290 stars

Top 90.8% on SourcePulse

5 Experts Love This Project

Yiran Wu

Coauthor of AutoGen

Vincent Weisser

Cofounder of Prime Intellect

Edward Z. Yang

Research Engineer at Meta; Maintainer of PyTorch

Wing Lian

Founder of Axolotl AI

and 1 more!

Project Summary

R2E-Gym tackles the problem of scaling open-weight software engineering (SWE) agents by providing a large, procedurally generated set of training environments together with hybrid verification strategies. It targets researchers and developers who want to improve open-weight SWE agents, and it reports state-of-the-art results for such agents that are competitive with proprietary models.

How It Works

The framework introduces SWE-GEN, a synthetic data curation recipe that generates executable training environments from commits, which lets the environment set scale. Hybrid Test-time Scaling combines execution-based and execution-free verifiers so that additional inference-time compute translates into better final performance.
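To make the idea of combining two verifiers concrete, here is a minimal sketch of how several candidate patches could be ranked with both kinds of score. It is not taken from the R2E-Gym code: the `Candidate` class, the `select_patch` function, the score meanings, and the combination rule (shortlist by execution-free score, then rank by the sum of both scores) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One candidate patch with scores from two hypothetical verifiers."""
    patch_id: str
    exec_based_score: float  # assumed: e.g. fraction of generated tests passed
    exec_free_score: float   # assumed: e.g. a model's estimated success probability


def select_patch(candidates: list[Candidate], top_n: int = 2) -> Candidate:
    """Pick one patch using both verifier scores.

    Assumed rule (illustrative only): keep the top_n candidates by
    execution-free score, then choose the survivor with the highest
    sum of both scores.
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    shortlist = sorted(
        candidates, key=lambda c: c.exec_free_score, reverse=True
    )[:top_n]
    return max(shortlist, key=lambda c: c.exec_free_score + c.exec_based_score)


candidates = [
    Candidate("patch-a", exec_based_score=1.0, exec_free_score=0.20),
    Candidate("patch-b", exec_based_score=0.5, exec_free_score=0.90),
    Candidate("patch-c", exec_based_score=1.0, exec_free_score=0.70),
]
best = select_patch(candidates, top_n=2)
print(best.patch_id)  # patch-c
```

In this toy example neither verifier alone picks the same patch in every case: the execution-free score alone would choose patch-b, while the execution-based score cannot separate patch-a from patch-c. Using both breaks the tie in favor of patch-c.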

Quick Start & Requirements

Installation: Install uv (curl -LsSf https://astral.sh/uv/install.sh | sh), create and activate a Python virtual environment (uv venv, then source .venv/bin/activate), and install the dependencies (uv sync && uv pip install -e .).
Prerequisites: Python, Docker (for gym instances), and access to LLMs (e.g., claude-3-5-sonnet-20241022, gpt-4o).
Links: Paper, Data & Models, Project Page, and reproduction guides are available.
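Before running the installation commands, it may help to confirm that the command-line prerequisites are present. The sketch below is a hypothetical helper, not part of R2E-Gym; it only checks that the `uv` and `docker` executables can be found on PATH and does not verify versions or that the Docker daemon is running.

```python
import shutil


def missing_tools(tools=("uv", "docker")):
    """Return the names from `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


missing = missing_tools()
if missing:
    print("Missing prerequisites:", ", ".join(missing))
else:
    print("uv and docker are both on PATH")
```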

Highlighted Details

Achieves 51% pass@1 on SWE-Bench Verified, a new state-of-the-art for open-weight SWE agents, competitive with proprietary models.
Features over 8.1K procedurally curated problems across 13 repositories.
SWE-GEN alone yields 34.4% pass@1 on SWE-Bench Verified by curating data from commits.
DeepSWE models (agentica-org/DeepSWE-Preview) have been released.
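For readers unfamiliar with the metric, pass@1 is the expected fraction of problems solved when the agent gets a single attempt per problem. The summary does not say how the reported numbers were computed, so the following is only a sketch of the standard unbiased pass@k estimator from the code-generation evaluation literature; the function names and the sample data are made up for illustration.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n attempts sampled, c of them correct."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # Probability that a random size-k subset of the n attempts has no correct one.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results, k: int) -> float:
    """Average pass@k over (n, c) pairs, one pair per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical data: four problems, one attempt each, two solved.
results = [(1, 1), (1, 0), (1, 1), (1, 0)]
print(benchmark_pass_at_k(results, k=1))  # 0.5
```

With k = 1 the estimator reduces to c / n per problem, so the benchmark score is simply the mean solve rate.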

Maintenance & Community

Associated with UC Berkeley and ANU researchers (Naman Jain, Jaskirat Singh, Manish Shetty, Liang Zheng, Koushik Sen, Ion Stoica). No specific community channels (Discord, Slack), roadmap links, or active maintenance signals beyond the research publication are detailed.

Licensing & Compatibility

The provided README content does not specify a software license. Clarification is needed regarding terms of use, distribution, and compatibility for commercial or closed-source applications.

Limitations & Caveats

Executable gym instances require substantial disk space (300 MB to 500 MB each). Setup involves multiple command-line steps. The README gives no explicit details on alpha/beta status, unsupported platforms, or known bugs. Agent functionality implicitly requires access to specific LLMs and the corresponding API keys.
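The per-instance figure adds up quickly, so a rough estimate is useful when planning storage. The helper below is a back-of-the-envelope sketch: the function name is hypothetical, it uses decimal units (1 GB = 1000 MB), and it assumes every instance takes the full 300 MB to 500 MB with no sharing between instances, so actual usage might be lower.

```python
def disk_range_gb(num_instances: int, low_mb: int = 300, high_mb: int = 500) -> tuple[float, float]:
    """Return (low, high) disk estimates in GB for a number of gym instances."""
    return (num_instances * low_mb / 1000, num_instances * high_mb / 1000)


for n in (10, 100, 1000):
    low, high = disk_range_gb(n)
    print(f"{n:>5} instances: {low:,.0f} to {high:,.0f} GB")
```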

Health Check

Last Commit: 11 months ago

Responsiveness: Inactive

Star History

16 stars in the last 30 days