SWE-Gym by SWE-Gym

Environment for training software engineering agents

Created 1 year ago

609 stars

Top 53.9% on SourcePulse

7 Experts Love This Project

Vincent Weisser

Cofounder of Prime Intellect

Jeff Hammerbacher

Cofounder of Cloudera

Shizhe Diao

Author of LMFlow; Research Scientist at NVIDIA

Wing Lian

Founder of Axolotl AI

and 3 more!

Project Summary

SWE-Gym is an open environment for training and evaluating software engineering agents and verifiers. It addresses limitations of existing benchmarks by combining tasks from real-world repositories with rigorous, test-based verification. It is aimed at researchers and developers working on AI for software development who want agent performance that keeps improving as training scales.

How It Works

SWE-Gym packages real-world Python tasks from 11 repositories, each with an executable environment and tests that verify a proposed solution. Large Language Models (LLMs) can therefore be trained as agents that interact with the environment, generate code, and receive feedback from test results. The environment supports two uses beyond plain evaluation: self-improvement through rejection sampling fine-tuning, where only attempts that pass the tests are kept as training data, and inference-time scaling, where a learned verifier scores several candidate solutions and the most promising one is selected.
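The two mechanisms described above can be sketched in a few lines of Python. This is a minimal illustration, not SWE-Gym's actual code: the `Trajectory` structure, the function names, and the `toy_verifier` scoring rule are all hypothetical stand-ins. In a real setup the verifier would be a trained model and `tests_passed` would come from running the task's tests inside its environment.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Trajectory:
    """One agent attempt at a task (hypothetical structure, not SWE-Gym's schema)."""
    task_id: str
    patch: str
    tests_passed: bool  # outcome of running the task's tests on the patch


def rejection_sample(trajectories: Sequence[Trajectory]) -> List[Trajectory]:
    """Keep only attempts whose patch passed the tests.

    The surviving attempts are what a rejection sampling fine-tuning
    step would train on; failed attempts are discarded.
    """
    return [t for t in trajectories if t.tests_passed]


def best_of_n(candidates: Sequence[Trajectory],
              verifier: Callable[[Trajectory], float]) -> Trajectory:
    """Return the candidate the verifier scores highest (ties go to the earliest)."""
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=verifier)


def toy_verifier(t: Trajectory) -> float:
    """Stand-in for a learned verifier: simply prefers shorter patches."""
    return 1.0 / (1 + len(t.patch))


attempts = [
    Trajectory("task-1", "fix: handle None input", True),
    Trajectory("task-1", "pass", False),
    Trajectory("task-1", "fix", True),
]

training_set = rejection_sample(attempts)    # filtered by test outcome
chosen = best_of_n(attempts, toy_verifier)   # selected by verifier score
```

Note that the two functions use different signals. Rejection sampling relies on ground-truth test results, which are available during training. Best-of-n selection relies only on the verifier's score, which is what makes it usable at inference time when test outcomes are unknown.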

Quick Start & Requirements

Install: Docker images are available under the xingyaoww/sweb.eval.x86_64 prefix on Docker Hub.
Prerequisites: Docker is required.
Resources: Pre-built Docker images are provided for each task instance.
Links: Paper, Data & Models, OpenHands Docs, MoatlessTools Docs.
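As a rough sketch of how the image prefix above might be used, the helper below builds a `docker pull` command for one task image without running it. The prefix comes from the Install line. The assumption that a full image name is the prefix followed by a dot and a per-task suffix, and the placeholder suffix `example-task`, are illustrative only; check the README or Docker Hub for the actual naming scheme.

```python
import shlex
from typing import List

# Prefix stated in the Quick Start section.
IMAGE_PREFIX = "xingyaoww/sweb.eval.x86_64"


def pull_command(image_suffix: str) -> List[str]:
    """Build (but do not run) a `docker pull` command for one task image.

    Assumption: the full image name is '<prefix>.<suffix>'. The real
    per-task suffix format is not given here, so the caller supplies it.
    """
    if not image_suffix or any(c.isspace() for c in image_suffix):
        raise ValueError("image_suffix must be non-empty and contain no whitespace")
    return ["docker", "pull", f"{IMAGE_PREFIX}.{image_suffix}"]


# 'example-task' is a placeholder, not a real SWE-Gym instance name.
command = pull_command("example-task")
print(shlex.join(command))
```

Returning the command as a list keeps it safe to pass to `subprocess.run(command, check=True)` on a machine where Docker is installed.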

Highlighted Details

Achieves new open state-of-the-art results: 32% on SWE-Bench Verified and 26% on SWE-Bench Lite.
Demonstrates promising scaling trends with increased compute, indicating performance is compute-bottlenecked.
Enables self-improvement for agents: a 32B model reaches 20% on SWE-Bench Lite using rejection sampling fine-tuning.
Fine-tuning on fewer than 500 agent-environment trajectories yields a significant gain (+14% absolute on SWE-Bench Verified).

Maintenance & Community

The project is associated with researchers from UC Berkeley, UIUC, CMU, and Apple. Further community engagement details are not explicitly provided in the README.

Licensing & Compatibility

The README does not state a license. The project is presented as open source, but terms for commercial use or closed-source linking are not specified.

Limitations & Caveats

Current results are bottlenecked primarily by training and inference compute: the scaling trends are promising, but further gains depend on more computational resources. The project is presented alongside an ICML 2025 paper, so it is a recent development.

Health Check

Last Commit

5 months ago

Responsiveness

1 week

Star History

16 stars in the last 30 days