SWE-Gym by SWE-Gym

Environment for training software engineering agents

created 9 months ago
513 stars

Top 61.8% on sourcepulse

Project Summary

SWE-Gym is an open environment for training and evaluating software engineering agents and verifiers. It addresses limitations of existing benchmarks by pairing real-world repository tasks with executable test verification. It is aimed at researchers and developers working on AI for software development, and it supports scaling agent performance with additional training and inference compute.

How It Works

SWE-Gym packages real-world Python tasks sourced from 11 repositories, each with an executable environment and tests that verify a proposed fix. This lets Large Language Models (LLMs) be trained as agents that interact with the environment, generate code, and receive feedback from test results. The environment supports self-improvement through rejection sampling fine-tuning, and it enables inference-time scaling through learned verifiers that select the most promising of several candidate solutions.
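The data-selection step behind rejection sampling fine-tuning can be sketched as a simple filter: sample several agent attempts, then keep only those whose patch passes the task's tests. The `Trajectory` class and its fields are hypothetical names chosen for illustration and are not taken from the SWE-Gym codebase; the example data is made up.

```python
from dataclasses import dataclass


@dataclass
class Trajectory:
    """One agent attempt at a task (hypothetical structure, for illustration)."""
    task_id: str
    patch: str          # code change proposed by the agent
    tests_passed: bool  # result of running the task's tests on the patch


def rejection_sample(trajectories: list[Trajectory]) -> list[Trajectory]:
    """Keep only the trajectories whose patch passed the tests."""
    return [t for t in trajectories if t.tests_passed]


# Made-up attempts: two for task-1, one for task-2.
sampled = [
    Trajectory("task-1", "patch-a", tests_passed=False),
    Trajectory("task-1", "patch-b", tests_passed=True),
    Trajectory("task-2", "patch-c", tests_passed=False),
]

# Only the verified attempts would be used as fine-tuning data.
training_set = rejection_sample(sampled)
print([t.patch for t in training_set])  # ['patch-b']
```

In this sketch the test result acts as the acceptance criterion, which is why executable environments with test verification matter for this style of training.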

Quick Start & Requirements

  • Install: Docker images are available under the xingyaoww/sweb.eval.x86_64 prefix on Docker Hub.
  • Prerequisites: Docker is required.
  • Resources: Pre-built Docker images are provided for each task instance.
  • Links: Paper, Data & Models, OpenHands Docs, MoatlessTools Docs.
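As a rough illustration of working with the pre-built images, the sketch below builds a `docker pull` command for an image under the prefix listed above. The image name `xingyaoww/sweb.eval.x86_64.example-task` is a hypothetical placeholder: this summary does not say how a task instance maps to an image name, so check Docker Hub for the real names. The command is only printed here, not executed.

```python
import shlex

# Prefix listed in this summary; the full image naming scheme is not.
IMAGE_PREFIX = "xingyaoww/sweb.eval.x86_64"


def pull_command(image: str) -> list[str]:
    """Return the `docker pull` argument list for a SWE-Gym task image."""
    if not image.startswith(IMAGE_PREFIX):
        raise ValueError(f"expected an image under {IMAGE_PREFIX!r}, got {image!r}")
    return ["docker", "pull", image]


# Hypothetical image name, used only to show the shape of the command.
example_image = IMAGE_PREFIX + ".example-task"
cmd = pull_command(example_image)
print(shlex.join(cmd))

# To actually pull the image (requires Docker and a real image name):
# import subprocess
# subprocess.run(cmd, check=True)
```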

Highlighted Details

  • Achieves new open state-of-the-art results: 32% on SWE-Bench Verified and 26% on SWE-Bench Lite.
  • Demonstrates promising scaling trends with increased compute, indicating performance is compute-bottlenecked.
  • Enables self-improvement for agents, with a 32B model achieving 20% on SWE-Bench Lite using rejection sampling.
  • Fine-tuning on fewer than 500 agent-environment trajectories yields significant gains (+14% on SWE-Bench Verified).
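Inference-time scaling with a verifier, as described under How It Works, amounts to best-of-n selection: generate several candidate solutions, score each one, and keep the highest-scoring candidate. The sketch below assumes the verifier is any function that maps a candidate to a score. The `toy_scores` table is a made-up stand-in for a learned verifier, and none of these names come from SWE-Gym.

```python
from typing import Callable, Sequence


def select_best(candidates: Sequence[str], score: Callable[[str], float]) -> str:
    """Return the candidate that the verifier scores highest."""
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=score)


# Made-up scores standing in for a learned verifier's output.
toy_scores = {"patch-a": 0.2, "patch-b": 0.9, "patch-c": 0.5}


def toy_verifier(patch: str) -> float:
    """Look up the illustrative score for a candidate patch."""
    return toy_scores[patch]


best = select_best(list(toy_scores), toy_verifier)
print(best)  # patch-b
```

Sampling more candidates costs more inference compute but gives the verifier more options to choose from, which is one way additional compute can translate into higher solve rates.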

Maintenance & Community

The project is associated with researchers from UC Berkeley, UIUC, CMU, and Apple. Further community engagement details are not explicitly provided in the README.

Licensing & Compatibility

The README does not explicitly state the license. The project is presented as open-source, but specific licensing terms for commercial use or closed-source linking are not detailed.

Limitations & Caveats

Current results are bottlenecked primarily by training and inference compute: the observed scaling trends are promising, but further gains depend on more computational resources. The project is presented alongside an ICML 2025 paper, so it is a recent development.

Health Check

  • Last commit: 4 days ago
  • Responsiveness: 1 week
  • Pull requests (30d): 1
  • Issues (30d): 0
  • Star history: 67 stars in the last 90 days
