SWE-Gym by SWE-Gym

Environment for training software engineering agents

Created 10 months ago
542 stars

Top 58.8% on SourcePulse

Project Summary

SWE-Gym is an open environment for training and evaluating software engineering agents and verifiers. It addresses limitations of existing benchmarks by combining real-world repository tasks with rigorous, test-based verification. It is aimed at researchers and developers working on AI for software development who want to improve agent performance in a scalable way.

How It Works

SWE-Gym integrates real-world Python tasks sourced from 11 repositories, each with an executable environment and test-based verification. Large Language Models (LLMs) can therefore be trained as agents that interact with the environment, generate code, and receive feedback from test results. The environment supports self-improvement through rejection sampling fine-tuning, and it enables inference-time scaling through learned verifiers that select the most promising of several candidate solutions.
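The verifier-based selection described above can be illustrated with a minimal best-of-n sketch. This is not SWE-Gym's implementation: the function name `select_best` and the toy scores are hypothetical, and a real learned verifier would be a model that scores a whole agent trajectory rather than a lookup table.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def select_best(candidates: Sequence[T], score: Callable[[T], float]) -> T:
    """Return the candidate that the verifier scores highest (best-of-n)."""
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=score)


# Toy stand-in for a learned verifier: fixed scores per candidate patch.
# These names and numbers are made up for illustration only.
toy_scores = {"patch-a": 0.2, "patch-b": 0.9, "patch-c": 0.5}

best = select_best(list(toy_scores), toy_scores.__getitem__)
print(best)  # patch-b
```

Sampling more candidates per task gives the verifier more options to choose from, which is the sense in which this kind of selection trades extra inference compute for accuracy.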

Quick Start & Requirements

  • Install: Docker images are available under the xingyaoww/sweb.eval.x86_64 prefix on Docker Hub.
  • Prerequisites: Docker is required.
  • Resources: Pre-built Docker images are provided for each task instance.
  • Links: Paper, Data & Models, OpenHands Docs, MoatlessTools Docs.

Highlighted Details

  • Achieves new open state-of-the-art results: 32% on SWE-Bench Verified and 26% on SWE-Bench Lite.
  • Demonstrates promising scaling trends with increased compute, indicating performance is compute-bottlenecked.
  • Enables self-improvement for agents, with a 32B model achieving 20% on SWE-Bench Lite using rejection sampling.
  • Fine-tuning on fewer than 500 agent-environment trajectories yields significant gains (+14% on SWE-Bench Verified).
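The rejection sampling step behind the self-improvement result can be sketched as follows. This is a simplified illustration under stated assumptions, not the project's training code: `Attempt`, `collect_sft_data`, and `fake_sampler` are hypothetical names, and in practice the pass/fail flag would come from running the task's tests inside its Docker environment.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Attempt:
    """One agent attempt at a task (hypothetical structure)."""
    task_id: str
    patch: str
    tests_passed: bool  # outcome reported by the task's test suite


def collect_sft_data(
    task_ids: Iterable[str],
    sample_attempt: Callable[[str], Attempt],
    attempts_per_task: int = 2,
) -> List[Attempt]:
    """Sample several attempts per task and keep only those that pass the tests."""
    kept: List[Attempt] = []
    for task_id in task_ids:
        for _ in range(attempts_per_task):
            attempt = sample_attempt(task_id)
            if attempt.tests_passed:  # reject failing trajectories
                kept.append(attempt)
    return kept


# Deterministic stand-in for an agent rollout, for illustration only.
outcomes = iter([False, True, True, False])


def fake_sampler(task_id: str) -> Attempt:
    return Attempt(task_id, "patch", next(outcomes))


kept = collect_sft_data(["t1", "t2"], fake_sampler)
print([a.task_id for a in kept])  # ['t1', 't2']
```

The surviving attempts would then serve as fine-tuning data for the next round of the agent.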

Maintenance & Community

The project is associated with researchers from UC Berkeley, UIUC, CMU, and Apple. Further community engagement details are not explicitly provided in the README.

Licensing & Compatibility

The README does not explicitly state the license. The project is presented as open-source, but specific licensing terms for commercial use or closed-source linking are not detailed.

Limitations & Caveats

The current results are primarily bottlenecked by training and inference compute: the observed scaling trends are promising, but further improvements depend on additional computational resources. The project accompanies an ICML 2025 paper, so it is a recent development.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: 1 week
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 19 stars in the last 30 days
