SWE-smith by SWE-bench

Toolkit for training software engineering agents

Created 10 months ago

592 stars

Top 55.1% on SourcePulse

6 Experts Love This Project

Jeff Hammerbacher (hammer), Cofounder of Cloudera
Vincent Weisser (vincentweisser), Cofounder of Prime Intellect
Yiran Wu (yiranwu0), Coauthor of AutoGen
huybery, Research Scientist at Alibaba Qwen

and 2 more!

Project Summary

SWE-smith is a toolkit for generating synthetic data and environments to train Software Engineering (SWE) agents. It enables users to transform GitHub repositories into "SWE-gyms," create diverse tasks like program repair, and fine-tune large language models for improved software development performance. The primary benefit is the ability to scale data generation for robust SWE agent training.

How It Works

SWE-smith uses Docker to build an isolated execution environment for each GitHub repository. It synthesizes task instances by finding code changes that break existing unit tests, then generates issue text describing each resulting failure. This process yields large, diverse datasets and environments designed for training and evaluating SWE agents.
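The synthesis idea can be illustrated with a toy sketch that needs no Docker. Everything in it is hypothetical and invented for illustration (the `clamp` function, the candidate variants, the helper names, and the templated issue text); it is not SWE-smith's API. It only shows the filtering rule: a candidate change becomes a task instance when it makes at least one previously passing test fail.

```python
from typing import Callable, Dict, List

Impl = Callable[[int, int, int], int]


def clamp(x: int, lo: int, hi: int) -> int:
    """Reference implementation: every test below passes against it."""
    return max(lo, min(x, hi))


def clamp_swapped(x: int, lo: int, hi: int) -> int:
    """Candidate change that introduces a bug (min and max are swapped)."""
    return min(lo, max(x, hi))


def clamp_equivalent(x: int, lo: int, hi: int) -> int:
    """Candidate change that is behavior-preserving, so it breaks nothing."""
    return min(hi, max(x, lo))


# Hypothetical stand-in for a repository's unit-test suite.
TESTS: Dict[str, Callable[[Impl], bool]] = {
    "test_inside": lambda f: f(5, 0, 10) == 5,
    "test_below": lambda f: f(-3, 0, 10) == 0,
    "test_above": lambda f: f(42, 0, 10) == 10,
}


def failing_tests(impl: Impl) -> List[str]:
    """Return the names of tests that fail (or raise) for this implementation."""
    failed = []
    for name, check in TESTS.items():
        try:
            ok = check(impl)
        except Exception:
            ok = False
        if not ok:
            failed.append(name)
    return failed


def synthesize_tasks(reference: Impl, candidates: Dict[str, Impl]) -> List[dict]:
    """Keep only candidate changes that break at least one passing test."""
    if failing_tests(reference):
        raise ValueError("reference implementation must pass all tests")
    tasks = []
    for change_name, impl in candidates.items():
        broken = failing_tests(impl)
        if broken:
            tasks.append({
                "change": change_name,
                "fail_to_pass": broken,
                # Templated placeholder standing in for generated issue text.
                "issue": f"clamp() returns wrong values; failing: {', '.join(broken)}",
            })
    return tasks


tasks = synthesize_tasks(
    clamp, {"swapped": clamp_swapped, "equivalent": clamp_equivalent}
)
```

In this sketch only the `swapped` candidate survives, because the behavior-preserving variant breaks no test and therefore gives an agent nothing to repair.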

Quick Start & Requirements

Install from source.
Requires Docker.
Tested on Ubuntu 22.04.4 LTS; Windows and macOS are not supported.
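A small pre-flight check can confirm these requirements before installing. The sketch below is not part of SWE-smith; the function names are hypothetical. It relies only on the Python standard library and assumes that a `docker` executable on `PATH` is a reasonable proxy for a working Docker install.

```python
import platform
import shutil
from typing import List, Optional


def preflight_problems(system: str, docker_path: Optional[str]) -> List[str]:
    """Return human-readable problems; an empty list means the basics look fine."""
    problems = []
    if system != "Linux":
        problems.append(f"unsupported OS {system!r}: only Linux (Ubuntu) is tested")
    if docker_path is None:
        problems.append("docker executable not found on PATH")
    return problems


def check_this_machine() -> List[str]:
    """Run the check against the current host."""
    return preflight_problems(platform.system(), shutil.which("docker"))


if __name__ == "__main__":
    for problem in check_this_machine():
        print("WARNING:", problem)
```

The check does not verify the Ubuntu release or that the Docker daemon is running; it only catches the two most obvious mismatches.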

Highlighted Details

Provides 52k task instances and 250+ Docker environments.
Used to fine-tune Qwen 2.5 Coder into SWE-agent-LM-32B, achieving a +32% jump on SWE-bench Verified.
Supports GRPO-style reinforcement learning via SkyRL.
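For readers unfamiliar with the term, "GRPO-style" refers to scoring each sampled attempt relative to the other attempts on the same task. The sketch below shows that group-relative normalization in plain Python. It is a generic illustration, not SkyRL's or SWE-smith's code; the function name, the use of the population standard deviation, and the `eps` value are assumptions.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize rewards within one group of rollouts for the same task.

    Each advantage is (reward - group mean) / (group std + eps), so attempts
    that beat their siblings get positive values and the rest get negative ones.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; implementations vary
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical example: four rollouts on one task instance, with reward 1.0
# when the agent's patch makes the failing tests pass and 0.0 otherwise.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

When every rollout in a group receives the same reward, all advantages are zero and that group contributes no learning signal, which is one reason a large and varied pool of task instances is useful.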

Maintenance & Community

Active development with follow-up projects.
Contact: John Yang, Kilian Lieret (johnby@stanford.edu).

Licensing & Compatibility

License: CC-BY-4.0. This license permits commercial use and redistribution, provided attribution is given.

Limitations & Caveats

Tested only on Linux (Ubuntu 22.04.4 LTS). Windows and macOS are not supported, and a working Docker installation is required.

Health Check

Last Commit: 4 days ago
Responsiveness: 1 week
Pull Requests (30d): 17
Issues (30d): 5

Star History

46 stars in the last 30 days

Explore Similar Projects

Starred by Michele Catasta (President of Replit) and Harrison Chase (Founder of LangChain).

gitwit-agent by jamesmurdza

Containerized agent for automated Git commits via AI

Created 3 years ago

Updated 2 years ago

GitTaskBench by QuantaAlpha

Code agent benchmark for real-world repository tasks

Created 6 months ago

Updated 5 months ago

Starred by Yiran Wu (Coauthor of AutoGen).

live-swe-agent by OpenAutoCoder

A live, runtime self-evolving software engineering agent

Created 4 months ago

Updated 1 month ago

Starred by Vincent Weisser (Cofounder of Prime Intellect), Jeff Hammerbacher (Cofounder of Cloudera), and 5 more.

SWE-Gym by SWE-Gym

Environment for training software engineering agents

Created 1 year ago

Updated 7 months ago

Starred by Luis Capelo (Cofounder of Lightning AI) and Simon Willison (Coauthor of Django).

SWE-bench_Pro-os by scaleapi

AI agents for long-horizon software engineering tasks

Created 6 months ago

Updated 2 days ago

self_improving_coding_agent by MaximeRobeyns

Coding agent framework for autonomous self-improvement

Created 11 months ago

Updated 10 months ago

SWE-AF by Agent-Field

Autonomous software engineering fleet for production-grade code

Created 1 month ago

Updated 20 hours ago

squad by bradygaster

AI agent teams for collaborative code development

Created 1 month ago

Updated 1 day ago

Starred by Vincent Weisser (Cofounder of Prime Intellect) and Maxime Labonne (Head of Post-Training at Liquid AI).

refact by smallcloudai

Open-source AI agent for end-to-end software engineering tasks

Created 2 years ago

Updated 1 week ago

Starred by Chris Van Pelt (Cofounder of Weights & Biases), Theo Browne (Founder of Ping.gg), and 3 more.

cmux by manaflow-ai

Parallel coding agent CLI manager

Created 1 month ago

Updated 17 hours ago

Starred by Jason Huggins (Creator of Selenium), Philipp Moritz (Cofounder of Anyscale), and 6 more.

mini-swe-agent by SWE-agent

AI agent for solving GitHub issues and command-line tasks

Created 8 months ago

Updated 1 day ago

thepopebot by stephengpope

Autonomous AI agent for continuous, auditable task execution

Created 1 month ago

Updated 20 hours ago
