PRarena by aavetis

Monitoring AI coding agent pull request performance

Created 4 months ago
284 stars

Top 92.0% on SourcePulse

Project Summary

This repository tracks and analyzes the pull request (PR) activity of prominent AI software engineering agents, such as Copilot, Codex, Cursor, Devin, and Codegen. It reports PR volume and success rates for each agent so that users can compare the agents on a common footing and see how their development workflows differ.

How It Works

The project monitors GitHub pull requests created by specified AI agents, categorizing them into "All PRs," "Ready PRs" (non-draft, ready for review), and "Merged PRs." By focusing on "Ready PRs," it offers a standardized comparison of agents' ability to produce mergeable code, irrespective of their iteration strategies (e.g., private iteration vs. public drafts). The data is updated every three hours.
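The three categories can be illustrated with a minimal sketch. The `PullRequest` record and its `draft` and `merged` fields are hypothetical names chosen for illustration; they are not taken from the repository's code, and the sketch assumes that merged PRs are counted as a subset of ready PRs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PullRequest:
    # Hypothetical record; field names are illustrative, not PRarena's schema.
    number: int
    draft: bool
    merged: bool


def categorize(prs):
    """Count PRs in the three buckets: all, ready (non-draft), and merged."""
    ready = [pr for pr in prs if not pr.draft]
    # Assumption: a merged PR is always non-draft, so merged is a subset of ready.
    merged = [pr for pr in ready if pr.merged]
    return {"all": len(prs), "ready": len(ready), "merged": len(merged)}


sample = [
    PullRequest(1, draft=True, merged=False),   # public draft, still iterating
    PullRequest(2, draft=False, merged=False),  # ready for review, not merged
    PullRequest(3, draft=False, merged=True),   # ready and merged
]
print(categorize(sample))  # {'all': 3, 'ready': 2, 'merged': 1}
```

Counting against the ready bucket rather than all PRs keeps an agent that iterates in public drafts from being penalized relative to one that iterates privately.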

Quick Start & Requirements

Not applicable. The README contains no installation or setup instructions; it focuses on data analysis and statistics.

Highlighted Details

  • Performance Benchmarks: Success rate is the share of ready PRs that were merged. As of the last update, Copilot shows a 92.71% success rate (125,018 merged out of 134,850 ready PRs), Codex 87.88% (1,356,748 merged out of 1,543,792 ready PRs), and Cursor 91.66% (84,736 merged out of 92,448 ready PRs).
  • Agent Workflow Insights: Devin shows a lower success rate of 64.2% (25,775 merged out of 40,145 ready PRs), which may reflect a different iteration or quality-control process than the other agents. Codegen is also lower at 61.2% (2,921 merged out of 4,773 ready PRs).
  • Data Transparency: Direct GitHub search query links are provided for each agent's PR data, allowing for verification and deeper exploration.
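The success rates quoted above can be reproduced from the merged and ready counts. The following is a small sketch of that arithmetic, not the repository's own code; the function name `success_rate` is an illustrative choice.

```python
def success_rate(merged: int, ready: int) -> float:
    """Return merged PRs as a percentage of ready PRs."""
    if ready == 0:
        return 0.0  # avoid division by zero for an agent with no ready PRs
    return 100.0 * merged / ready


# (merged, ready) counts as quoted in the list above.
counts = {
    "Copilot": (125_018, 134_850),
    "Codex": (1_356_748, 1_543_792),
    "Cursor": (84_736, 92_448),
    "Devin": (25_775, 40_145),
    "Codegen": (2_921, 4_773),
}

for agent, (merged, ready) in counts.items():
    print(f"{agent}: {success_rate(merged, ready):.2f}%")
# Copilot: 92.71%
# Codex: 87.88%
# Cursor: 91.66%
# Devin: 64.20%
# Codegen: 61.20%
```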

Maintenance & Community

The README provides no information about maintainers, community channels (such as Discord or Slack), or a project roadmap.

Licensing & Compatibility

The README does not specify the repository's license or include any notes on compatibility with commercial use.

Limitations & Caveats

The analysis focuses solely on PR metrics and does not delve into the quality or impact of the merged code. The success rate is calculated based on "Ready PRs," which might not capture all nuances of an agent's contribution or development cycle. The data is limited to publicly available GitHub PR information for the tracked agents.

Health Check

  • Last commit: 11 hours ago
  • Responsiveness: Inactive
  • Pull requests (30d): 3
  • Issues (30d): 3
  • Star history: 40 stars in the last 30 days
