PRarena by aavetis

Monitoring AI coding agent pull request performance

Created 9 months ago

298 stars

Top 89.3% on SourcePulse

3 Experts Love This Project

Nathan Lambert

Research Scientist at AI2

Chip Huyen

Author of "AI Engineering", "Designing Machine Learning Systems"

Meng Zhang

Cofounder of TabbyML

Project Summary

This repository tracks and analyzes the pull request (PR) activity of prominent AI software engineering agents, such as Copilot, Codex, Cursor, Devin, and Codegen. It reports objective metrics on PR volume and success rate, so users can compare the agents' effectiveness and understand their development workflows.

How It Works

The project monitors GitHub pull requests created by specified AI agents, categorizing them into "All PRs," "Ready PRs" (non-draft, ready for review), and "Merged PRs." By focusing on "Ready PRs," it offers a standardized comparison of agents' ability to produce mergeable code, irrespective of their iteration strategies (e.g., private iteration vs. public drafts). The data is updated every three hours.
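The three categories can be expressed as GitHub search queries. The sketch below is illustrative only and is not taken from the repository: `is:pr`, `draft:false`, and `is:merged` are standard GitHub search qualifiers, but the agent qualifier `head:example-agent/` and the helper names are hypothetical placeholders, and the queries the project actually uses may differ.

```python
from urllib.parse import urlencode


def build_queries(agent_qualifier: str) -> dict[str, str]:
    """Return one search query per PR category for a single agent.

    `agent_qualifier` is whatever identifies the agent's PRs, for example
    a branch prefix or a bot author. The value used below is hypothetical.
    """
    base = f"is:pr {agent_qualifier}"
    return {
        "all": base,                      # every PR, drafts included
        "ready": f"{base} draft:false",   # non-draft PRs, ready for review
        "merged": f"{base} is:merged",    # PRs that were merged
    }


def search_url(query: str) -> str:
    """Build a github.com search link so a reader can verify a count by hand."""
    return "https://github.com/search?" + urlencode(
        {"q": query, "type": "pullrequests"}
    )


# Hypothetical agent that opens PRs from branches named "example-agent/...".
queries = build_queries("head:example-agent/")
ready_link = search_url(queries["ready"])
```

Keeping the agent qualifier separate from the category qualifiers means the same three categories apply to every agent, which is what makes the "Ready PRs" comparison uniform.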

Quick Start & Requirements

Not applicable. The README contains no installation or setup instructions; it focuses on data analysis and statistics.

Highlighted Details

Performance Benchmarks: As of the last update, Copilot shows a 92.71% success rate (125,018 merged out of 134,850 ready PRs), Codex 87.88% (1,356,748 merged out of 1,543,792 ready PRs), and Cursor 91.66% (84,736 merged out of 92,448 ready PRs).
Agent Workflow Insights: Devin shows a lower success rate of 64.2% (25,775 merged out of 40,145 ready PRs), which may reflect a different iteration or quality-control process from the other agents. Codegen is also lower, at 61.2% (2,921 merged out of 4,773 ready PRs).
Data Transparency: Direct GitHub search query links are provided for each agent's PR data, allowing for verification and deeper exploration.
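The percentages above follow from dividing merged PRs by ready PRs. As a minimal sketch, the snippet below recomputes them from the counts quoted in this section; the `AgentStats` class and helper names are illustrative and are not part of the repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentStats:
    name: str
    ready: int   # non-draft PRs
    merged: int  # PRs that were merged

    @property
    def success_rate(self) -> float:
        """Merged PRs as a percentage of ready PRs (0.0 if there are none)."""
        return 100.0 * self.merged / self.ready if self.ready else 0.0


# Counts copied from the figures quoted above.
STATS = [
    AgentStats("Copilot", ready=134_850, merged=125_018),
    AgentStats("Codex", ready=1_543_792, merged=1_356_748),
    AgentStats("Cursor", ready=92_448, merged=84_736),
    AgentStats("Devin", ready=40_145, merged=25_775),
    AgentStats("Codegen", ready=4_773, merged=2_921),
]


def ranked(stats: list[AgentStats]) -> list[AgentStats]:
    """Sort agents from highest to lowest success rate."""
    return sorted(stats, key=lambda s: s.success_rate, reverse=True)


def format_report(stats: list[AgentStats]) -> str:
    """Render one line per agent, best success rate first."""
    return "\n".join(
        f"{s.name:<8} {s.success_rate:6.2f}%  ({s.merged:,} / {s.ready:,})"
        for s in ranked(stats)
    )
```

Calling `print(format_report(STATS))` lists the agents by success rate. Note that the ranking ignores volume: Codex has by far the most ready PRs even though its rate is not the highest.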

Maintenance & Community

The README provides no information about maintainers, community channels (such as Discord or Slack), or a project roadmap.

Licensing & Compatibility

The README does not specify the repository's license or give any compatibility notes for commercial use.

Limitations & Caveats

The analysis covers PR metrics only and does not assess the quality or impact of the merged code. The success rate is the share of "Ready PRs" that were merged, so it may miss parts of an agent's contribution or development cycle, such as work done in drafts or private iteration. The data is limited to publicly available GitHub PR information for the tracked agents.

Health Check

Last Commit

1 day ago

Responsiveness

Inactive

Star History

0 stars in the last 30 days