QuantaAlpha
Code agent benchmark for real-world repository tasks
A benchmark and tooling suite for evaluating code agents on real-world, repository-level tasks. GitTaskBench addresses the gap in existing benchmarks by focusing on tasks requiring comprehensive understanding and utilization of full-scale GitHub repositories, offering a more authentic assessment of agent capabilities for developers and researchers.
How It Works
GitTaskBench evaluates LLM agents on 54 representative tasks with real-world economic value, each mapped to a fixed GitHub repository. This approach mirrors how developers solve complex problems using existing open-source projects. The benchmark systematically assesses an agent's ability to leverage repository code, focusing on "Execution Completion Rate" and "Task Pass Rate" with task-specific, predefined metrics.
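The two headline metrics can be read as simple ratios over the task set. The sketch below is a minimal illustration, assuming that Execution Completion Rate is the share of tasks where the agent produced an output in an acceptable format and that Task Pass Rate is the share of tasks whose output also met the task-specific metric. The TaskResult record, its field names, and the summarize function are hypothetical and are not taken from the GitTaskBench codebase.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    """Hypothetical per-task record; field names are illustrative only."""
    task_id: str
    executed: bool  # agent produced an output in an acceptable format
    passed: bool    # output met the task-specific, predefined metric


def summarize(results: list[TaskResult]) -> dict[str, float]:
    """Return both rates as percentages of all tasks."""
    total = len(results)
    if total == 0:
        return {"ecr": 0.0, "tpr": 0.0}
    executed = sum(r.executed for r in results)
    # Assumption: a task only counts as passed if it also completed execution.
    passed = sum(r.executed and r.passed for r in results)
    return {"ecr": 100.0 * executed / total, "tpr": 100.0 * passed / total}


if __name__ == "__main__":
    # Placeholder task IDs, not real benchmark entries.
    demo = [
        TaskResult("task_a", executed=True, passed=True),
        TaskResult("task_b", executed=True, passed=False),
        TaskResult("task_c", executed=False, passed=False),
        TaskResult("task_d", executed=True, passed=True),
    ]
    print(summarize(demo))
```

Under these assumptions the pass rate can never exceed the completion rate, which is why the two numbers are usually reported side by side.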
Quick Start & Requirements
Installation: clone the repository, create a conda environment (conda create -n gittaskbench python=3.10 -y), activate it (conda activate gittaskbench), install the pinned PyTorch builds (pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0 --extra-index-url https://download.pytorch.org/whl/cu113), then install GitTaskBench (cd GitTaskBench && pip install -e . or pip install -r requirements.txt).
Prerequisites: Python 3.10, PyTorch 1.11.0+cu113, torchvision, torchaudio.
Single-task evaluation: gittaskbench grade --taskid <taskid>
All-tasks evaluation: gittaskbench grade --all
Results analysis: gittaskbench eval
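The grading commands from the quick start can be scripted when several tasks need to be graded in sequence. The sketch below is one possible wrapper, not part of GitTaskBench: it uses only the subcommands and flags quoted above (grade --taskid, grade --all, eval), and the helper names and the task ID "Example_Task_01" are placeholders. It defaults to a dry run that prints each command instead of executing it.

```python
import shlex
import subprocess
from typing import Optional


def grade_command(task_id: Optional[str] = None) -> list[str]:
    """Build the grade command for one task, or for all tasks when task_id is None."""
    if task_id is None:
        return ["gittaskbench", "grade", "--all"]
    return ["gittaskbench", "grade", "--taskid", task_id]


def eval_command() -> list[str]:
    """Build the results-analysis command."""
    return ["gittaskbench", "eval"]


def run(cmd: list[str], dry_run: bool = True) -> int:
    """Print the command; execute it only when dry_run is False."""
    print(shlex.join(cmd))
    if dry_run:
        return 0
    return subprocess.run(cmd, check=False).returncode


if __name__ == "__main__":
    # "Example_Task_01" is a placeholder, not a real task ID.
    for task_id in ["Example_Task_01"]:
        run(grade_command(task_id))
    run(eval_command())
```

Passing dry_run=False requires the gittaskbench command to be installed and on PATH, as described in the installation steps.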
Maintenance & Community
Founded by academics from Tsinghua University, Peking University, CAS, CMU, and HKUST, the project welcomes community contributions for bug fixes, new features, documentation, and test cases. No explicit community channels (e.g., Discord, Slack) are listed.
Licensing & Compatibility
The repository's README does not specify a license. Licensing terms should be confirmed with the maintainers before commercial use or closed-source integration.
Limitations & Caveats
The README does not detail specific limitations, known bugs, or unsupported platforms. The installation instructions clone from a placeholder URL (your-org/GitTaskBench.git), which must be replaced with the actual repository URL. The pinned PyTorch build (1.11.0+cu113) targets CUDA 11.3, so an older CUDA toolkit or a compatible driver may be required.