apps by hendrycks

Dataset for measuring coding challenge competence (NeurIPS 2021)

created 4 years ago
478 stars

Top 64.9% on sourcepulse

Project Summary

This repository provides the APPS dataset and code for measuring coding challenge competence, targeting researchers and developers in AI and natural language processing. It enables the evaluation of large language models on their ability to solve programming problems, offering a standardized benchmark for progress in automated programming.

How It Works

The APPS dataset consists of programming problems sourced from competitive programming platforms. Each problem is stated in natural language, and generated solutions are judged by running them against the problem's test cases. The accompanying code supports fine-tuning and evaluating transformer-based language models on this task, which standardizes the assessment of coding competence and goes beyond simple code completion.
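
To make the evaluation idea concrete, here is a minimal sketch of checking a candidate program against input/output pairs. It is not the repository's evaluation harness (see eval/README for that). The function names `run_candidate` and `score_candidate`, the 5-second timeout, and the whitespace-insensitive comparison are all assumptions chosen for illustration.

```python
import subprocess
import sys
from typing import List, Optional, Tuple


def run_candidate(source: str, stdin_text: str, timeout_s: float = 5.0) -> Optional[str]:
    """Run candidate program text in a fresh interpreter.

    Returns captured stdout, or None if the program crashes or times out.
    NOTE: hypothetical helper; real evaluation of untrusted code needs a sandbox.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", source],
            input=stdin_text,
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return None
    if proc.returncode != 0:
        return None
    return proc.stdout


def score_candidate(source: str, cases: List[Tuple[str, str]]) -> float:
    """Return the fraction of (stdin, expected stdout) cases the program passes."""
    if not cases:
        return 0.0
    passed = 0
    for stdin_text, expected in cases:
        actual = run_candidate(source, stdin_text)
        # Compare token by token so trailing newlines and spacing do not matter
        # (an assumption of this sketch, not a documented rule of the benchmark).
        if actual is not None and actual.split() == expected.split():
            passed += 1
    return passed / len(cases)
```

A candidate such as `"a, b = map(int, input().split())\nprint(a + b)\n"` can be passed to `score_candidate` with a list of `(stdin, expected_stdout)` pairs. Running each attempt in a separate process keeps a crash or infinite loop in generated code from taking down the scoring loop.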

Quick Start & Requirements

  • Dataset download: ~1.3GB.
  • Fine-tuned weights for GPT-2 1.5B and GPT-Neo 2.7B are available.
  • Training and evaluation instructions are in train/README and eval/README, respectively.
  • The dataset is also available via the Hugging Face datasets library.
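
As a rough sketch of the Hugging Face route, the snippet below loads a split and extracts test cases from one record. Several details are assumptions not stated on this page: the Hub identifier `codeparrot/apps`, the split name, and the record layout (an `input_output` field holding a JSON string with `inputs` and `outputs` lists). The helper names are hypothetical, and depending on the installed `datasets` version extra loading options may be needed.

```python
import json
from typing import Any, Dict, List, Tuple


def load_apps_split(split: str = "test", dataset_id: str = "codeparrot/apps"):
    """Load one split via the Hugging Face datasets library.

    ASSUMPTION: `dataset_id` and `split` are illustrative defaults, not taken
    from the project page. Requires `pip install datasets` and network access.
    """
    from datasets import load_dataset  # imported lazily so parsing works without it

    return load_dataset(dataset_id, split=split)


def parse_test_cases(record: Dict[str, Any]) -> List[Tuple[Any, Any]]:
    """Pair up inputs and expected outputs stored on a problem record.

    ASSUMPTION: `input_output` is a JSON string with "inputs" and "outputs"
    lists; records without test cases carry an empty string.
    """
    raw = record.get("input_output") or ""
    if not raw:
        return []
    spec = json.loads(raw)
    return list(zip(spec.get("inputs", []), spec.get("outputs", [])))
```

Keeping the parsing step separate from the download makes it possible to inspect or test the record handling offline, using a hand-written record.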

Highlighted Details

  • Benchmark for measuring coding challenge competence.
  • Includes training and evaluation code.
  • Offers fine-tuned weights for large transformer models.
  • Links to related datasets for math, ethics, and academic subjects.

Maintenance & Community

The project accompanies a paper presented at NeurIPS 2021. The README does not describe community channels or a contribution process.

Licensing & Compatibility

The repository's license is not explicitly stated in the README. Compatibility for commercial use or closed-source linking would require clarification of the licensing terms.

Limitations & Caveats

Because the README states no license, commercial adoption carries some legal uncertainty. Setup and execution instructions are not in the top-level README; they live in train/README and eval/README.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

  • 16 stars in the last 90 days
