CodeXGLUE by microsoft

Benchmark for code intelligence tasks

created 4 years ago
1,718 stars

Top 25.4% on sourcepulse

Project Summary

CodeXGLUE is a benchmark dataset and open challenge for code intelligence, aimed at researchers and practitioners in software engineering and artificial intelligence. It provides a standardized platform for evaluating and comparing models across a wide range of code-related tasks, with the stated goal of improving developer productivity.

How It Works

CodeXGLUE addresses the lack of standardized evaluation for code intelligence by curating 14 datasets across 10 tasks, including code-to-code translation, defect detection, code completion, code search, and code summarization. It provides baseline implementations modeled on NLP architectures: CodeBERT (BERT-style) for understanding tasks, CodeGPT (GPT-style) for generation tasks, and an Encoder-Decoder framework for sequence-to-sequence tasks.

Quick Start & Requirements

  • Datasets and baseline models are available through Hugging Face.
  • Evaluation methodology is documented separately for each task.
  • Training and inference time costs are reported for a setup with 2 P100 GPUs.
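
As a minimal sketch of the Hugging Face route, the snippet below builds a Hub dataset identifier and loads one split with the `datasets` library. The `google/code_x_glue_<prefix>_<task>` naming pattern, the `defect_detection` task name, and the helper names `hub_dataset_id` and `load_codexglue` are assumptions for illustration; check the exact identifiers on the Hub before relying on them.

```python
# Category prefixes as they are assumed to appear in Hub dataset names.
CATEGORY_PREFIX = {
    "code-code": "cc",
    "text-code": "tc",
    "code-text": "ct",
    "text-text": "tt",
}


def hub_dataset_id(category: str, task: str, namespace: str = "google") -> str:
    """Build an id such as 'google/code_x_glue_cc_defect_detection' (assumed pattern)."""
    prefix = CATEGORY_PREFIX[category]  # raises KeyError for an unknown category
    return f"{namespace}/code_x_glue_{prefix}_{task}"


def load_codexglue(category: str, task: str, split: str = "train"):
    """Download one split; needs `pip install datasets` and network access."""
    # Imported lazily so hub_dataset_id stays usable without the library installed.
    from datasets import load_dataset

    return load_dataset(hub_dataset_id(category, task), split=split)
```

Under these assumptions, `load_codexglue("code-code", "defect_detection")` would return the training split as a `Dataset` object whose record fields depend on the task.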

Highlighted Details

  • Covers 10 code intelligence tasks in four categories: code-code, text-code, code-text, and text-text.
  • Includes 14 datasets, with several newly introduced for broader evaluation.
  • Provides three baseline pipelines: CodeBERT, CodeGPT, and Encoder-Decoder.
  • Facilitates model evaluation and comparison through an open challenge submission process.
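
Evaluation is defined per task, so the following is only a generic sketch of how two common metrics could be computed when comparing models: accuracy for classification-style tasks such as defect detection, and mean reciprocal rank (MRR) for retrieval-style tasks such as code search. The function names and input shapes are hypothetical and do not mirror the official evaluation scripts.

```python
from typing import Hashable, Sequence


def accuracy(predictions: Sequence[Hashable], references: Sequence[Hashable]) -> float:
    """Fraction of positions where the prediction equals the reference."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)


def mean_reciprocal_rank(
    rankings: Sequence[Sequence[Hashable]], gold: Sequence[Hashable]
) -> float:
    """Average of 1/rank of the gold item in each ranked list (0 when absent)."""
    if len(rankings) != len(gold):
        raise ValueError("rankings and gold must have the same length")
    if not gold:
        return 0.0
    total = 0.0
    for ranked, target in zip(rankings, gold):
        for position, candidate in enumerate(ranked, start=1):
            if candidate == target:
                total += 1.0 / position
                break  # only the first hit counts toward the reciprocal rank
    return total / len(gold)
```

For a real submission, the evaluator shipped with the corresponding task should be used instead, since tokenization and matching rules can differ from this sketch.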

Maintenance & Community

This project is a research initiative from Microsoft Research Asia, Developer Division, and Bing. Further details on participation and submission are available via email to codexglue@microsoft.com.

Licensing & Compatibility

The code is released under the MIT License, while the datasets are governed by the Computational Use of Data Agreement (C-UDA) License.

Limitations & Caveats

The README does not specify any explicit limitations or caveats regarding model performance, dataset biases, or ongoing development status.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 64 stars in the last 90 days
