CodeLLMPaper by PurCL

CodeLLM paper collection for researchers

created 1 year ago
502 stars

Top 62.8% on sourcepulse

Project Summary

This repository provides a curated and continuously updated collection of research papers on Large Language Models (LLMs) for code, targeting researchers and practitioners in Software Engineering, Programming Languages, Security, and Natural Language Processing. It aims to streamline literature discovery by systematically collecting, categorizing, and labeling papers from top-tier venues, offering a structured overview of the rapidly evolving field.

How It Works

The project selects papers in stages. Abstracts are extracted from the bib/HTML files of selected top-tier venues and filtered by keywords related to LLMs and code. An LLM then checks the relevance of each remaining paper, and the papers that pass are manually labeled according to a defined taxonomy (Application, Principle, Research Paradigm). Combining automated filtering with manual labeling is intended to give both broad coverage and high relevance. The pipeline is automated by src/process.py.
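
The keyword-filtering stage can be illustrated with a minimal sketch. This is not the code in src/process.py: the `Paper` class, the keyword lists, and the `keyword_filter` function are hypothetical names chosen for illustration, and the repository's actual keywords are not stated here.

```python
import re
from dataclasses import dataclass

# Hypothetical keyword lists (assumption): the real lists are not given in this summary.
LLM_KEYWORDS = ["large language model", "language model", "llm"]
CODE_KEYWORDS = ["code", "program", "software", "vulnerability"]


@dataclass
class Paper:
    title: str
    abstract: str
    venue: str


def _mentions_any(text, keywords):
    """Return True if any keyword starts at a word boundary in the text."""
    lowered = text.lower()
    return any(re.search(r"\b" + re.escape(k), lowered) for k in keywords)


def keyword_filter(papers):
    """Keep papers that mention both an LLM term and a code term."""
    kept = []
    for paper in papers:
        text = f"{paper.title} {paper.abstract}"
        if _mentions_any(text, LLM_KEYWORDS) and _mentions_any(text, CODE_KEYWORDS):
            kept.append(paper)
    return kept
```

In this sketch a paper survives only if it matches both keyword groups, so an LLM paper about dialogue or a programming-languages paper with no LLM component is dropped. The later stages (LLM relevance check and manual labeling) are not modeled here.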

Quick Start & Requirements

  • To contribute or update the collection, clone the repository and follow the contribution guidelines.
  • Requires Python for processing scripts.
  • Links: Contribution Guidelines

Highlighted Details

  • Comprehensive taxonomy covering LLM applications in code generation, repair, testing, analysis, and more.
  • Focus on LLM principles like code model training, security, robustness, and agent design.
  • Includes papers on research paradigms such as benchmarks, empirical studies, and surveys.
  • Detailed venue list from SE, PL, Security, and NLP communities, with plans to incorporate ML conferences.
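
One way to picture the three-dimensional taxonomy is as a record with one list of labels per dimension. The sketch below is an assumption for illustration only: `LabeledPaper`, `index_by_label`, and the example titles are hypothetical, and the repository's actual label storage format is not described here.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class LabeledPaper:
    title: str
    # One list per taxonomy dimension; a paper may carry several labels in each.
    application: list = field(default_factory=list)
    principle: list = field(default_factory=list)
    paradigm: list = field(default_factory=list)


def index_by_label(papers, dimension):
    """Group paper titles by their labels along one taxonomy dimension."""
    index = defaultdict(list)
    for paper in papers:
        for label in getattr(paper, dimension):
            index[label].append(paper.title)
    return dict(index)


# Hypothetical entries, not real papers from the collection.
papers = [
    LabeledPaper("Example paper A", application=["code generation"], paradigm=["benchmark"]),
    LabeledPaper("Example paper B", application=["program repair"], principle=["agent design"]),
    LabeledPaper("Example paper C", application=["code generation"], paradigm=["empirical study"]),
]
by_application = index_by_label(papers, "application")
```

Keeping the dimensions as separate fields lets the same paper appear under an application, a principle, and a research paradigm at once, which matches how the bullets above describe the labels.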

Maintenance & Community

Licensing & Compatibility

  • The repository itself appears to be under a permissive license, but the underlying papers are subject to their original copyrights.
  • Intended solely for research purposes. No full PDF versions are included.

Limitations & Caveats

The collection does not systematically include papers from top ML conferences (ICML, NeurIPS, ICLR) or arXiv, though manual additions occur. The taxonomy is subject to change based on community input.

Health Check

  • Last commit: 4 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 1
  • Issues (30d): 1

Star History

  • 90 stars in the last 90 days
