CodeLLMPaper by PurCL

CodeLLM paper collection for researchers

Created 1 year ago

603 stars

Top 54.2% on SourcePulse

Project Summary

This repository provides a curated and continuously updated collection of research papers on Large Language Models (LLMs) for code, targeting researchers and practitioners in Software Engineering, Programming Languages, Security, and Natural Language Processing. It aims to streamline literature discovery by systematically collecting, categorizing, and labeling papers from top-tier venues, offering a structured overview of the rapidly evolving field.

How It Works

The project uses a multi-stage selection strategy. Abstracts are first extracted from the bib/HTML files of selected top-tier venues and filtered with keywords related to LLMs and code. LLMs are then used to verify the relevance of the remaining papers, and the papers that pass are labeled manually according to a defined taxonomy (Application, Principle, Research Paradigm). Combining automated filtering with manual labeling aims at both broad coverage and high relevance; the automated stages are handled by src/process.py.
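The keyword-filtering stage described above can be sketched roughly as follows. This is a minimal illustration, not the contents of src/process.py: it assumes BibTeX input with brace-delimited fields, and the keyword patterns (LLM_PATTERN, CODE_PATTERN) and the helpers parse_bib and is_candidate are hypothetical. The LLM relevance check and the manual labeling that follow are not shown.

```python
import re

# Hypothetical keyword patterns (assumptions, not the project's actual lists).
# A leading \b anchors each term at a word start, so "code" matches "codebase"
# but not the "code" inside "encoder".
LLM_PATTERN = re.compile(r"\b(?:large language models?|llms?|gpt)", re.IGNORECASE)
CODE_PATTERN = re.compile(r"\b(?:code|program|software|compiler|vulnerab)", re.IGNORECASE)

# Matches brace-delimited fields such as: title = {Fixing Bugs with {LLMs}}
# (allows one level of nested braces, common in BibTeX titles).
FIELD_RE = re.compile(r"(\w+)\s*=\s*\{((?:[^{}]|\{[^{}]*\})*)\}")
# Matches the entry header, e.g. "@inproceedings{smith2024," and captures the key.
KEY_RE = re.compile(r"@\w+\s*\{\s*([^,\s]+)\s*,")


def parse_bib(text: str) -> list[dict[str, str]]:
    """Split a .bib string into entries and extract their brace-delimited fields."""
    entries = []
    for chunk in re.split(r"\n(?=@)", text.strip()):
        key = KEY_RE.match(chunk)
        if key is None:
            continue  # not an entry (e.g. a stray comment)
        fields = {"key": key.group(1)}
        for name, value in FIELD_RE.findall(chunk):
            # Drop inner braces and collapse whitespace and newlines.
            clean = value.replace("{", "").replace("}", "")
            fields[name.lower()] = " ".join(clean.split())
        entries.append(fields)
    return entries


def is_candidate(entry: dict[str, str]) -> bool:
    """Keyword stage: keep entries whose title/abstract mention both an LLM and code."""
    text = f"{entry.get('title', '')} {entry.get('abstract', '')}"
    return bool(LLM_PATTERN.search(text)) and bool(CODE_PATTERN.search(text))


if __name__ == "__main__":
    # Hypothetical sample entries for illustration only.
    sample = """
@inproceedings{smith2024,
  title = {Repairing Bugs with {LLMs}},
  abstract = {We prompt large language models to fix code.},
}
@inproceedings{lee2024,
  title = {A Study of Graph Databases},
  abstract = {We benchmark query engines.},
}
"""
    kept = [e["key"] for e in parse_bib(sample) if is_candidate(e)]
    print(kept)  # ['smith2024']
```

Requiring both an LLM term and a code term keeps the keyword stage cheap and high-recall; borderline matches are meant to be caught by the later LLM relevance check rather than by tighter regexes.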

Quick Start & Requirements

To contribute or update the collection, clone the repository and follow the contribution guidelines.
The processing scripts require Python.
Links: Contribution Guidelines

Highlighted Details

Comprehensive taxonomy covering LLM applications in code generation, repair, testing, analysis, and more.
Focus on LLM principles like code model training, security, robustness, and agent design.
Includes papers on research paradigms such as benchmarks, empirical studies, and surveys.
Detailed venue list from SE, PL, Security, and NLP communities, with plans to incorporate ML conferences.
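As a rough sketch of how labels under the three taxonomy dimensions (Application, Principle, Research Paradigm) might be stored and queried, the hypothetical LabeledPaper record and index_by_label helper below group paper titles by label. The repository's actual data format is not described here, so the structure is an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Top-level taxonomy dimensions named in the project summary.
DIMENSIONS = ("Application", "Principle", "Research Paradigm")


@dataclass
class LabeledPaper:
    """Hypothetical record: a paper title plus its labels per taxonomy dimension."""
    title: str
    # Maps a dimension to its labels; a paper may carry several labels per dimension.
    labels: dict[str, list[str]] = field(default_factory=dict)


def index_by_label(papers: list[LabeledPaper], dimension: str) -> dict[str, list[str]]:
    """Group paper titles under each label of one taxonomy dimension."""
    if dimension not in DIMENSIONS:
        raise ValueError(f"unknown dimension: {dimension}")
    index: dict[str, list[str]] = defaultdict(list)
    for paper in papers:
        for label in paper.labels.get(dimension, []):
            index[label].append(paper.title)
    return dict(index)


if __name__ == "__main__":
    # Placeholder papers; titles and label assignments are illustrative only.
    papers = [
        LabeledPaper("Paper A", {"Application": ["Code Generation"],
                                 "Research Paradigm": ["Benchmark"]}),
        LabeledPaper("Paper B", {"Application": ["Code Generation", "Program Repair"]}),
    ]
    print(index_by_label(papers, "Application"))
    # {'Code Generation': ['Paper A', 'Paper B'], 'Program Repair': ['Paper B']}
```

Keeping labels as lists per dimension reflects that one paper can fall under several applications or paradigms at once, so a paper appears under every label it carries.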

Maintenance & Community

Maintained by the PurCL group at Purdue.
Contributions, such as new papers or taxonomy changes, are welcome via pull requests or issues.
Contact: stephenw.wangcp@gmail.com or wang6590@purdue.edu

Licensing & Compatibility

The repository itself appears to be under a permissive license, but the underlying papers are subject to their original copyrights.
Intended solely for research purposes. No full PDF versions are included.

Limitations & Caveats

The collection does not systematically include papers from top ML conferences (ICML, NeurIPS, ICLR) or arXiv, although some are added manually. The taxonomy may change based on community input.


Explore Similar Projects

LLM4SE by gai4se

LLMPapers by SEU-COIN

ICLR2025-Papers-with-Code by yinizhilian

codebase-digest by kamilstanuch

LLM4AlgorithmDesign by FeiLiu36

LLMsStudy by XingYu-Zhong

LLM-Travel by Glanvery

llm-paper-daily by xianshang33

Awesome-Code-LLM by huybery

awesome-ai-coding by wsxiaoys

Awesome-LLM-for-RecSys by CHIANGEL

Autonomous-Agents by tmgthb