CodeLLM paper collection for researchers
This repository provides a curated and continuously updated collection of research papers on Large Language Models (LLMs) for code, targeting researchers and practitioners in Software Engineering, Programming Languages, Security, and Natural Language Processing. It aims to streamline literature discovery by systematically collecting, categorizing, and labeling papers from top-tier venues, offering a structured overview of the rapidly evolving field.
How It Works
The project uses a multi-stage selection strategy. Abstracts are extracted from the bib/HTML files of selected top-tier venues and filtered with keywords related to LLMs and code. LLMs then verify the relevance of each remaining paper, and the verified papers are labeled manually according to a defined taxonomy (Application, Principle, Research Paradigm). This hybrid approach aims for both broad coverage and high relevance, and the process is automated by src/process.py.
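The selection stages described above can be sketched roughly as follows. This is an illustration only and not the contents of src/process.py: the `Paper` class, the keyword patterns, and the `is_relevant` callback (standing in for the LLM relevance check) are all assumptions made for this example.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Paper:
    """Hypothetical record for one entry extracted from a venue's bib/HTML file."""
    title: str
    abstract: str
    venue: str


# Assumed keyword lists; the project's real keywords are not given in this summary.
LLM_TERMS = re.compile(
    r"\b(large language models?|llms?|language models?|gpt)\b", re.IGNORECASE
)
CODE_TERMS = re.compile(
    r"\b(code|programs?|software|repair|vulnerabilit(?:y|ies))\b", re.IGNORECASE
)


def keyword_filter(papers: Iterable[Paper]) -> List[Paper]:
    """Stage 1: keep papers whose title or abstract mentions both LLMs and code."""
    kept = []
    for paper in papers:
        text = f"{paper.title} {paper.abstract}"
        if LLM_TERMS.search(text) and CODE_TERMS.search(text):
            kept.append(paper)
    return kept


def select(papers: Iterable[Paper], is_relevant: Callable[[Paper], bool]) -> List[Paper]:
    """Stage 2: pass keyword matches to a relevance check.

    `is_relevant` is a placeholder for the LLM verification step; any
    callable that returns True or False for a paper can be plugged in.
    """
    return [paper for paper in keyword_filter(papers) if is_relevant(paper)]
```

Papers returned by `select` would then go on to manual labeling against the taxonomy. Keeping the cheap keyword filter ahead of the LLM check means the more expensive verification only runs on likely candidates.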
Limitations & Caveats
The collection does not systematically include papers from top ML conferences (ICML, NeurIPS, ICLR) or arXiv, though manual additions occur. The taxonomy is subject to change based on community input.