Awesome-Code-LLM by codefuse-ai

Curated list of code LLM research, plus datasets

created 1 year ago
2,767 stars

Top 17.6% on sourcepulse

Project Summary

This repository is a curated list of research papers, models, and datasets related to Large Language Models (LLMs) for code and software engineering activities. It serves as a comprehensive resource for researchers and practitioners interested in the intersection of Natural Language Processing (NLP) and Software Engineering (SE), providing an organized overview of the rapidly evolving field.

How It Works

The repository categorizes research into broad areas such as LLM architectures (base models, code-adapted LLMs, encoder-decoder models), fine-tuning strategies (instruction tuning, reinforcement learning), and reasoning capabilities (code agents, interactive coding). It also details downstream tasks like code generation, translation, repair, and analysis, alongside relevant datasets and evaluation metrics. The organization aims to provide a structured understanding of the landscape, from foundational models to specific applications.

Quick Start & Requirements

This repository is a collection of links and information, not a runnable software project. No installation or specific requirements are needed to browse its contents.

Highlighted Details

  • Features recent papers and models, including contributions from CodeFuse AI (GALLa, CodeFuse-CGM, EasyDeploy, Rodimus, CodeFuse-CGE).
  • Includes a comprehensive list of surveys on LLMs for code, covering both NLP and SE perspectives.
  • Provides extensive lists of LLMs, datasets, and benchmarks relevant to code intelligence.
  • Offers recommended readings for those new to NLP or LLMs.

Maintenance & Community

The repository is actively maintained, with updates noted as recently as April 2025. Contributions are welcome via GitHub issues. The primary contributors are associated with the AI Native team at Ant Group, who also maintain the open-source project CodeFuse.

Licensing & Compatibility

The repository itself is a collection of links and does not have a specific license. The linked papers and datasets carry their own licenses.

Limitations & Caveats

As a curated list, the repository's value depends on the completeness and accuracy of its entries. While extensive, it may not capture every relevant publication in this fast-moving field.

Health Check

  • Last commit: 1 week ago
  • Responsiveness: 1 day
  • Pull requests (30d): 2
  • Issues (30d): 2

Star History

343 stars in the last 90 days
