Code Graph LLM for repository-level software engineering tasks
Top 72.0% on SourcePulse
CodeFuse-CGM is a framework for repository-level software engineering tasks that combines a graph-based representation of code with Large Language Models (LLMs). It targets developers and researchers who want to automate issue resolution by giving models an understanding of code context and structure, with the stated aim of improving automated code repair and analysis.
How It Works
CGM constructs a repository-level code graph to represent project context. It then employs a Retrieval-Augmented Generation (RAG) pipeline consisting of four stages: Rewriter, Retriever, Reranker, and Reader. The Rewriter analyzes issues and generates queries, the Retriever finds relevant code subgraphs, the Reranker prioritizes files within these subgraphs, and the Reader generates code patches based on the refined context. This graph-integrated approach allows models to generalize across various SE tasks.
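The four stages can be pictured with a small, self-contained sketch. This is not the CGM implementation: the class and function names (CodeGraph, rewrite, retrieve, rerank, read, run_pipeline) are hypothetical, the graph is a toy in-memory structure, and keyword matching stands in for the LLM-based and embedding-based components the project describes. It only illustrates how an issue might flow from queries, to a subgraph, to ranked files, to the context handed to a patch generator.

```python
import re
from dataclasses import dataclass, field

@dataclass
class CodeGraph:
    """Toy repository graph (hypothetical): nodes carry a file path and text."""
    nodes: dict = field(default_factory=dict)  # node_id -> {"file", "text"}
    edges: dict = field(default_factory=dict)  # node_id -> set of neighbour ids

    def add_node(self, node_id, file, text):
        self.nodes[node_id] = {"file": file, "text": text}
        self.edges.setdefault(node_id, set())

    def add_edge(self, a, b):
        # Undirected edge, e.g. "a calls b" or "a imports b".
        self.edges[a].add(b)
        self.edges[b].add(a)

STOPWORDS = {"the", "when", "and", "for", "with", "that", "this"}

def rewrite(issue):
    """Rewriter stand-in: extract keyword queries (the real stage uses an LLM)."""
    words = re.findall(r"[a-z_]+", issue.lower())
    return [w for w in words if len(w) > 2 and w not in STOPWORDS]

def retrieve(graph, queries):
    """Retriever stand-in: keyword-matched seed nodes plus one-hop neighbours."""
    seeds = {n for n, d in graph.nodes.items()
             if any(q in d["text"].lower() for q in queries)}
    subgraph = set(seeds)
    for n in seeds:
        subgraph |= graph.edges[n]
    return subgraph

def rerank(graph, subgraph, queries, top_k=2):
    """Reranker stand-in: score each file by query hits across its nodes."""
    scores = {}
    for n in subgraph:
        d = graph.nodes[n]
        hits = sum(q in d["text"].lower() for q in queries)
        scores[d["file"]] = scores.get(d["file"], 0) + hits
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda f: (-scores[f], f))[:top_k]

def read(issue, files):
    """Reader stand-in: build the context a patch-generating LLM would receive."""
    return "Issue: " + issue + "\nCandidate files: " + ", ".join(files)

def run_pipeline(graph, issue):
    """Chain the four stages: Rewriter -> Retriever -> Reranker -> Reader."""
    queries = rewrite(issue)
    subgraph = retrieve(graph, queries)
    files = rerank(graph, subgraph, queries)
    return files, read(issue, files)

def demo_graph():
    """Three-node example repository (invented for illustration)."""
    g = CodeGraph()
    g.add_node("parse_config", "config.py", "def parse_config(path): load yaml config")
    g.add_node("load_yaml", "io_utils.py", "def load_yaml(path): read file")
    g.add_node("render", "ui.py", "def render(view): draw widgets")
    g.add_edge("parse_config", "load_yaml")
    return g
```

Calling run_pipeline(demo_graph(), "parse_config crashes when the config path is missing") ranks config.py ahead of io_utils.py and leaves ui.py out of the context, since the unrelated node is never reached from the matched subgraph. In a real system each stand-in would be replaced by the corresponding model-backed module.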
Quick Start & Requirements
Dependencies: transformers==4.46.1, tokenizers==0.20.0, accelerate==1.0.1, peft==0.13.2, jinja2==2.11.3, fuzzywuzzy==0.18.0, python-Levenshtein==0.25.1, networkx==3.0.
Module-specific requirements: torch==2.1.0, transformers==4.39.2, tokenizers==0.15.2, accelerate==0.28.0 for CGE-large, and RapidFuzz==1.5.0, faiss-cpu for the Retriever. vllm>=0.8.5 is needed for the Reranker.
Highlighted Details
Maintenance & Community
The project is actively developed by the AI Native team at Ant Group, with recent updates in January 2025. The team has a track record of publications and open-source contributions through the CodeFuse project. Community contributions are welcome via pull requests and issues.
Licensing & Compatibility
The repository does not explicitly state a license in the provided README. Users should verify licensing for commercial or closed-source use.
Limitations & Caveats
The setup involves multiple complex steps for generating embeddings and running inference for each module, requiring significant computational resources and technical expertise. Specific model weights (e.g., CGE-large, Qwen Model) are referenced but not directly linked for download in the README.