cpg by Fraunhofer-AISEC

Code Property Graphs for comprehensive source code analysis

Created 6 years ago
382 stars

Top 74.4% on SourcePulse

Summary

Fraunhofer-AISEC/cpg is a library for generating Code Property Graphs (CPGs) from source code in numerous programming languages. It targets software engineers, security researchers, and developers who need deep code analysis. Its primary benefit is graph-based querying and navigation of source code, even when the code is incomplete or does not compile, which supports tasks such as vulnerability detection and code understanding.

How It Works

The project models source code as a labeled directed multi-graph (CPG), storing nodes and edges with properties. It leverages "forgiving" parsers like Eclipse CDT for C/C++ and JavaParser for Java, allowing analysis of code that may not compile. For broader language support, it integrates with LLVM IR via the javacpp project, theoretically enabling analysis for any language that compiles to LLVM. The CPG can be extended with custom analysis passes after initial graph construction.
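
To make the graph model concrete, the following is a minimal, library-independent sketch in Python of a labeled directed multi-graph whose nodes and edges carry properties. It does not use the cpg API: the class names (Node, Edge, PropertyGraph), the successors helper, and the example labels and properties are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A graph node with a set of labels and free-form properties."""
    id: int
    labels: frozenset
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    """A directed, labeled edge; parallel edges between two nodes are allowed."""
    source: int
    target: int
    label: str
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """Hypothetical labeled directed multi-graph, not the cpg library's API."""

    def __init__(self):
        self.nodes = {}
        self.edges = []  # a list (not a set of node pairs) keeps parallel edges

    def add_node(self, *labels, **properties):
        node = Node(len(self.nodes), frozenset(labels), dict(properties))
        self.nodes[node.id] = node
        return node

    def add_edge(self, source, target, label, **properties):
        edge = Edge(source.id, target.id, label, dict(properties))
        self.edges.append(edge)
        return edge

    def successors(self, node, label):
        """Return nodes reachable from `node` over one edge with `label`."""
        return [self.nodes[e.target] for e in self.edges
                if e.source == node.id and e.label == label]


graph = PropertyGraph()
call = graph.add_node("CallExpression", name="strcpy", line=12)
decl = graph.add_node("FunctionDeclaration", name="strcpy")
# Two edges between the same pair of nodes, distinguished by their label.
graph.add_edge(call, decl, "INVOKES")
graph.add_edge(call, decl, "DFG", index=0)

# A simple "query": which declarations does this call invoke?
invoked = [n.properties["name"] for n in graph.successors(call, "INVOKES")]
```

Storing edges as a list rather than as a mapping keyed by node pair is what makes this a multi-graph: the same two nodes can be connected by several edges that differ only in label or properties.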

Quick Start & Requirements

Installation is primarily as a library dependency via Maven/Gradle, requiring cpg-core and language-specific modules (e.g., cpg-language-go, cpg-language-python). The C++ frontend necessitates adding a custom Ivy repository for Eclipse CDT artifacts. Python support requires jep installation and Python 3.9-3.13. Go support depends on native libgoast and is limited to Linux/macOS. Visualization is available via the cpg-neo4j subproject, which requires Neo4j with the APOC plugin enabled.

Highlighted Details

  • Extensive language support includes C/C++ (C17), Java (Java 13), Python, Go, TypeScript, Ruby, JVM Bytecode, LLVM IR, OpenQASM, and Python-Qiskit.
  • "Forgiving" parsers for C/C++ and Java enable analysis of incomplete or semantically incorrect source code.
  • LLVM IR frontend theoretically supports any language compiling to LLVM (e.g., Rust, Swift, Haskell).
  • Features an extensible analysis pipeline with multiple passes for custom graph enrichment.
  • Optional visualization capabilities are provided through Neo4j integration (cpg-neo4j).
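
The idea of an extensible pass pipeline can be sketched as follows. This is an illustrative Python example under stated assumptions, not the cpg library's pass interface: the graph is a plain dictionary, and the names run_passes, resolve_calls, and flag_unresolved_calls are hypothetical. Each pass receives the graph built so far and enriches it in place, so later passes can build on the results of earlier ones.

```python
from typing import Callable, Dict, List

Graph = Dict[str, list]  # {"nodes": [...], "edges": [...]}
Pass = Callable[[Graph], None]


def resolve_calls(graph: Graph) -> None:
    """Hypothetical pass: link each call to a declaration with the same name."""
    decls = {n["name"]: n for n in graph["nodes"] if n["kind"] == "function"}
    for node in graph["nodes"]:
        if node["kind"] == "call" and node["name"] in decls:
            graph["edges"].append((node["id"], decls[node["name"]]["id"], "INVOKES"))


def flag_unresolved_calls(graph: Graph) -> None:
    """Hypothetical pass: mark calls that no earlier pass could resolve."""
    resolved = {src for src, _, label in graph["edges"] if label == "INVOKES"}
    for node in graph["nodes"]:
        if node["kind"] == "call":
            node["unresolved"] = node["id"] not in resolved


def run_passes(graph: Graph, passes: List[Pass]) -> Graph:
    """Apply each pass in order; later passes see what earlier ones added."""
    for graph_pass in passes:
        graph_pass(graph)
    return graph


graph = {
    "nodes": [
        {"id": 0, "kind": "function", "name": "main"},
        {"id": 1, "kind": "call", "name": "main"},
        {"id": 2, "kind": "call", "name": "missing_fn"},
    ],
    "edges": [],
}
run_passes(graph, [resolve_calls, flag_unresolved_calls])
```

In this sketch the call to missing_fn stays unresolved instead of aborting the analysis, which mirrors how a graph built from incomplete code can still be enriched and queried.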

Maintenance & Community

The project acknowledges contributions from various authors. Specific community channels (like Discord or Slack), roadmaps, or active maintainer information are not detailed in the provided README. External contributions require signing a Contributor License Agreement (CLA).

Licensing & Compatibility

The specific open-source license for the Fraunhofer-AISEC/cpg project is not explicitly stated in the provided README content. Compatibility for commercial use or linking with closed-source projects is not detailed.

Limitations & Caveats

The LLVM IR parser requires valid LLVM IR and is not forgiving. Python support is restricted to versions 3.9-3.13 and necessitates jep installation. The Go frontend relies on native libgoast and is currently Linux/macOS-specific. Frontends for TypeScript, Ruby, JVM Bytecode, and LLVM are marked as experimental or incubating, suggesting potential instability or incomplete features. The full library, particularly with LLVM support, can be very large.

Health Check

  • Last commit: 13 hours ago
  • Responsiveness: Inactive
  • Pull requests (30d): 21
  • Issues (30d): 0
  • Star history: 7 stars in the last 30 days
