Code similarity detection across multiple languages
This project provides high-performance, AST-based code similarity detection tools for multiple programming languages, primarily targeting developers and teams looking to identify and refactor duplicate code. Its key benefit is enabling efficient code deduplication and improving code quality through AI-assisted refactoring workflows.
How It Works
The tools parse source code into Abstract Syntax Trees (ASTs), using oxc-parser for TypeScript/JavaScript and tree-sitter for Python, Rust, and other languages. They extract function and method nodes and apply a Tree Structure Edit Distance (TSED) algorithm, incorporating size penalties, to calculate similarity scores. Because the comparison operates on the AST rather than on raw text, it can detect structurally similar code that simple text matching would miss.
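The scoring idea described above can be illustrated with a small sketch. It is not the project's implementation: it uses Python's standard ast module instead of oxc-parser or tree-sitter, a simplified top-down ordered tree edit distance instead of the project's TSED code, and it assumes the score is normalized as max(0, 1 - distance / max_nodes). The project's own size penalty is not reproduced, and all function names below are hypothetical.

```python
import ast


def to_tree(node):
    """Reduce an AST node to (type_name, children), ignoring identifiers and literals."""
    return (type(node).__name__, [to_tree(c) for c in ast.iter_child_nodes(node)])


def size(tree):
    """Number of nodes in a (label, children) tree."""
    return 1 + sum(size(child) for child in tree[1])


def edit_distance(a, b):
    """Simplified top-down ordered tree edit distance.

    Relabeling a node costs 1; inserting or deleting a subtree costs its size.
    This is a stand-in for illustration, not the algorithm the project ships.
    """
    root_cost = 0 if a[0] == b[0] else 1
    xs, ys = a[1], b[1]
    # Sequence alignment over the two child lists.
    dp = [[0] * (len(ys) + 1) for _ in range(len(xs) + 1)]
    for i in range(1, len(xs) + 1):
        dp[i][0] = dp[i - 1][0] + size(xs[i - 1])
    for j in range(1, len(ys) + 1):
        dp[0][j] = dp[0][j - 1] + size(ys[j - 1])
    for i in range(1, len(xs) + 1):
        for j in range(1, len(ys) + 1):
            dp[i][j] = min(
                dp[i - 1][j] + size(xs[i - 1]),                      # delete subtree
                dp[i][j - 1] + size(ys[j - 1]),                      # insert subtree
                dp[i - 1][j - 1] + edit_distance(xs[i - 1], ys[j - 1]),  # match children
            )
    return root_cost + dp[-1][-1]


def similarity(a, b):
    """Assumed normalization: 1 - distance / larger tree size, clamped at 0."""
    return max(0.0, 1.0 - edit_distance(a, b) / max(size(a), size(b)))


def functions(source):
    """Map each function name in the source to its reduced tree."""
    return {
        n.name: to_tree(n)
        for n in ast.walk(ast.parse(source))
        if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))
    }


if __name__ == "__main__":
    demo = "def add(a, b):\n    return a + b\n\ndef plus(x, y):\n    return x + y\n"
    fns = functions(demo)
    print(similarity(fns["add"], fns["plus"]))
```

Because the reduced trees keep only node types, two functions that differ only in identifier names compare as identical, which is the kind of duplicate a plain text diff would not flag. Dividing by the larger tree's node count also means that a small function compared against a much larger one scores low, a crude form of size penalty.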
Quick Start & Requirements
Install with cargo install <tool-name> (e.g., cargo install similarity-ts).
Requires the Rust toolchain (cargo).
Run the tool (e.g., similarity-ts .) in your codebase directory. Detailed options are available via -h.
Highlighted Details
similarity-ts is production-ready, while the Python and Rust versions are in Beta.
Experimental support for additional languages is provided by similarity-generic.
Maintenance & Community
The project is maintained by mizchi. Links to community channels or roadmaps are not explicitly provided in the README.
Licensing & Compatibility
The project is released under the MIT license, which permits commercial use and integration with closed-source projects.
Limitations & Caveats
The similarity-generic tool for Go, Java, C/C++, C#, and Ruby is experimental, with potential limitations in performance and accuracy compared to the specialized tools. Some language-specific features may not be fully supported in the generic version.