similarity by mizchi

Code similarity detection across multiple languages

Created 3 months ago
642 stars

Top 51.8% on SourcePulse

Project Summary

This project provides high-performance, AST-based code similarity detection tools for multiple programming languages, primarily targeting developers and teams looking to identify and refactor duplicate code. Its key benefit is enabling efficient code deduplication and improving code quality through AI-assisted refactoring workflows.

How It Works

The tools parse source code into Abstract Syntax Trees (ASTs), using oxc-parser for TypeScript/JavaScript and tree-sitter for Python, Rust, and other languages. They extract function and method nodes and apply a Tree Structure Edit Distance (TSED) algorithm, with size penalties, to calculate similarity scores. Comparing syntax-tree structure rather than raw text gives more accurate detection of duplicated code patterns than simple text matching.
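
The idea of scoring two functions by tree edit distance can be sketched in a few lines. The following is a minimal illustration in Python, not the project's implementation: it uses Python's built-in ast module instead of oxc-parser or tree-sitter, swaps in a simplified top-down (Selkow-style) edit distance rather than whatever algorithm the tools use internally, and the size penalty shown (ratio of the smaller to the larger node count) is an assumed formula. The names to_tree, size, distance, and similarity are hypothetical.

```python
import ast
from functools import lru_cache


def to_tree(node):
    """Convert an ast node into a hashable (label, children) tuple.

    Only node type names are kept, so identifier names are ignored.
    """
    children = tuple(to_tree(child) for child in ast.iter_child_nodes(node))
    return (type(node).__name__, children)


@lru_cache(maxsize=None)
def size(tree):
    """Number of nodes in the tree."""
    return 1 + sum(size(child) for child in tree[1])


@lru_cache(maxsize=None)
def distance(a, b):
    """Simplified top-down ordered tree edit distance with unit costs.

    Assumption: whole subtrees are inserted or deleted at once, which is
    a simplification chosen for brevity, not the tools' actual algorithm.
    """
    relabel = 0 if a[0] == b[0] else 1
    xs, ys = a[1], b[1]
    # Sequence edit distance over the two child lists.
    prev = [0]
    for y in ys:
        prev.append(prev[-1] + size(y))            # insert subtree y
    for x in xs:
        cur = [prev[0] + size(x)]                  # delete subtree x
        for j, y in enumerate(ys, start=1):
            cur.append(min(
                prev[j] + size(x),                 # delete x
                cur[j - 1] + size(y),              # insert y
                prev[j - 1] + distance(x, y),      # transform x into y
            ))
        prev = cur
    return relabel + prev[-1]


def similarity(src_a, src_b, size_penalty=True):
    """Score in [0, 1]: 1 - distance / max(node counts), optionally penalized."""
    a = to_tree(ast.parse(src_a))
    b = to_tree(ast.parse(src_b))
    n_a, n_b = size(a), size(b)
    score = max(0.0, 1.0 - distance(a, b) / max(n_a, n_b))
    if size_penalty:
        # Hypothetical penalty: scale down when the trees differ in size.
        score *= min(n_a, n_b) / max(n_a, n_b)
    return score


if __name__ == "__main__":
    add = "def add(a, b):\n    return a + b\n"
    plus = "def plus(x, y):\n    return x + y\n"
    sub = "def sub(a, b):\n    return a - b\n"
    print(similarity(add, plus))  # same structure, different names: 1.0
    print(similarity(add, sub))   # one operator node differs: below 1.0
```

Because only node types are compared, renaming variables does not change the score, which is the main difference from text-based matching.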

Quick Start & Requirements

  • Installation: Primarily via cargo install <tool-name> (e.g., cargo install similarity-ts).
  • Prerequisites: Rust toolchain (for building from source or installing via cargo).
  • Usage: Run the respective tool (e.g., similarity-ts .) in your codebase directory. Detailed options are available via -h.
  • Documentation: AI Assistant Guide

Highlighted Details

  • similarity-ts is production-ready, while Python and Rust versions are in Beta.
  • Supports AI integration for refactoring suggestions by providing structured output.
  • Offers experimental features like type similarity detection and partial code overlap analysis.
  • Includes experimental support for Go, Java, C/C++, C#, and Ruby via similarity-generic.

Maintenance & Community

The project is maintained by mizchi. Links to community channels or roadmaps are not explicitly provided in the README.

Licensing & Compatibility

The project is released under the MIT license, which permits commercial use and integration with closed-source projects.

Limitations & Caveats

The similarity-generic tool for Go, Java, C/C++, C#, and Ruby is experimental, with potential limitations in performance and accuracy compared to specialized tools. Some language-specific features may not be fully supported in the generic version.

Health Check

  • Last Commit: 1 month ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 46 stars in the last 30 days
