similarity by mizchi

Code similarity detection across multiple languages

created 1 month ago
575 stars

Top 56.9% on sourcepulse

Project Summary

This project provides high-performance, AST-based code similarity detection tools for multiple programming languages. It targets developers and teams who want to find and refactor duplicate code, and it is designed to fit AI-assisted refactoring workflows that deduplicate code and improve code quality.

How It Works

The tools parse source code into Abstract Syntax Trees (ASTs), using oxc-parser for TypeScript/JavaScript and tree-sitter for Python, Rust, and other languages. They extract function and method nodes, then apply a Tree Structure Edit Distance (TSED) algorithm with size penalties to calculate similarity scores. Because the comparison operates on syntax trees rather than raw text, it matches code by structure and is less sensitive to surface differences such as formatting than simple text matching.
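
As a rough illustration of the scoring idea, the sketch below compares two functions by their syntax trees. It is not the project's implementation: it uses Python's built-in ast module instead of oxc-parser or tree-sitter, and it substitutes a plain Levenshtein distance over pre-order node-type sequences for a true tree edit distance. The helper names and the size-penalty formula are assumptions made for this example.

```python
import ast


def preorder_labels(node: ast.AST) -> list[str]:
    """Return the node type names of `node` and its descendants in pre-order."""
    labels = [type(node).__name__]
    for child in ast.iter_child_nodes(node):
        labels.extend(preorder_labels(child))
    return labels


def function_labels(source: str) -> list[str]:
    """Parse `source` and flatten the first function definition found."""
    tree = ast.parse(source)
    func = next(n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef))
    return preorder_labels(func)


def edit_distance(a: list[str], b: list[str]) -> int:
    """Levenshtein distance between two label sequences (a stand-in for a
    real tree edit distance, which would also respect parent/child links)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # delete x
                           cur[j - 1] + 1,            # insert y
                           prev[j - 1] + (x != y)))   # substitute or match
        prev = cur
    return prev[-1]


def similarity(src_a: str, src_b: str, size_penalty: bool = True) -> float:
    """Score in [0, 1]: 1 - distance / larger size, optionally scaled down
    when the two functions differ in size (assumed penalty formula)."""
    a, b = function_labels(src_a), function_labels(src_b)
    longest = max(len(a), len(b))
    score = max(1.0 - edit_distance(a, b) / longest, 0.0)
    if size_penalty:
        score *= min(len(a), len(b)) / longest
    return score
```

Because identifiers are ignored and only node types are compared, two functions that differ only in their function and variable names score 1.0 under this sketch, while structurally different functions score lower.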

Quick Start & Requirements

  • Installation: Primarily via cargo install <tool-name> (e.g., cargo install similarity-ts).
  • Prerequisites: Rust toolchain (for building from source or installing via cargo).
  • Usage: Run the respective tool (e.g., similarity-ts .) in your codebase directory. Detailed options are available via -h.
  • Documentation: AI Assistant Guide

Highlighted Details

  • similarity-ts is production-ready; the Python and Rust versions are in beta.
  • Supports AI integration for refactoring suggestions by providing structured output.
  • Offers experimental features like type similarity detection and partial code overlap analysis.
  • Includes experimental support for Go, Java, C/C++, C#, and Ruby via similarity-generic.

Maintenance & Community

The project is maintained by mizchi. Links to community channels or roadmaps are not explicitly provided in the README.

Licensing & Compatibility

The project is released under the MIT license, which permits commercial use and integration with closed-source projects.

Limitations & Caveats

The similarity-generic tool for Go, Java, C/C++, C#, and Ruby is experimental and may be slower and less accurate than the dedicated language-specific tools. Some language-specific features may not be fully supported in the generic version.

Health Check

  • Last commit: 6 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 2
  • Issues (30d): 1
  • Star history: 580 stars in the last 90 days
