code-graph-rag  by vitali87

Codebase RAG system for multi-language analysis

Created 5 months ago
1,464 stars

Top 27.9% on SourcePulse

Project Summary

This project is a multi-language AI assistant for analyzing and interacting with codebases, aimed at developers and researchers working with monorepos or other complex code structures. It builds a knowledge graph of the code to support natural language querying, code editing, and AI-driven optimization, with the goal of improving developer productivity and code quality.

How It Works

The system utilizes Tree-sitter for robust, language-agnostic parsing of codebases, building a comprehensive knowledge graph stored in Memgraph. This graph captures code structure, relationships, and dependencies. A Retrieval-Augmented Generation (RAG) system, powered by various LLMs (Gemini, OpenAI, Ollama), allows users to query this graph using natural language. The system can translate queries into Cypher for graph traversal, retrieve code snippets, and even perform surgical code modifications based on AST analysis.
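
The parse-then-graph flow described above can be illustrated with a minimal sketch. It is not the project's implementation: it uses Python's standard `ast` module as a stand-in for Tree-sitter, and the node labels (`Module`, `Function`), relationship names (`DEFINES`, `CALLS`), and helper names are all hypothetical. It only generates Cypher text and does not connect to Memgraph.

```python
import ast

# Toy input; a real run would read files from the repository being indexed.
SOURCE = '''
def helper(x):
    return x + 1

def main():
    return helper(41)
'''


def extract_graph(source: str, module: str = "example"):
    """Return (nodes, edges) for functions in `source` and the calls between them."""
    tree = ast.parse(source)
    functions = [n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)]
    known = {f.name for f in functions}

    nodes = [("Module", module)] + [("Function", f.name) for f in functions]
    edges = [(module, "DEFINES", f.name) for f in functions]
    for f in functions:
        for node in ast.walk(f):
            # Only direct calls by bare name to functions defined in this source.
            if (
                isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id in known
            ):
                edges.append((f.name, "CALLS", node.func.id))
    return nodes, edges


def to_cypher(nodes, edges):
    """Render the graph as Cypher statements (hypothetical schema).

    String interpolation keeps the sketch short; real code should pass
    values as query parameters instead.
    """
    stmts = [f"MERGE (:{label} {{name: '{name}'}})" for label, name in nodes]
    for src, rel, dst in edges:
        stmts.append(
            f"MATCH (a {{name: '{src}'}}), (b {{name: '{dst}'}}) "
            f"MERGE (a)-[:{rel}]->(b)"
        )
    return stmts


# A question such as "what calls helper?" might translate to a query like this.
WHO_CALLS_HELPER = (
    "MATCH (f:Function)-[:CALLS]->(:Function {name: 'helper'}) RETURN f.name"
)

if __name__ == "__main__":
    nodes, edges = extract_graph(SOURCE)
    for stmt in to_cypher(nodes, edges):
        print(stmt)
```

In this sketch the graph is kept as plain tuples so the parsing step and the Cypher rendering step stay independent; a Tree-sitter based parser could feed the same `to_cypher` function.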

Quick Start & Requirements

  • Install: Run git clone https://github.com/vitali87/code-graph-rag.git && cd code-graph-rag, then one of: uv sync (Python support only), uv sync --extra treesitter-full (full multi-language support), or make dev (development setup).
  • Prerequisites: Python 3.12+, Docker and Docker Compose (for Memgraph), and cmake, plus API keys for cloud models (Gemini or OpenAI) or Ollama for local models.
  • Setup: Configure a .env file with API keys or local model endpoints, then start Memgraph with docker-compose up -d.
  • Docs: README
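
As a rough illustration of the setup step, the sketch below parses .env-style text and checks that at least one model backend is configured. The variable names (`GEMINI_API_KEY`, `OPENAI_API_KEY`, `LOCAL_MODEL_ENDPOINT`) are hypothetical placeholders, not names confirmed by the project; consult the README for the actual keys.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def has_model_config(
    env: dict[str, str],
    cloud_keys: tuple[str, ...] = ("GEMINI_API_KEY", "OPENAI_API_KEY"),  # hypothetical names
    local_keys: tuple[str, ...] = ("LOCAL_MODEL_ENDPOINT",),  # hypothetical name
) -> bool:
    """True if any cloud key or local endpoint has a non-empty value."""
    return any(env.get(k) for k in cloud_keys + local_keys)


if __name__ == "__main__":
    sample = '# example only\nGEMINI_API_KEY="abc"\n'
    print(has_model_config(parse_env(sample)))
```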

Highlighted Details

  • Multi-Language Support: Fully supports Python, JavaScript, and TypeScript; support for C++, Rust, Go, Scala, and Java is under active development.
  • AI-Powered Editing: Features surgical code replacement with AST-based targeting, visual diff previews, and interactive approval workflows.
  • Reference-Guided Optimization: Can use custom documentation (coding standards, architectural guidelines) to guide AI optimization suggestions.
  • Extensible: Support for new languages can be added via Tree-sitter grammars.
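
The AST-targeted editing and diff preview mentioned above can be sketched as follows. This is an assumption-laden illustration rather than the project's code: it uses Python's standard `ast` and `difflib` modules instead of Tree-sitter, handles only top-level, undecorated functions, and the function names `replace_function` and `preview` are hypothetical.

```python
import ast
import difflib


def replace_function(source: str, name: str, new_code: str) -> str:
    """Replace the top-level function `name` in `source` with `new_code`.

    The AST supplies the exact line span, so nothing outside the target
    function is touched. Decorators are not included in the span.
    """
    tree = ast.parse(source)
    for node in tree.body:
        if isinstance(node, ast.FunctionDef) and node.name == name:
            lines = source.splitlines(keepends=True)
            start, end = node.lineno - 1, node.end_lineno
            new_lines = new_code.splitlines(keepends=True)
            if new_lines and not new_lines[-1].endswith("\n"):
                new_lines[-1] += "\n"
            return "".join(lines[:start] + new_lines + lines[end:])
    raise ValueError(f"function {name!r} not found")


def preview(old: str, new: str, path: str = "example.py") -> str:
    """Return a unified diff a user could review before approving the edit."""
    return "".join(
        difflib.unified_diff(
            old.splitlines(keepends=True),
            new.splitlines(keepends=True),
            fromfile=f"a/{path}",
            tofile=f"b/{path}",
        )
    )


if __name__ == "__main__":
    original = "def a():\n    return 1\n\ndef b():\n    return 2\n"
    updated = replace_function(original, "a", "def a():\n    return 10\n")
    print(preview(original, updated))
    # An interactive workflow would write `updated` to disk only after approval.
```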

Maintenance & Community

The project appears to be actively developed by a single primary author (vitali87). There are no explicit links to community channels like Discord or Slack in the README.

Licensing & Compatibility

The repository does not explicitly state a license in the README. This is a critical omission for evaluating commercial use or closed-source integration.

Limitations & Caveats

The project's licensing is not specified, which is a significant blocker for many use cases. Parsing for C++, Rust, Go, Scala, and Java is still in development and not yet fully supported, so graph representations for these languages may be incomplete.

Health Check

  • Last Commit: 2 weeks ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 9
  • Issues (30d): 18
  • Star History: 149 stars in the last 30 days
