Codebase RAG system for multi-language analysis
Top 39.6% on SourcePulse
This project provides a multi-language AI assistant for analyzing and interacting with codebases, targeting developers and researchers working with monorepos or complex code structures. It leverages a knowledge graph approach to enable natural language querying, code editing, and AI-driven optimization, aiming to improve developer productivity and code quality.
How It Works
The system utilizes Tree-sitter for robust, language-agnostic parsing of codebases, building a comprehensive knowledge graph stored in Memgraph. This graph captures code structure, relationships, and dependencies. A Retrieval-Augmented Generation (RAG) system, powered by various LLMs (Gemini, OpenAI, Ollama), allows users to query this graph using natural language. The system can translate queries into Cypher for graph traversal, retrieve code snippets, and even perform surgical code modifications based on AST analysis.
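The parse-then-graph idea described above can be illustrated with a minimal sketch. It is not the project's implementation: it swaps Tree-sitter for Python's built-in ast module and Memgraph for in-memory sets so that it runs without dependencies. The node labels (Module, Class, Function), the relation names (DEFINES, CALLS), and the Cypher text in the comment are assumptions chosen for illustration, not the project's actual schema.

```python
import ast
from dataclasses import dataclass, field

# Toy input standing in for a parsed source file.
SOURCE = '''
def load(path):
    return open(path).read()

def parse(path):
    text = load(path)
    return text.split()

class Indexer:
    def run(self, path):
        return parse(path)
'''


@dataclass
class CodeGraph:
    nodes: set = field(default_factory=set)  # (label, qualified_name)
    edges: set = field(default_factory=set)  # (source, relation, target)


def build_graph(source: str, module: str = "example") -> CodeGraph:
    """Walk the syntax tree and record definitions and direct calls."""
    graph = CodeGraph()
    graph.nodes.add(("Module", module))

    def visit(node: ast.AST, owner: str) -> None:
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.FunctionDef, ast.ClassDef)):
                label = "Class" if isinstance(child, ast.ClassDef) else "Function"
                name = f"{owner}.{child.name}"
                graph.nodes.add((label, name))
                graph.edges.add((owner, "DEFINES", name))
                visit(child, name)
            else:
                # Only plain-name calls are recorded; the target stays an
                # unresolved short name (no import or scope resolution).
                if isinstance(child, ast.Call) and isinstance(child.func, ast.Name):
                    graph.edges.add((owner, "CALLS", child.func.id))
                visit(child, owner)

    visit(ast.parse(source), module)
    return graph


def callers_of(graph: CodeGraph, function_name: str) -> list:
    """In-memory stand-in for a graph query such as the hypothetical Cypher:
    MATCH (c:Function)-[:CALLS]->(f:Function {name: $name}) RETURN c.name
    """
    return sorted(
        src for src, rel, dst in graph.edges
        if rel == "CALLS" and dst == function_name
    )
```

Calling build_graph(SOURCE) and then callers_of(graph, "parse") answers the kind of question ("what calls parse?") that a natural-language query would be translated into. A real system would also need to resolve call targets to fully qualified names, which this sketch deliberately leaves out.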
Quick Start & Requirements
Clone the repository with git clone https://github.com/vitali87/code-graph-rag.git && cd code-graph-rag, followed by uv sync (for Python), uv sync --extra treesitter-full (for full multi-language support), or make dev (for development setup).
Configuration requires a .env file with API keys or local model endpoints.
Memgraph is run via docker-compose up -d.
Highlighted Details
Maintenance & Community
The project appears to be actively developed by a single primary author (vitali87). The README has no explicit links to community channels such as Discord or Slack.
Licensing & Compatibility
The repository does not explicitly state a license in the README. This is a critical omission for evaluating commercial use or closed-source integration.
Limitations & Caveats
The unspecified license is a significant blocker for many use cases. Parsing for C++, Rust, Go, Scala, and Java is still in development and not yet fully supported, so the graph built for code in these languages may be incomplete.