code_qa by sankalp1999

Explore codebases with natural language RAG

Created 1 year ago

280 stars

Top 93.1% on SourcePulse

Project Summary

sankalp1999/code_qa is a RAG-powered system for querying codebases in natural language. It targets developers and researchers who need to understand complex code, returning contextual answers through an interactive chat. It uses Tree-sitter for AST parsing and LanceDB for vector storage.

How It Works

The system parses a codebase into abstract syntax trees (ASTs) with Tree-sitter, splits the code into chunks, and embeds each chunk with OpenAI or Jina embeddings stored in LanceDB. A natural language query retrieves the most relevant chunks by vector search, and an LLM such as GPT-4o generates an answer from them. An optional ColBERT-based reranker reorders the retrieved chunks to improve relevance before the answer is generated.
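The parse, chunk, embed, and search steps described above can be sketched with the standard library alone. This is a minimal illustration and not the project's code: Python's built-in ast module stands in for Tree-sitter, a bag-of-words counter stands in for OpenAI or Jina embeddings, and an in-memory list stands in for LanceDB. The names chunk_functions, toy_embed, and search are hypothetical.

```python
import ast
import math
import re
from collections import Counter

SOURCE = '''
def load_config(path):
    """Read settings from a file on disk."""
    with open(path) as handle:
        return handle.read()

def cosine_distance(a, b):
    """Compare two vectors."""
    return 1.0 - sum(x * y for x, y in zip(a, b))
'''


def chunk_functions(source):
    """Split a module into one chunk per top-level function or class.

    Assumption: the stdlib ast module replaces Tree-sitter here, so this
    only handles Python source.
    """
    tree = ast.parse(source)
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            code = ast.get_source_segment(source, node)
            chunks.append({"name": node.name, "code": code})
    return chunks


def toy_embed(text):
    """Bag-of-words counts; a placeholder for a real embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, chunks, k=1):
    """Return the k chunks most similar to the query (linear scan)."""
    query_vec = toy_embed(query)
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine(query_vec, toy_embed(chunk["code"])),
        reverse=True,
    )
    return ranked[:k]


chunks = chunk_functions(SOURCE)
best = search("read settings from a file", chunks)[0]
print(best["name"])
```

In a real deployment the retrieved chunk text would be passed to the LLM as context. Chunking along AST boundaries keeps each function intact, which is the main reason to parse rather than split on a fixed character count.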

Quick Start & Requirements

Install: Clone the repository, create a Python 3.6+ virtual environment, run pip install -r requirements.txt, and start redis-server.
Prerequisites: Python 3.6+, Redis server on localhost:6379.
Configuration: Create .env with OPENAI_API_KEY (required) and optional JINA_API_KEY.
Usage: Index code with ./index_codebase.sh <path>, run server with python app.py <folder_path>, access UI at http://localhost:5001.
Docs/Demo: Blog posts detailing the build process are linked in the README.
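Before starting the server, it can help to confirm the two hard prerequisites listed above. The following is a hedged sketch, not part of the repository: check_prerequisites is a hypothetical helper that looks for OPENAI_API_KEY in the process environment and tests whether anything accepts TCP connections on localhost:6379. It does not read the .env file itself and does not verify that the listener is actually Redis.

```python
import os
import socket


def check_prerequisites(env=None, host="localhost", port=6379, timeout=1.0):
    """Return a list of problems; an empty list means both checks passed.

    Hypothetical helper: env defaults to os.environ, and host/port default
    to the Redis address named in the requirements.
    """
    env = os.environ if env is None else env
    problems = []

    # The OpenAI key is required; JINA_API_KEY is optional, so it is skipped.
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")

    # A successful TCP connect only shows that something is listening.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            pass
    except OSError:
        problems.append(f"Redis is not reachable at {host}:{port}")

    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("MISSING:", problem)
```

Running the script with no output suggests the environment is ready for the indexing and server commands above, assuming the values from .env have been exported into the shell first.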

Highlighted Details

An optimized branch (feature/optimization) is reported to run about 2.5x faster (10-20 s worst case) by lowering HyDE token limits and using SambaNova-hosted Llama 3.1 models for context processing.
Supports Python, Rust, JavaScript, and Java codebases.
Uses Tree-sitter for language-agnostic AST parsing.
Integrates LanceDB for vector database storage and retrieval.
Uses OpenAI GPT-4o-mini/GPT-4o for chat and Answer.AI's answerai-colbert-small-v1 for reranking.
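The reranking step can be illustrated with ColBERT-style late interaction, in which every query token is matched to its most similar document token and the best matches are summed (MaxSim). The sketch below is a toy under stated assumptions: character-bigram counts replace the learned token embeddings of a real ColBERT model, tokens are split on whitespace, and token_vector, maxsim_score, and rerank are hypothetical names rather than the project's or the model's API.

```python
import math
from collections import Counter


def token_vector(token):
    """Toy token embedding: character-bigram counts (not ColBERT's encoder)."""
    padded = f"^{token.lower()}$"
    return Counter(padded[i:i + 2] for i in range(len(padded) - 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def maxsim_score(query, document):
    """Late interaction: sum each query token's best match in the document."""
    doc_vectors = [token_vector(tok) for tok in document.split()]
    if not doc_vectors:
        return 0.0
    return sum(
        max(cosine(token_vector(tok), doc_vec) for doc_vec in doc_vectors)
        for tok in query.split()
    )


def rerank(query, candidates):
    """Reorder first-stage candidates by descending MaxSim score."""
    return sorted(candidates, key=lambda doc: maxsim_score(query, doc), reverse=True)


# Candidates as they might come back from a first-stage vector search.
candidates = [
    "def connect_redis(host, port): open a redis connection",
    "def parse_tree(source): build a syntax tree from source code",
    "def render_page(template): return html for the chat page",
]
top = rerank("parse source into syntax tree", candidates)[0]
print(top)
```

Because the score is computed per token rather than from one vector per document, a reranker of this kind is more expensive than the first-stage search and is typically applied only to the small candidate set that search returns.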

Maintenance & Community

The README does not provide specific details on maintainers, community channels, or project roadmap.

Licensing & Compatibility

Licensed under the MIT License, permitting broad use and modification.

Limitations & Caveats

The claimed 2.5x speedup applies to the feature/optimization branch; the primary branch may not match it. Performance depends on the LLM configuration in use and on API availability. Core functionality requires a local Redis instance and an OpenAI API key.
