repochat by pnkvalavala

Chatbot tool for LLM-powered GitHub repository Q&A via RAG

created 2 years ago
309 stars

Top 88.0% on sourcepulse

Project Summary

Repochat is a Python-based chatbot assistant that enables interactive conversations about GitHub repositories using Large Language Models (LLMs) and Retrieval Augmented Generation (RAG). It's designed for developers and researchers who need to quickly query and understand the contents of codebases without extensive manual code review.

How It Works

Repochat leverages a RAG architecture. Upon receiving a GitHub repository URL, it clones the repository, splits its files into manageable chunks, and generates embeddings using the sentence-transformers/all-mpnet-base-v2 model. These embeddings are stored in a local ChromaDB vector database. When a user asks a question, Repochat retrieves relevant document chunks from the vector database and passes them, along with the user's query, to a local LLM (e.g., CodeLlama) for response generation. This approach allows for contextually relevant answers based on the repository's code.
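
The minimal Python sketch below illustrates this index-then-retrieve flow under stated assumptions: it is not Repochat's actual source; the repository path, chunk size, collection name, and prompt format are placeholders, while the sentence-transformers and ChromaDB calls follow those libraries' public APIs.

    # Hypothetical sketch of the RAG flow (placeholders, not Repochat's code)
    from pathlib import Path
    import chromadb
    from sentence_transformers import SentenceTransformer

    embedder = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")
    client = chromadb.PersistentClient(path="./chroma_db")        # local vector store
    collection = client.get_or_create_collection("repo_chunks")   # assumed name

    # 1. Index: split cloned files into chunks and embed them.
    chunks, ids = [], []
    for i, path in enumerate(Path("cloned_repo").rglob("*.py")):  # Python files for brevity
        text = path.read_text(errors="ignore")
        for j in range(0, len(text), 1000):                       # naive 1000-char chunks
            chunks.append(text[j:j + 1000])
            ids.append(f"{i}-{j}")
    collection.add(documents=chunks, ids=ids,
                   embeddings=embedder.encode(chunks).tolist())

    # 2. Retrieve: embed the question and fetch the most similar chunks.
    question = "Where is the vector database initialized?"
    hits = collection.query(query_embeddings=embedder.encode([question]).tolist(),
                            n_results=4)
    context = "\n\n".join(hits["documents"][0])

    # 3. Generate: hand the retrieved context plus the question to the local LLM.
    prompt = f"Answer from this repository context:\n{context}\n\nQuestion: {question}"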

Quick Start & Requirements

  • Install: Clone the repository, create a virtual environment, and install dependencies via pip install -r requirements.txt. Install llama-cpp-python with hardware acceleration (e.g., CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python).
  • Prerequisites: Python 3.x, Git, a Hugging Face model (e.g., TheBloke/CodeLlama-7B-GGUF) placed in a models folder, and potentially GPU with CUDA/ROCm/Metal for accelerated inference.
  • Setup: Download a model and configure models.py with the model path (a load-check sketch follows this list).
  • Run: streamlit run app.py
  • Docs: llama-cpp-python
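
Once a GGUF model is in place, a short load check such as the one below can confirm that llama-cpp-python was built and installed correctly; the model filename, context size, and prompt are illustrative placeholders rather than values taken from the project's models.py.

    # Hypothetical smoke test: confirm llama-cpp-python can load the downloaded model.
    from llama_cpp import Llama

    llm = Llama(
        model_path="./models/codellama-7b.Q4_K_M.gguf",  # placeholder filename
        n_ctx=4096,          # context window
        n_gpu_layers=-1,     # offload all layers when built with GPU acceleration
    )
    out = llm("Q: What does RAG stand for?\nA:", max_tokens=32)
    print(out["choices"][0]["text"])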

Highlighted Details

  • Supports local execution without external API calls (main branch).
  • Offers hardware acceleration options for llama-cpp-python (cuBLAS, OpenBLAS, CLBlast, Metal, hipBLAS).
  • Utilizes ChromaDB for local vector storage.
  • Retains conversation memory for contextual responses (see the sketch after this list).
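
A rough sketch of how such conversation memory can work: earlier turns are stored and prepended to each new prompt. The history format and function names below are assumptions for illustration, not Repochat's exact implementation.

    # Hypothetical sketch: keep prior turns and fold them into each new prompt.
    history = []  # list of (question, answer) pairs from earlier turns

    def build_prompt(question, context):
        past = "\n".join(f"User: {q}\nAssistant: {a}" for q, a in history)
        return f"{past}\nContext:\n{context}\n\nUser: {question}\nAssistant:"

    def chat(llm, question, context):
        reply = llm(build_prompt(question, context), max_tokens=256)
        answer = reply["choices"][0]["text"]
        history.append((question, answer))
        return answer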

Maintenance & Community

The project is maintained by pnkvalavala. Further community engagement details (Discord, Slack, roadmap) are not explicitly provided in the README.

Licensing & Compatibility

Licensed under the Apache License 2.0. This license is permissive and generally compatible with commercial use and closed-source linking.

Limitations & Caveats

The README notes that the project changed from a previous license; users should review the current Apache 2.0 terms before adopting it. Response quality and performance depend on the chosen LLM and its quantization level.

Health Check

  • Last commit: 11 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 1

Star History

  • 7 stars in the last 90 days
