rag-chatbot by umbertogriffo

A RAG chatbot that answers questions using context from Markdown files

Created 2 years ago · 324 stars · Top 83.8% on SourcePulse

Project Summary

This project provides a conversational RAG chatbot that answers questions based on a collection of Markdown files. It's designed for users who want to leverage local, open-source LLMs for document-based Q&A, offering features like conversation memory and multiple response synthesis strategies.

How It Works

The chatbot processes Markdown files by splitting them into chunks, generating embeddings with the all-MiniLM-L6-v2 sentence-transformer, and storing them in a Chroma vector database. When a user asks a question, an LLM first rewrites the query for better retrieval. The most relevant document chunks are then fetched from Chroma and used as context to generate an answer with a local LLM via llama-cpp-python. The chatbot keeps conversation memory and offers three response synthesis strategies: Create and Refine, Hierarchical Summarization, and Async Hierarchical Summarization.
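
A minimal sketch of this ingest-retrieve-generate flow, assuming the chromadb, sentence-transformers, and llama-cpp-python packages; the model path, fixed-width chunking, and prompt wording below are illustrative stand-ins, not the project's actual code:

    # Illustrative RAG pipeline sketch: ingest Markdown, retrieve, generate.
    from pathlib import Path

    import chromadb
    from llama_cpp import Llama
    from sentence_transformers import SentenceTransformer

    embedder = SentenceTransformer("all-MiniLM-L6-v2")
    llm = Llama(model_path="models/model.gguf", n_ctx=4096)  # hypothetical path
    collection = chromadb.Client().create_collection("docs")

    # 1. Ingest: split each Markdown file into chunks and index their embeddings.
    for doc_id, md in enumerate(Path("docs").glob("*.md")):
        text = md.read_text()
        chunks = [text[i:i + 1000] for i in range(0, len(text), 1000)]  # naive split
        collection.add(
            ids=[f"{doc_id}-{i}" for i in range(len(chunks))],
            documents=chunks,
            embeddings=embedder.encode(chunks).tolist(),
        )

    def answer(question: str, k: int = 2) -> str:
        # 2. Rewrite the query with the LLM to improve retrieval.
        rewritten = llm(f"Rewrite this question for document search: {question}\n",
                        max_tokens=64)["choices"][0]["text"].strip()
        # 3. Retrieve the k most similar chunks from Chroma.
        hits = collection.query(
            query_embeddings=embedder.encode([rewritten]).tolist(), n_results=k
        )
        context = "\n\n".join(hits["documents"][0])
        # 4. Generate the final answer grounded in the retrieved context.
        prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
        return llm(prompt, max_tokens=256)["choices"][0]["text"].strip()

The project itself chunks text with its refactored RecursiveCharacterTextSplitter rather than the fixed-width split shown here.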

Quick Start & Requirements

  • Install: Use make setup_cuda (for NVIDIA) or make setup_metal (for macOS Metal).
  • Prerequisites: Python 3.10+, Poetry 1.7.0, GPU with CUDA 12.1+ (for setup_cuda).
  • Run Chatbot: streamlit run chatbot/chatbot_app.py -- --model <model_name>
  • Run RAG Chatbot: streamlit run chatbot/rag_chatbot_app.py -- --model <model_name> --k <num_chunks> --synthesis-strategy <strategy>
  • Docs: llama-cpp-python GitHub Issues

Highlighted Details

  • Leverages llama-cpp-python for efficient local LLM execution with quantization (4-bit precision).
  • Supports various open-source LLMs including Llama 3.1, OpenChat, Starling, Phi-3.5, and StableLM.
  • Implements conversation-aware memory and three context synthesis strategies for handling long contexts; a sketch of Create and Refine follows this list.
  • Includes a refactored version of LangChain's RecursiveCharacterTextSplitter to avoid adding LangChain as a dependency.
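
As the name suggests, Create and Refine drafts an answer from the first retrieved chunk and then refines it with each remaining chunk in turn. A hedged sketch of that idea; the prompt wording and the llm call convention (matching the llama-cpp-python sketch above) are assumptions, not the project's code:

    # Illustrative "Create and Refine" synthesis over retrieved chunks.
    def create_and_refine(llm, question: str, chunks: list[str]) -> str:
        # Draft an initial answer from the first chunk.
        answer = llm(f"Context:\n{chunks[0]}\n\nQuestion: {question}\nAnswer:",
                     max_tokens=256)["choices"][0]["text"].strip()
        # Refine the draft with each remaining chunk.
        for chunk in chunks[1:]:
            prompt = (f"Existing answer:\n{answer}\n\n"
                      f"New context:\n{chunk}\n\n"
                      f"Refine the answer to: {question}\nAnswer:")
            answer = llm(prompt, max_tokens=256)["choices"][0]["text"].strip()
        return answer

Hierarchical Summarization, by contrast, produces per-chunk answers and then combines them, and the async variant presumably issues those per-chunk calls concurrently.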

Maintenance & Community

  • The README mentions no contributors, sponsorships, or community links (Discord/Slack).

Licensing & Compatibility

  • The project does not explicitly state a license in the README.

Limitations & Caveats

  • The README warns that LLMs may generate hallucinations or false information.
  • GPU acceleration on M1 Macs requires using an ARM version of Python; x86 Python will not use the GPU.
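
A quick check for the ARM-Python caveat above, using the standard-library platform.machine() call, which reports the interpreter's architecture:

    # On Apple Silicon, an ARM build of Python reports "arm64";
    # an x86 build (e.g. running under Rosetta) reports "x86_64" and
    # will not use the Metal GPU.
    import platform
    print(platform.machine())  # expect "arm64" for GPU acceleration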

Health Check

  • Last Commit: 1 month ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 17 stars in the last 30 days
