RAG chatbot answers questions using context from Markdown files
Top 83.8% on SourcePulse
This project provides a conversational RAG chatbot that answers questions based on a collection of Markdown files. It's designed for users who want to leverage local, open-source LLMs for document-based Q&A, offering features like conversation memory and multiple response synthesis strategies.
How It Works
The chatbot processes Markdown files by splitting them into chunks, generating embeddings with all-MiniLM-L6-v2, and storing them in a Chroma vector database. When a user asks a question, an LLM first rewrites the query for better retrieval. Relevant document chunks are then fetched from Chroma and used as context to generate an answer with a local LLM via llama-cpp-python. It supports conversation memory and offers three response synthesis strategies: Create and Refine, Hierarchical Summarization, and Async Hierarchical Summarization.
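For orientation, here is a minimal sketch of that flow wired up directly with sentence-transformers, chromadb, and llama-cpp-python. The model path, example chunks, prompt wording, and parameter values are illustrative assumptions, not the project's actual code.

```python
# Illustrative end-to-end sketch of the described pipeline (not the project's code).
# Assumes a GGUF model at models/llama.gguf (hypothetical path) and pre-split chunks.
from sentence_transformers import SentenceTransformer
import chromadb
from llama_cpp import Llama

embedder = SentenceTransformer("all-MiniLM-L6-v2")
llm = Llama(model_path="models/llama.gguf", n_ctx=4096)  # hypothetical model file

# 1. Index: embed Markdown chunks and store them in Chroma.
chunks = ["# Install\nRun make setup_cuda ...", "# Usage\nStart the Streamlit app ..."]
client = chromadb.PersistentClient(path="vector_store")
collection = client.get_or_create_collection("docs")
collection.add(
    ids=[str(i) for i in range(len(chunks))],
    documents=chunks,
    embeddings=embedder.encode(chunks).tolist(),
)

def answer(question: str, k: int = 2) -> str:
    # 2. Rewrite the question with the LLM to improve retrieval.
    rewrite = llm(
        f"Rewrite this question for document search: {question}\nRewritten:",
        max_tokens=64, stop=["\n"],
    )["choices"][0]["text"].strip()

    # 3. Retrieve the k most similar chunks from Chroma.
    hits = collection.query(
        query_embeddings=[embedder.encode(rewrite).tolist()], n_results=k
    )
    context = "\n\n".join(hits["documents"][0])

    # 4. Generate the final answer from the retrieved context.
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    return llm(prompt, max_tokens=256)["choices"][0]["text"].strip()

print(answer("How do I set up the project on an NVIDIA GPU?"))
```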
Quick Start & Requirements
- Setup: make setup_cuda (for NVIDIA) or make setup_metal (for macOS Metal).
- Run the simple chatbot: streamlit run chatbot/chatbot_app.py -- --model <model_name>
- Run the RAG chatbot: streamlit run chatbot/rag_chatbot_app.py -- --model <model_name> --k <num_chunks> --synthesis-strategy <strategy> (the strategies are sketched below)
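To make the --synthesis-strategy option more concrete, here is a hedged sketch of a "Create and Refine" style loop: an initial answer is drafted from the first retrieved chunk and then refined against each remaining chunk. The generate callable and prompt wording are placeholders, not the project's implementation.

```python
# Sketch of a "Create and Refine" synthesis loop (illustrative; prompts are placeholders).
from typing import Callable, Sequence

def create_and_refine(question: str,
                      chunks: Sequence[str],
                      generate: Callable[[str], str]) -> str:
    """Draft an answer from the first chunk, then refine it chunk by chunk."""
    answer = generate(
        f"Context:\n{chunks[0]}\n\nAnswer the question: {question}"
    )
    for chunk in chunks[1:]:
        answer = generate(
            "Refine the existing answer using the new context. "
            "Keep it unchanged if the context is not relevant.\n"
            f"Existing answer: {answer}\n\nNew context:\n{chunk}\n\n"
            f"Question: {question}"
        )
    return answer
```

Roughly speaking, the hierarchical strategies instead answer or summarize chunks independently and then merge the partial results, which is what the async variant can run concurrently.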
Highlighted Details
- Uses llama-cpp-python for efficient local LLM execution with quantization (4-bit precision).
- Re-implements the RecursiveCharacterTextSplitter from LangChain to avoid adding it as a dependency.
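To illustrate what that splitter does, below is a simplified sketch of recursive character splitting; the real RecursiveCharacterTextSplitter additionally handles chunk overlap, custom length functions, and separator retention, so treat this as a conceptual outline only.

```python
# Simplified sketch of recursive character splitting (illustrative only).

def recursive_split(text: str, chunk_size: int = 512,
                    separators=("\n\n", "\n", " ", "")) -> list[str]:
    """Split text into chunks of at most chunk_size characters,
    preferring to break on the coarsest separator that still occurs."""
    if len(text) <= chunk_size:
        return [text]

    # Pick the first separator that actually appears in the text.
    sep = separators[-1]
    for candidate in separators:
        if candidate == "" or candidate in text:
            sep = candidate
            break

    pieces = list(text) if sep == "" else text.split(sep)

    chunks, current = [], ""
    for piece in pieces:
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            if current:
                chunks.append(current)
            if len(piece) > chunk_size:
                # Piece is still too large: recurse with finer separators.
                remaining = separators[separators.index(sep) + 1:] or ("",)
                chunks.extend(recursive_split(piece, chunk_size, remaining))
                current = ""
            else:
                current = piece
    if current:
        chunks.append(current)
    return chunks
```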
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats