RAG for personal knowledge base Q&A
Top 85.2% on sourcepulse
This project provides a personal knowledge base assistant that leverages large language models (LLMs) to answer questions based on provided documentation. It's designed for users who need to efficiently query and retrieve information from extensive, complex datasets, offering a streamlined approach to knowledge management and access.
How It Works
The core of the project is a Retrieval-Augmented Generation (RAG) pipeline built with Langchain. It ingests various document formats (PDF, Markdown, TXT), splits them into manageable chunks, and generates vector embeddings using models like m3e or OpenAI. These embeddings are stored in a Chroma vector database for efficient similarity search. When a user asks a question, the system vectorizes the query, retrieves the most relevant document chunks from the database, and feeds them as context to an LLM (supporting OpenAI, Ernie Bot, Spark, and ChatGLM) to generate a concise answer.
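Below is a minimal sketch of such a pipeline using classic Langchain APIs. The file path, chunk sizes, and the use of OpenAI embeddings and chat model are illustrative assumptions, not the project's exact configuration (the project also supports m3e embeddings and several non-OpenAI LLMs).

```python
# Hedged sketch of a RAG pipeline: load a document, chunk it, embed the chunks
# into Chroma, then answer a question with retrieved context. Paths and model
# choices here are placeholders, not the project's actual settings.
from langchain.document_loaders import PyMuPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA

# 1. Load the source document and split it into manageable chunks.
docs = PyMuPDFLoader("knowledge_db/example.pdf").load()   # hypothetical file
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(docs)

# 2. Embed the chunks and persist them in a Chroma vector store.
vectordb = Chroma.from_documents(
    documents=chunks,
    embedding=OpenAIEmbeddings(),        # the project can also use m3e embeddings
    persist_directory="vector_db",
)

# 3. Retrieve the most relevant chunks for a query and let the LLM answer.
qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(temperature=0),
    retriever=vectordb.as_retriever(search_kwargs={"k": 3}),
)
print(qa.run("What does the documentation say about installation?"))
```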
Quick Start & Requirements
- Set up the environment: create a conda environment (`conda create -n llm-universe python==3.9.0`), activate it (`conda activate llm-universe`), and install dependencies (`pip install -r requirements.txt`).
- Run the API service: `cd project/serve`, then `uvicorn api:app --reload` (Linux) or `python api.py` (Windows).
- Launch the Gradio demo: `cd llm-universe/project/serve`, then `python run_gradio.py -model_name='chatglm_std' -embedding_model='m3e' -db_path='../../data_base/knowledge_db' -persist_path='../../data_base/vector_db'` (see the sketch after this list for what the persisted store holds).
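As a rough illustration of what the `-persist_path` directory contains, the persisted Chroma store can be reopened and queried directly. This is a hedged sketch, not the project's serving code; it assumes OpenAI embeddings for brevity, whereas the commands above use m3e.

```python
# Hedged sketch: reopen the persisted vector store and run a similarity search,
# mirroring the retrieval step the API and Gradio services perform.
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma

vectordb = Chroma(
    persist_directory="../../data_base/vector_db",   # same path as the command above
    embedding_function=OpenAIEmbeddings(),           # assumption; project also supports m3e
)
for doc in vectordb.similarity_search("How do I start the API service?", k=3):
    print(doc.page_content[:200])
```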
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats