RAG-QA-Generator by wangxb96

Automated RAG knowledge base generation and management tool

Created 1 year ago

261 stars

Top 97.3% on SourcePulse

Project Summary

RAG-QA-Generator automates the construction and management of knowledge bases for Retrieval-Augmented Generation (RAG) systems. Instead of having people write knowledge base entries by hand, it uses large language models to generate question-answer (QA) pairs from documents in several formats. The aim is to reduce manual effort in RAG development, make knowledge base construction more accessible, and improve the quality of the resulting knowledge base.

How It Works

The project loads documents (txt, pdf, docx) with langchain_community loaders and splits them into smaller text chunks. For each chunk it prompts a large language model through the OpenAI API to generate contextually relevant QA pairs. The example model is qwen2.5-72b, which is not an OpenAI model, so the configured base URL presumably points at an OpenAI-compatible endpoint. The generated pairs are stored and managed in a backend database through a RESTful API. A Streamlit-based web interface handles file uploads, QA generation previews, and knowledge base operations.
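As a rough illustration of the flow described above, the sketch below splits text into overlapping chunks, builds a prompt per chunk, and parses the model's reply into QA pairs. It is not the project's code: the function names, the prompt wording, and the JSON reply format are all assumptions, the fixed-size splitter is a stand-in for whatever splitter the project uses, and the model call is passed in as a plain callable (`ask_model`) so the example runs without network access.

```python
import json
from typing import Callable


def split_text(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Fixed-size character chunks with overlap (a stand-in for a real splitter)."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


def build_prompt(chunk: str, n_pairs: int = 2) -> str:
    """Hypothetical prompt; the project's actual prompts are not shown in the source."""
    return (
        f"Read the text below and write {n_pairs} question-answer pairs about it.\n"
        'Reply with only a JSON list of objects with "question" and "answer" keys.\n\n'
        f"Text:\n{chunk}"
    )


def parse_qa_pairs(raw: str) -> list[dict[str, str]]:
    """Keep only well-formed pairs; return [] if the reply is not a JSON list."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return []
    if not isinstance(data, list):
        return []
    pairs = []
    for item in data:
        if not isinstance(item, dict):
            continue
        question, answer = item.get("question"), item.get("answer")
        if isinstance(question, str) and isinstance(answer, str):
            if question.strip() and answer.strip():
                pairs.append({"question": question.strip(), "answer": answer.strip()})
    return pairs


def generate_qa(
    text: str,
    ask_model: Callable[[str], str],
    chunk_size: int = 200,
    overlap: int = 20,
) -> list[dict[str, str]]:
    """Run every chunk through the model and collect the parsed pairs."""
    pairs = []
    for chunk in split_text(text, chunk_size, overlap):
        pairs.extend(parse_qa_pairs(ask_model(build_prompt(chunk))))
    return pairs


def fake_model(prompt: str) -> str:
    """Offline stand-in for the LLM call; always returns one placeholder pair."""
    return json.dumps([{"question": "What is this chunk about?", "answer": "Placeholder."}])
```

In a real run, `ask_model` would wrap a chat-completion request to the configured endpoint. Validating the reply before storing it matters because a model can return malformed JSON or empty fields.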

Quick Start & Requirements

Primary Install/Run:
1. Clone the repository: git clone https://github.com/wangxb96/RAG-QA-Generator.git
2. Navigate to the directory: cd RAG-QA-Generator
3. Install dependencies: pip install -r requirements.txt
4. Configure API keys and base URLs for OpenAI and the backend service.
5. Run the Streamlit application: streamlit run AutoQAG.py
Prerequisites: Python 3.11.5, plus API keys for OpenAI and a backend service (e.g., TaskingAI).
Key dependencies: streamlit==1.22.0, requests==2.31.0, openai==0.28.0, langchain==0.10.0, PyMuPDF==1.22.5, pandas==2.1.1, langchain_community==0.1.0.
Links: GitHub Repository (https://github.com/wangxb96/RAG-QA-Generator)
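Step 4 does not say where the keys and base URLs are set. One common approach, sketched here as an assumption rather than as the project's mechanism, is to read them from environment variables and fail early if any are missing. The variable names and the `Settings` class are hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    llm_api_key: str
    llm_base_url: str
    backend_api_key: str
    backend_base_url: str


# Hypothetical environment variable names; the project may use different ones
# or set these values directly in the script.
_ENV_NAMES = {
    "llm_api_key": "LLM_API_KEY",
    "llm_base_url": "LLM_BASE_URL",
    "backend_api_key": "BACKEND_API_KEY",
    "backend_base_url": "BACKEND_BASE_URL",
}


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read all four values, raising one error that names every missing variable."""
    env = os.environ if env is None else env
    missing = [name for name in _ENV_NAMES.values() if not env.get(name, "").strip()]
    if missing:
        raise RuntimeError("Missing configuration: " + ", ".join(sorted(missing)))
    values = {field: env[name].strip() for field, name in _ENV_NAMES.items()}
    # Drop trailing slashes so later URL joins do not produce "//".
    values["llm_base_url"] = values["llm_base_url"].rstrip("/")
    values["backend_base_url"] = values["backend_base_url"].rstrip("/")
    return Settings(**values)
```

Calling `load_settings()` once at startup, before `streamlit run AutoQAG.py` reaches any API call, surfaces a missing key as a clear error instead of a failed request later on.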

Highlighted Details

AI-driven generation of diverse question-answer pairs from unstructured documents.
Support for multiple document formats (txt, pdf, docx) via langchain_community loaders.
Intuitive web interface built with Streamlit for user interaction.
Flexible knowledge base management, including collection creation, data insertion, and JSON export/import.
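The JSON export/import mentioned above could look roughly like the following round trip. This is a minimal sketch under assumptions: the file layout (a JSON list of objects with `question` and `answer` keys) and the function names are hypothetical, since the source does not describe the project's actual export format.

```python
import json
from pathlib import Path


def export_pairs(pairs: list[dict[str, str]], path) -> None:
    """Write QA pairs as a UTF-8 JSON list, keeping non-ASCII text readable."""
    Path(path).write_text(
        json.dumps(pairs, ensure_ascii=False, indent=2), encoding="utf-8"
    )


def import_pairs(path) -> list[dict[str, str]]:
    """Load QA pairs and reject files that do not match the assumed layout."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(data, list):
        raise ValueError("expected a JSON list of QA pairs")
    for index, item in enumerate(data):
        if not isinstance(item, dict):
            raise ValueError(f"entry {index} is not a JSON object")
        for key in ("question", "answer"):
            if not isinstance(item.get(key), str):
                raise ValueError(f"entry {index} is missing a string '{key}' field")
    return data
```

Validating on import keeps a hand-edited or truncated file from silently inserting malformed entries into a collection.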

Maintenance & Community

The project acknowledges support from academic institutions in China, including Beijing Normal University's AI and Future Network Center, the Smart Interdisciplinary Supercomputing Center, and the Ministry of Education Engineering Research Center for Big Data Cloud-Edge Intelligent Collaboration. The README lists no community channels (e.g., Discord, Slack) and gives no information about active contributors.

Licensing & Compatibility

The README does not specify a software license. Users should verify licensing terms before adoption, especially concerning commercial use or integration with closed-source systems.

Limitations & Caveats

The project requires external API keys for both the language model (e.g., OpenAI) and the backend knowledge base service. Processing large documents or generating many QA pairs can be time-consuming. Because the system depends on specific API endpoints and models, it may need adapting if those services change.
