Medical Q&A system using RAG and large language models
This project presents a medical question-answering system leveraging Retrieval-Augmented Generation (RAG) and large language models (LLMs). It targets medical professionals and researchers seeking to improve the reliability of LLMs in healthcare by integrating a knowledge graph (KG) with BERT for Named Entity Recognition (NER) and a 34B LLM for intent recognition. The system aims to provide accurate, KG-enhanced responses, addressing the limitations of standard RAG implementations.
How It Works
The system takes a knowledge graph-based approach to RAG: instead of the vector database used in typical implementations, it retrieves from a medical KG stored in Neo4j. The KG is derived from the DiseaseKG dataset and enhanced with LLM-optimized entity information for greater accuracy. NER is performed by a RoBERTa model trained with data augmentation (entity replacement, masking, and concatenation), which raised its F1 score to 97.40% on a custom dataset. Intent recognition is handled through prompt engineering that combines in-context learning and chain-of-thought, which reduces manual annotation costs while maintaining high accuracy.
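The three augmentation strategies named above can be illustrated with a minimal sketch. This is not the project's code: it assumes BIO-tagged token sequences, a hypothetical `lexicon` of replacement entities per type, and one plausible reading of "masking" (hiding entity tokens while keeping their tags so the model must rely on context). All function names here are illustrative.

```python
import random
from typing import Dict, List, Tuple

Sample = Tuple[List[str], List[str]]  # (tokens, BIO tags), equal length
Lexicon = Dict[str, List[List[str]]]  # entity type -> non-empty candidate token lists


def find_spans(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Return (start, end, type) for each BIO entity span; end is exclusive."""
    spans = []
    i = 0
    while i < len(tags):
        if tags[i].startswith("B-"):
            etype = tags[i][2:]
            j = i + 1
            while j < len(tags) and tags[j] == f"I-{etype}":
                j += 1
            spans.append((i, j, etype))
            i = j
        else:
            i += 1
    return spans


def replace_entities(sample: Sample, lexicon: Lexicon, rng: random.Random) -> Sample:
    """Entity replacement: swap each entity for another of the same type."""
    tokens, tags = sample
    out_tokens: List[str] = []
    out_tags: List[str] = []
    cursor = 0
    for start, end, etype in find_spans(tags):
        out_tokens += tokens[cursor:start]
        out_tags += tags[cursor:start]
        candidates = lexicon.get(etype)
        # Keep the original entity when the lexicon has no entry for its type.
        new = list(rng.choice(candidates)) if candidates else tokens[start:end]
        out_tokens += new
        out_tags += [f"B-{etype}"] + [f"I-{etype}"] * (len(new) - 1)
        cursor = end
    out_tokens += tokens[cursor:]
    out_tags += tags[cursor:]
    return out_tokens, out_tags


def mask_entities(sample: Sample, rng: random.Random, p: float = 0.5,
                  mask_token: str = "[MASK]") -> Sample:
    """Masking (assumed variant): hide entity tokens with probability p, keep tags."""
    tokens, tags = sample
    masked = [mask_token if tag != "O" and rng.random() < p else tok
              for tok, tag in zip(tokens, tags)]
    return masked, list(tags)


def concatenate(first: Sample, second: Sample) -> Sample:
    """Concatenation: join two labelled samples into one longer sample."""
    return first[0] + second[0], first[1] + second[1]
```

The `mask_token` default is an assumption; the right value depends on the tokenizer of the model being fine-tuned. Passing an explicit `random.Random` instance keeps augmented datasets reproducible across runs.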
Quick Start & Requirements
git clone https://github.com/honeyandme/RAGQnASystem.git
cd RAGQnASystem
conda create -n RAGQnASystem python=3.10
conda activate RAGQnASystem
pip install -r requirements.txt
python build_up_graph.py --website <YourWebSite> --user <YourUserName> --password <YourPassWord> --dbname <YourDBName>
streamlit run login.py
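Once the graph has been built, retrieval amounts to turning a recognized entity and intent into a graph query. The sketch below shows one way that step could look. It is an illustration only: the node labels, relationship types, and intent names are hypothetical and do not reflect the schema that `build_up_graph.py` actually creates.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping from intent to (relationship type, target node label).
INTENT_TO_PATTERN: Dict[str, Tuple[str, str]] = {
    "symptoms": ("HAS_SYMPTOM", "Symptom"),
    "drugs": ("TREATED_BY", "Drug"),
    "checks": ("NEEDS_CHECK", "Check"),
}


def build_query(intent: str, disease: str) -> Tuple[str, Dict[str, str]]:
    """Build a parameterized Cypher query for a recognized disease and intent.

    The relationship type and label come from a fixed whitelist, and the
    entity name is passed as a query parameter rather than interpolated.
    """
    rel, label = INTENT_TO_PATTERN[intent]
    query = (
        f"MATCH (d:Disease {{name: $name}})-[:{rel}]->(t:{label}) "
        "RETURN t.name AS name"
    )
    return query, {"name": disease}


def format_context(disease: str, intent: str, names: List[str]) -> str:
    """Turn query results into a short context line for the LLM prompt."""
    if not names:
        return f"No {intent} found for {disease} in the knowledge graph."
    return f"Known {intent} for {disease}: " + ", ".join(names)
```

With the official `neo4j` Python driver, the returned query and parameters could be passed to `session.run(query, params)`, and the `name` field of each record collected into the list given to `format_context`. Restricting intents to a whitelist means an unexpected LLM output raises a `KeyError` instead of producing an arbitrary query.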
Maintenance & Community
The project is actively developed, with recent updates including UI enhancements for login, registration, and multi-window chat. Future work includes NL2Cypher for direct query generation. Contact: zeromakers@outlook.com.
Licensing & Compatibility
The repository does not explicitly state a license in the README. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The project is primarily focused on a specific medical domain and KG structure. The NL2Cypher feature, aimed at broader KG utilization, is listed as future work and may require significant development. The lack of an explicit license may pose compatibility concerns for certain use cases.