RAGQnASystem by honeyandme

Medical Q&A system using RAG and large language models

Created 1 year ago

1,029 stars


Project Summary

This project is a medical question-answering system built on Retrieval-Augmented Generation (RAG) and large language models (LLMs). It targets medical professionals and researchers who want more reliable LLM output in healthcare. It combines a knowledge graph (KG), a BERT-family model (RoBERTa) for Named Entity Recognition (NER), and a 34B LLM for intent recognition. Rather than relying on the vector-database retrieval used by typical RAG implementations, it grounds responses in facts retrieved from the KG.
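To make the flow concrete, the following is a minimal, self-contained sketch of a KG-grounded question-answering step. It rests on loudly stated assumptions: a small in-memory dict stands in for the Neo4j graph, dictionary matching stands in for the RoBERTa NER model, and keyword rules stand in for LLM intent recognition. All entity, relation and intent names (`TOY_KG`, `INTENT_TO_RELATION`, `has_symptom`, and so on) are hypothetical, not the project's schema.

```python
# Hypothetical toy stand-in for the Neo4j medical KG: (entity, relation) -> objects.
# Entity names, relation names and intent labels are illustrative assumptions.
TOY_KG: dict[tuple[str, str], list[str]] = {
    ("common cold", "has_symptom"): ["fever", "cough"],
    ("common cold", "recommended_drug"): ["ibuprofen"],
}
INTENT_TO_RELATION = {"query_symptom": "has_symptom", "query_drug": "recommended_drug"}
ENTITY_VOCAB = ["common cold", "influenza"]


def recognize_entities(question: str) -> list[str]:
    # Placeholder for the RoBERTa NER model: plain dictionary matching.
    return [name for name in ENTITY_VOCAB if name in question.lower()]


def recognize_intents(question: str) -> list[str]:
    # Placeholder for LLM-based intent recognition: simple keyword rules.
    q = question.lower()
    intents = []
    if "symptom" in q:
        intents.append("query_symptom")
    if "drug" in q or "medicine" in q:
        intents.append("query_drug")
    return intents


def retrieve_facts(entities: list[str], intents: list[str]) -> list[str]:
    # One KG lookup per (entity, intent) pair; a real system would run Cypher here.
    facts = []
    for entity in entities:
        for intent in intents:
            relation = INTENT_TO_RELATION[intent]
            objects = TOY_KG.get((entity, relation), [])
            if objects:
                facts.append(f"{entity} {relation}: {', '.join(objects)}")
    return facts


def build_prompt(question: str) -> str:
    # Assemble the retrieved facts into a grounded prompt for the LLM.
    facts = retrieve_facts(recognize_entities(question), recognize_intents(question))
    knowledge = "\n".join(facts) if facts else "(no matching knowledge found)"
    return (
        "Answer the question using the knowledge below.\n"
        f"Knowledge:\n{knowledge}\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    print(build_prompt("What are the symptoms of the common cold, and which drug helps?"))
```

The returned prompt would then be sent to the selected LLM; replacing each placeholder with the real component leaves the overall shape of the pipeline unchanged.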

How It Works

The system uses knowledge-graph-based RAG instead of the more common vector-database approach: retrieval runs against a medical KG stored in Neo4j. The KG is derived from the DiseaseKG dataset, and its entity information has been refined with an LLM for greater accuracy. NER is performed by a RoBERTa model trained with data augmentation (entity replacement, masking, and concatenation), which raised its F1 score to 97.40% on a custom dataset. Intent recognition relies on prompt engineering, combining in-context learning with chain-of-thought prompting, which reduces manual annotation cost while keeping accuracy high.
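The intent-recognition step can be pictured as a prompt builder plus an output parser. The sketch below assumes a hypothetical label set (`INTENT_LABELS`), hand-written few-shot examples that each show a short reasoning chain (in-context learning plus chain-of-thought), and an answer line starting with `Intents:`. The actual LLM call is deliberately left out, and none of these names come from the project itself.

```python
import re

# Hypothetical intent label set; the project's real labels may differ.
INTENT_LABELS = ["disease symptoms", "recommended drugs", "required checks", "disease cause"]

# Hypothetical in-context examples: (question, reasoning chain, gold intents).
FEW_SHOT = [
    ("What should I take for a cold?",
     "The user asks what to take, which concerns medication.",
     ["recommended drugs"]),
    ("Why do I get headaches and what tests should I do?",
     "The user asks about a cause and about examinations.",
     ["disease cause", "required checks"]),
]


def build_intent_prompt(question: str) -> str:
    # Instruction, then worked examples with reasoning, then the new question.
    lines = [
        "Classify the medical question into one or more of these intents: "
        + ", ".join(INTENT_LABELS) + ".",
        "Think step by step, then give the final answer on a line starting with 'Intents:'.",
        "",
    ]
    for q, reasoning, labels in FEW_SHOT:
        lines += [f"Question: {q}", f"Reasoning: {reasoning}",
                  f"Intents: {', '.join(labels)}", ""]
    # End with an open "Reasoning:" slot so the model reasons before answering.
    lines += [f"Question: {question}", "Reasoning:"]
    return "\n".join(lines)


def parse_intents(llm_output: str) -> list[str]:
    # Use the last 'Intents:' line and keep only labels from the allowed set,
    # discarding anything the model invented.
    matches = re.findall(r"^Intents:[ \t]*(.*)$", llm_output, flags=re.MULTILINE)
    if not matches:
        return []
    candidates = [c.strip() for c in matches[-1].split(",")]
    return [c for c in candidates if c in INTENT_LABELS]
```

Filtering the parsed labels against a closed set is one simple way to keep free-form LLM output usable as a routing signal for KG queries.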

Quick Start & Requirements

Install: Clone the repository and set up a Conda environment:

git clone https://github.com/honeyandme/RAGQnASystem.git
cd RAGQnASystem
conda create -n RAGQnASystem python=3.10
conda activate RAGQnASystem
pip install -r requirements.txt

Neo4j Setup: Requires Neo4j Community Edition 5.18.1 and JDK 17. Build the KG using python build_up_graph.py --website <YourWebSite> --user <YourUserName> --password <YourPassWord> --dbname <YourDBName>.
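The repository's `build_up_graph.py` performs the actual import, and its internals are not described here. Purely as an illustration, the sketch below shows one way to batch-load `(head, relation, tail)` triples with the official `neo4j` Python driver (`pip install neo4j`). The `Entity` label, the triple format, and the helper names `group_triples`, `merge_query` and `load_triples` are hypothetical.

```python
from collections import defaultdict


def group_triples(triples):
    """Group hypothetical (head, relation, tail) triples by relation type."""
    groups = defaultdict(list)
    for head, relation, tail in triples:
        groups[relation].append({"head": head, "tail": tail})
    return dict(groups)


def merge_query(relation: str) -> str:
    # Relationship types cannot be query parameters in Cypher, so quote the type
    # with backticks (doubling any embedded backtick); node names stay parameters.
    quoted = "`" + relation.replace("`", "``") + "`"
    return (
        "UNWIND $rows AS row "
        "MERGE (h:Entity {name: row.head}) "
        "MERGE (t:Entity {name: row.tail}) "
        f"MERGE (h)-[:{quoted}]->(t)"
    )


def load_triples(uri, user, password, dbname, triples, batch_size=1000):
    # Imported lazily so the pure helpers above work without the driver installed.
    from neo4j import GraphDatabase

    with GraphDatabase.driver(uri, auth=(user, password)) as driver:
        with driver.session(database=dbname) as session:
            for relation, rows in group_triples(triples).items():
                query = merge_query(relation)
                # Send rows in batches; UNWIND turns each batch into one transaction.
                for start in range(0, len(rows), batch_size):
                    session.run(query, rows=rows[start:start + batch_size]).consume()
```

Using `MERGE` rather than `CREATE` keeps a re-run from duplicating nodes and edges, and batching with `UNWIND` avoids one network round trip per triple on a graph of this size.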
Run Interface: streamlit run login.py
Dependencies: Python 3.10, Neo4j, JDK 17, Hugging Face models (e.g., chinese-roberta-wwm-ext).

Highlighted Details

Utilizes a medical knowledge graph with ~44.6k entities and ~312k relations for RAG.
RoBERTa NER model achieved 97.40% F1 score with data augmentation.
Intent recognition uses prompt engineering with LLMs, in-context learning, and chain-of-thought.
Streamlit-based UI includes user/admin login, LLM selection, and multi-window chat.
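The augmentation strategies credited for the NER result (entity replacement, masking, concatenation) are only named, not specified, so the sketch below is one plausible reading for character-level BIO-tagged data rather than the project's code. The entity types, the replacement dictionary, the 15% masking probability and the `[MASK]` token are all assumptions.

```python
import random

# Assumed data format: a sentence is a list of characters plus a parallel list
# of BIO tags such as "B-disease" / "I-disease" / "O" (hypothetical type names).


def entity_spans(tags):
    """Return (start, end, type) for each BIO entity; end is exclusive."""
    spans, start, etype = [], None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing entity
        if start is not None and not tag.startswith("I-"):
            spans.append((start, i, etype))
            start = None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def replace_entities(chars, tags, dictionary, rng):
    """Entity replacement: swap each entity for a random same-type entity."""
    out_chars, out_tags, prev = [], [], 0
    for start, end, etype in entity_spans(tags):
        out_chars += chars[prev:start]
        out_tags += tags[prev:start]
        # Fall back to the original surface form if the type has no dictionary.
        new = rng.choice(dictionary.get(etype, ["".join(chars[start:end])]))
        out_chars += list(new)
        out_tags += [f"B-{etype}"] + [f"I-{etype}"] * (len(new) - 1)
        prev = end
    return out_chars + chars[prev:], out_tags + tags[prev:]


def mask_entities(chars, tags, rng, prob=0.15, mask="[MASK]"):
    """Entity masking: hide entity characters so the model learns from context."""
    out = list(chars)
    for start, end, _ in entity_spans(tags):
        for i in range(start, end):
            if rng.random() < prob:
                out[i] = mask
    return out, list(tags)


def concatenate(sample_a, sample_b):
    """Concatenation: join two labelled sentences into one longer sample."""
    return sample_a[0] + sample_b[0], sample_a[1] + sample_b[1]
```

Passing an explicit `random.Random` instance keeps each augmentation reproducible, which helps when comparing F1 scores with and without a given strategy.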

Maintenance & Community

The project is actively developed, with recent updates including UI enhancements for login, registration, and multi-window chat. Planned future work includes NL2Cypher, which would generate Cypher queries directly from natural-language questions. Contact: zeromakers@outlook.com.

Licensing & Compatibility

The repository does not explicitly state a license in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project is primarily focused on a specific medical domain and KG structure. The NL2Cypher feature, aimed at broader KG utilization, is listed as future work and may require significant development. The lack of an explicit license may pose compatibility concerns for certain use cases.


Explore Similar Projects

cyber-doctor by Warma10032

RAGOnMedicalKG by liuhuanyong

fancy-nlp by boat-group

LinkBERT by michiyasunaga

biobert-pretrained by naver

KG_RAG by BaranziniLab

CBLUE by CBLUEbenchmark

DocProduct by re-search

langchain4j-aideepin by moyangzhan

DPR by facebookresearch

OpenChatKit by togethercomputer

llmware by llmware-ai