RAGQnASystem  by honeyandme

Medical Q&A system using RAG and large language models

created 1 year ago
757 stars

Top 46.9% on sourcepulse

Project Summary

This project is a medical question-answering system built on Retrieval-Augmented Generation (RAG) and large language models (LLMs). It targets medical professionals and researchers who want more reliable LLM output in healthcare. It combines a knowledge graph (KG) with a BERT-family model (RoBERTa) for Named Entity Recognition (NER) and a 34B LLM for intent recognition, so that answers are grounded in KG facts rather than in vector-retrieved text alone.

How It Works

The system uses a knowledge-graph-based RAG approach: instead of the vector database typical of RAG implementations, it retrieves from a medical KG stored in Neo4j. The KG is derived from the DiseaseKG dataset and enhanced with LLM-optimized entity information for greater accuracy. NER is performed by a RoBERTa model that benefits from data augmentation (entity replacement, masking, concatenation), which raised the F1 score to 97.40% on a custom dataset. Intent recognition is handled by prompt engineering with in-context learning and chain-of-thought, which reduces manual annotation costs while maintaining high accuracy.
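The three augmentation strategies named above can be illustrated with a minimal sketch. It assumes BIO-tagged training data represented as lists of (token, tag) pairs and a small lexicon of replacement entities per type; the data layout, function names, and the masking probability are assumptions for illustration, not the project's actual code.

```python
import random

# A sentence is a list of (token, BIO tag) pairs; this layout is an assumption.
Sentence = list[tuple[str, str]]

def find_spans(sent):
    """Return (start, end, type) for each entity span; end is exclusive."""
    spans, i = [], 0
    while i < len(sent):
        tag = sent[i][1]
        if tag.startswith("B-"):
            j = i + 1
            while j < len(sent) and sent[j][1] == "I-" + tag[2:]:
                j += 1
            spans.append((i, j, tag[2:]))
            i = j
        else:
            i += 1
    return spans

def replace_entity(sent, lexicon, rng):
    """Entity replacement: swap one entity for another of the same type."""
    spans = [s for s in find_spans(sent) if lexicon.get(s[2])]
    if not spans:
        return list(sent)
    start, end, etype = rng.choice(spans)
    new_tokens = rng.choice(lexicon[etype])
    new_tags = ["B-" + etype] + ["I-" + etype] * (len(new_tokens) - 1)
    return sent[:start] + list(zip(new_tokens, new_tags)) + sent[end:]

def mask_context(sent, rng, p=0.15, mask="[MASK]"):
    """Masking: hide some non-entity tokens, leaving all tags unchanged."""
    return [(mask, tag) if tag == "O" and rng.random() < p else (tok, tag)
            for tok, tag in sent]

def concatenate(a, b):
    """Concatenation: join two labeled sentences into one longer sample."""
    return list(a) + list(b)
```

Each function returns a new labeled sentence, so the outputs can be appended to the training set alongside the originals. Passing a seeded `random.Random` instance keeps the augmented set reproducible.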

Quick Start & Requirements

  • Install: Clone the repository and set up a Conda environment:
    git clone https://github.com/honeyandme/RAGQnASystem.git
    cd RAGQnASystem
    conda create -n RAGQnASystem python=3.10
    conda activate RAGQnASystem
    pip install -r requirements.txt
    
  • Neo4j Setup: Requires Neo4j Community Edition 5.18.1 and JDK 17. Build the KG with:
    python build_up_graph.py --website <YourWebSite> --user <YourUserName> --password <YourPassWord> --dbname <YourDBName>
  • Run Interface: streamlit run login.py
  • Dependencies: Python 3.10, Neo4j, JDK 17, Huggingface models (e.g., chinese-roberta-wwm-ext).
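Once the graph is built, retrieval amounts to querying Neo4j for facts about a recognized entity. The sketch below shows one way a recognized disease name and intent could be mapped to a parameterized Cypher query. The node labels, relationship types, property name, and intent names are hypothetical placeholders, not the schema that build_up_graph.py creates.

```python
# Hypothetical mapping from intent to (relationship type, target node label).
INTENT_TO_RELATION = {
    "symptom": ("HAS_SYMPTOM", "Symptom"),
    "drug": ("TREATED_BY", "Drug"),
}

def build_query(intent, disease_name):
    """Return (cypher, params) for a known intent, or None otherwise."""
    if intent not in INTENT_TO_RELATION:
        return None
    rel, label = INTENT_TO_RELATION[intent]
    # Labels and relationship types come from the fixed table above;
    # the user-supplied entity name is passed as a query parameter.
    cypher = (
        f"MATCH (d:Disease {{name: $name}})-[:{rel}]->(t:{label}) "
        "RETURN t.name AS answer"
    )
    return cypher, {"name": disease_name}
```

Keeping the entity name in the parameter dictionary rather than interpolating it into the query string avoids Cypher injection from user input.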

Highlighted Details

  • Utilizes a medical knowledge graph with ~44.6k entities and ~312k relations for RAG.
  • RoBERTa NER model achieved 97.40% F1 score with data augmentation.
  • Intent recognition uses LLM prompt engineering with in-context learning and chain-of-thought.
  • Streamlit-based UI includes user/admin login, LLM selection, and multi-window chat.

Maintenance & Community

The most recent updates include UI enhancements for login, registration, and multi-window chat, although the last commit was about a year ago. Planned future work includes NL2Cypher for direct query generation. Contact: zeromakers@outlook.com.

Licensing & Compatibility

The repository does not explicitly state a license in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project is primarily focused on a specific medical domain and KG structure. The NL2Cypher feature, aimed at broader KG utilization, is listed as future work and may require significant development. The lack of an explicit license may pose compatibility concerns for certain use cases.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

145 stars in the last 90 days
