Medical-Graph-RAG by ImprintLab

Graph RAG for medical data, with an accompanying research paper

Created 1 year ago

733 stars

Top 47.1% on SourcePulse

Project Summary

This project provides a Graph Retrieval-Augmented Generation (RAG) system tailored for the medical domain, aiming to enhance the safety and accuracy of medical large language models. It is designed for researchers and developers working with medical data who need to integrate structured knowledge graphs with LLM-based question-answering.

How It Works

The system takes a multi-level RAG approach built on a knowledge graph constructed from medical data. It draws on three kinds of source: private user data (such as MIMIC IV), curated papers and books (MedC-K, S2ORC), and structured dictionary data (UMLS). The core innovation is hierarchical graph linking across these levels, which enables more contextually relevant retrieval for LLM inference. Tying each response to structured medical knowledge is intended to make the output safer and better grounded.
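
The hierarchical linking idea can be illustrated with a small toy sketch. This is not the project's code: the Node class, the link_levels and retrieve helpers, and the sample records are all hypothetical, and a shared concept label stands in for whatever similarity measure the real pipeline uses. It only shows how a match at the private-data level can pull in linked literature and dictionary entries.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One entry in a toy three-level graph (hypothetical structure)."""
    node_id: str
    level: str            # "top", "medium", or "bottom"
    concepts: frozenset   # concept labels attached to this entry
    text: str
    links: list = field(default_factory=list)  # ids of linked lower-level nodes


def link_levels(upper, lower):
    """Link each upper node to every lower node that shares a concept label."""
    for up in upper:
        for low in lower:
            if up.concepts & low.concepts:
                up.links.append(low.node_id)


def retrieve(query_concepts, top_nodes, index):
    """Match top-level nodes, then follow links down through the levels."""
    frontier = [n for n in top_nodes if n.concepts & query_concepts]
    context, seen = [], set()
    while frontier:
        node = frontier.pop(0)
        if node.node_id in seen:
            continue
        seen.add(node.node_id)
        context.append(node)
        frontier.extend(index[i] for i in node.links)
    return context


# Made-up sample records, one or two per level.
top = [Node("note-1", "top", frozenset({"metformin", "diabetes"}),
            "Patient started on metformin for diabetes.")]
medium = [Node("paper-1", "medium", frozenset({"metformin"}),
               "Metformin is a common first-line therapy."),
          Node("paper-2", "medium", frozenset({"statin"}),
               "Statins lower LDL cholesterol.")]
bottom = [Node("def-1", "bottom", frozenset({"metformin"}),
               "Metformin: a biguanide antihyperglycemic agent."),
          Node("def-2", "bottom", frozenset({"statin"}),
               "Statin: an HMG-CoA reductase inhibitor.")]

link_levels(top, medium)
link_levels(medium, bottom)
index = {n.node_id: n for n in top + medium + bottom}

context = retrieve(frozenset({"metformin"}), top, index)
for node in context:
    print(f"[{node.level}] {node.text}")
```

The retrieved context walks from the patient note to the linked paper and then to the dictionary definition, so the text handed to the LLM carries its supporting references with it. Unrelated entries (the statin records here) are never reached.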

Quick Start & Requirements

Install: conda env create -f medgraphrag.yml
Prerequisites: OpenAI API Key, NCBI API Key (for demo), Neo4j instance, Python 3.x. Requires access to datasets like MIMIC IV, MedC-K, and UMLS, which may involve separate application processes or licensing.
Demo: Docker image available (jundewu/medrag-post) for web-based PubMed searches.
Links: Paper: https://arxiv.org/abs/2408.04187
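
Because several credentials and services are needed before anything runs, a short preflight check can save a failed first attempt. The sketch below is an assumption-laden illustration: the environment variable names (OPENAI_API_KEY, NCBI_API_KEY, NEO4J_URL, NEO4J_USERNAME, NEO4J_PASSWORD) are hypothetical and may not match what the project actually reads, so check the repository before relying on them.

```python
import os

# Hypothetical variable names; the project may expect different ones.
REQUIRED_SETTINGS = {
    "OPENAI_API_KEY": "OpenAI API key",
    "NCBI_API_KEY": "NCBI API key (needed for the demo)",
    "NEO4J_URL": "address of the Neo4j instance",
    "NEO4J_USERNAME": "Neo4j user name",
    "NEO4J_PASSWORD": "Neo4j password",
}


def missing_settings(env):
    """Return the names of required settings that are unset or empty."""
    return [name for name in REQUIRED_SETTINGS if not env.get(name)]


missing = missing_settings(os.environ)
if missing:
    for name in missing:
        print(f"Missing {name}: {REQUIRED_SETTINGS[name]}")
else:
    print("All prerequisite settings are present.")
```

The function takes the environment as an argument so it can be exercised with a plain dictionary as well as with os.environ.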

Highlighted Details

Hierarchical graph linking using UMLS and custom graph construction for MedC-K.
Supports three data levels: top (private user data), medium (papers/books), and bottom (dictionary).
Built upon the CAMEL framework for multi-agent pipelines.
Offers a baseline RAG pipeline and a complete Graph RAG flow as described in the paper.

Maintenance & Community

The project is associated with authors Junde Wu and Jiayuan Zhu. Further community engagement channels are not explicitly listed in the README.

Licensing & Compatibility

The README does not explicitly state a license. The use of datasets like MIMIC IV, MedC-K, and UMLS may be subject to their respective licenses and usage agreements, potentially restricting commercial use or redistribution of raw data.

Limitations & Caveats

Accessing and processing the full dataset hierarchy (MIMIC IV, MedC-K, UMLS) can be difficult because each dataset has its own access requirements and licensing terms. The project states that it is working on simpler example datasets to ease implementation.

Health Check

Last Commit

4 months ago

Responsiveness

Inactive

Star History

27 stars in the last 30 days