RAGOnMedicalKG by liuhuanyong

RAG pipeline for medical Q&A, combining LLMs with a knowledge graph

created 1 year ago
307 stars

Top 88.4% on sourcepulse

View on GitHub
Project Summary

This project provides an open-source solution for Retrieval-Augmented Generation (RAG) in the medical domain, combining Large Language Models (LLMs) with a custom-built medical knowledge graph (KG). It targets developers and researchers interested in building domain-specific QA systems, offering a foundational approach to integrating structured medical knowledge with LLM capabilities for more accurate and context-aware responses.

How It Works

The system constructs a disease-centric medical KG from web data, storing approximately 44,000 entities and 300,000 relationships in Neo4j. For a given query, it first identifies the relevant entities mentioned in the question, recalls factual triples about those entities from the KG, and formats the retrieved facts into a prompt for an LLM (specifically Qwen-7B-Chat). The LLM then generates an answer grounded in the recalled facts, yielding a RAG-based question-answering service.
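The following is a minimal sketch of that flow, not the repository's actual code: the entity-matching step, the Cypher pattern, the Neo4j credentials, and the LLM server endpoint (http://localhost:8000/chat) are all illustrative assumptions.

```python
# Minimal sketch of the KG-grounded RAG flow described above; not the
# repository's code. Entity matching, the Cypher pattern, the Neo4j
# credentials, and the LLM endpoint are illustrative assumptions.
from neo4j import GraphDatabase
import requests

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def recall_triples(entity_name: str, limit: int = 20):
    """Recall (head, relation, tail) facts for a matched disease entity."""
    cypher = (
        "MATCH (d:Disease {name: $name})-[r]->(t) "
        "RETURN d.name AS head, type(r) AS rel, t.name AS tail LIMIT $limit"
    )
    with driver.session() as session:
        return [(rec["head"], rec["rel"], rec["tail"])
                for rec in session.run(cypher, name=entity_name, limit=limit)]

def build_prompt(question: str, triples) -> str:
    """Format the recalled facts into a grounding prompt for the LLM."""
    facts = "\n".join(f"{h} -{r}-> {t}" for h, r, t in triples)
    return (f"Known medical facts:\n{facts}\n\n"
            f"Answer the question using only the facts above: {question}")

def ask(question: str, entity: str) -> str:
    prompt = build_prompt(question, recall_triples(entity))
    # Assumes qianwen7b_server.py exposes a simple JSON endpoint (hypothetical URL).
    resp = requests.post("http://localhost:8000/chat", json={"prompt": prompt})
    return resp.json()["response"]

if __name__ == "__main__":
    print(ask("What drugs are commonly used for hypertension?", "hypertension"))
```

In practice, the entity step would rely on the project's own entity recognition rather than an exact name match, and the prompt template would be tuned to the specific LLM.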

Quick Start & Requirements

  • Install/Run: Requires a Neo4j database and the project's Python dependencies. Run python build_medicalgraph.py to import the data (this can take hours), python qianwen7b_server.py to start the LLM server, and python chat_with_llm.py to query; a post-import sanity check is sketched after this list.
  • Prerequisites: Neo4j database, Python 3.x, Qwen-7B-Chat LLM.
  • Resources: KG import can take several hours.
  • Docs: Previous Project (the KG-building code and data are inherited from the author's earlier project).
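The snippet below is a hedged example of the post-import sanity check mentioned in the Install/Run item: it counts nodes and relationships with the official neo4j Python driver. The bolt URI and credentials are assumptions; adjust them to your local setup.

```python
# Optional sanity check after build_medicalgraph.py finishes: count nodes and
# relationships in Neo4j. Connection URI and credentials are assumptions.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
with driver.session() as session:
    nodes = session.run("MATCH (n) RETURN count(n) AS c").single()["c"]
    rels = session.run("MATCH ()-[r]->() RETURN count(r) AS c").single()["c"]
    # Expect roughly 44K entities and 300K relationships once the import completes.
    print(f"entities: {nodes}, relationships: {rels}")
driver.close()
```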

Highlighted Details

  • KG Scale: ~44K entities (Diseases, Drugs, Symptoms, etc.) and ~300K relationships (e.g., common_drug, need_check, has_symptom).
  • RAG Approach: Entity linking, KG fact recall via Cypher queries, and LLM-based answer generation; example Cypher templates are sketched after this list.
  • Data Source: Vertical medical websites.
  • Core Idea: Demonstrates a "demo-level" RAG implementation for medical QA.
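To make the relation types above concrete, here are illustrative Cypher templates of the kind such fact recall might use. The Disease label and the name property are assumptions inferred from the project description, not confirmed schema.

```python
# Illustrative Cypher templates for the relation types listed above
# (common_drug, need_check, has_symptom). The Disease label and the
# `name` property are assumptions, not the project's confirmed schema.
RECALL_QUERIES = {
    "common_drug": ("MATCH (d:Disease {name: $disease})-[:common_drug]->(x) "
                    "RETURN x.name AS drug"),
    "need_check":  ("MATCH (d:Disease {name: $disease})-[:need_check]->(x) "
                    "RETURN x.name AS check_item"),
    "has_symptom": ("MATCH (d:Disease {name: $disease})-[:has_symptom]->(x) "
                    "RETURN x.name AS symptom"),
}

# A simple intent -> query lookup; a real system would classify the question
# first, then run the matching template with the linked disease name.
def query_for(intent: str) -> str:
    return RECALL_QUERIES[intent]
```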

Maintenance & Community

The project appears to be a personal initiative by liuhuanyong, with no explicit mention of community channels, active development, or partnerships in the README.

Licensing & Compatibility

The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project is explicitly described as "demo-level", with significant room for optimization in entity recognition, subgraph recall, and intent classification. The underlying LLM is Qwen-7B-Chat, so running the service locally carries the usual hardware demands of a 7B-parameter model (typically a GPU with sufficient memory, or quantization on smaller setups).

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 20 stars in the last 90 days
