RAG pipeline for medical Q&A, combining LLMs with a knowledge graph
Top 88.4% on sourcepulse
This project provides an open-source solution for Retrieval-Augmented Generation (RAG) in the medical domain, combining Large Language Models (LLMs) with a custom-built medical knowledge graph (KG). It targets developers and researchers interested in building domain-specific QA systems, offering a foundational approach to integrating structured medical knowledge with LLM capabilities for more accurate and context-aware responses.
How It Works
The system constructs a disease-centric medical KG from web data, storing approximately 44,000 entities and 300,000 relationships in Neo4j. For a given query, it first identifies entities mentioned in the question that exist in the KG, then recalls factual triples involving those entities and formats them into a prompt for an LLM (specifically Qwen-7B-Chat). The LLM generates an answer grounded in the retrieved facts, yielding a RAG-based question-answering service.
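The retrieve-then-generate flow described above can be sketched as follows. This is a simplified illustration, not the project's actual code: the function names (`match_entities`, `recall_triples`, `build_prompt`) and the in-memory triple list standing in for the Neo4j-backed KG are assumptions made for the example.

```python
# Minimal sketch of the KG-RAG flow: entity matching -> triple recall -> prompt.
# All names here are illustrative; the real project queries Neo4j and serves
# Qwen-7B-Chat behind an HTTP endpoint.

def match_entities(question, known_entities):
    """Naive entity recognition: substring match against KG entity names."""
    return [e for e in known_entities if e in question]

def recall_triples(entities, kg):
    """Recall (head, relation, tail) triples whose head is a matched entity."""
    return [t for t in kg if t[0] in entities]

def build_prompt(question, triples):
    """Format recalled facts into a grounding prompt for the LLM."""
    facts = "\n".join(f"{h} -{r}-> {t}" for h, r, t in triples)
    return (f"Known facts:\n{facts}\n\n"
            f"Question: {question}\n"
            f"Answer using only the facts above.")

# Toy knowledge graph standing in for the ~300k-relationship Neo4j KG.
kg = [
    ("diabetes", "has_symptom", "frequent urination"),
    ("diabetes", "common_drug", "metformin"),
    ("flu", "has_symptom", "fever"),
]

question = "What are symptoms of diabetes?"
entities = match_entities(question, ["diabetes", "flu"])
prompt = build_prompt(question, recall_triples(entities, kg))
```

In the real pipeline the prompt would then be sent to the Qwen-7B-Chat server; only facts tied to the matched entities (here, the two diabetes triples) reach the model.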
Quick Start & Requirements
Run python build_medicalgraph.py to import the data (this can take hours), python qianwen7b_server.py to start the LLM server, and python chat_with_llm.py to query.

Highlighted Details
The KG schema centers on diseases, with relationship types such as common_drug, need_check, and has_symptom.
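A relation type like has_symptom maps naturally onto a Cypher query against the Neo4j store. The sketch below builds a parameterized query; the node labels (`Disease`, `Symptom`) and the `name` property are assumptions about the schema, not taken from the project.

```python
# Hypothetical sketch: a parameterized Cypher query for one of the relation
# types listed above. Labels and property names are assumed, not confirmed.

def symptom_query(disease_name):
    """Build a Cypher query recalling has_symptom triples for one disease."""
    query = (
        "MATCH (d:Disease {name: $name})-[:has_symptom]->(s:Symptom) "
        "RETURN d.name AS disease, s.name AS symptom"
    )
    return query, {"name": disease_name}

query, params = symptom_query("diabetes")
```

With the official neo4j Python driver, such a query would be executed as `session.run(query, params)`, and the returned rows would feed the prompt-building step.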
Maintenance & Community
The project appears to be a personal initiative by liuhuanyong, with no explicit mention of community channels, active development, or partnerships in the README.
Licensing & Compatibility
The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The project is described as "demo-level", and the README notes significant room for optimization in entity recognition, subgraph recall, and intent classification. The underlying LLM is Qwen-7B-Chat, so serving it locally implies nontrivial hardware, such as a GPU with enough memory for a 7B-parameter model.
Last activity was about a year ago; the project appears inactive.