RAGOnMedicalKG by liuhuanyong

RAG pipeline for medical Q&A, combining LLMs with a knowledge graph

created 1 year ago
307 stars

Top 88.4% on sourcepulse

View on GitHub
Project Summary

This project provides an open-source solution for Retrieval-Augmented Generation (RAG) in the medical domain, combining Large Language Models (LLMs) with a custom-built medical knowledge graph (KG). It targets developers and researchers interested in building domain-specific QA systems, offering a foundational approach to integrating structured medical knowledge with LLM capabilities for more accurate and context-aware responses.

How It Works

The system constructs a disease-centric medical KG from web data, storing approximately 44,000 entities and 300,000 relationships in Neo4j. For a given query, it first identifies the relevant entities mentioned in the question, recalls factual triples about those entities from the KG, and formats the retrieved facts into a prompt for an LLM (specifically Qwen-7B-Chat). The LLM then generates an answer grounded in the recalled facts, yielding a RAG-based question-answering service.
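The following is a minimal sketch of that flow, not the repository's actual code: the entity-matching step, the Cypher pattern, the Neo4j credentials, and the LLM server endpoint (http://localhost:8000/chat) are all illustrative assumptions.

```python
# Minimal sketch of the KG-grounded RAG flow described above; not the
# repository's code. Entity matching, the Cypher pattern, the Neo4j
# credentials, and the LLM endpoint are illustrative assumptions.
from neo4j import GraphDatabase
import requests

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def recall_triples(entity_name: str, limit: int = 20):
    """Recall (head, relation, tail) facts for a matched disease entity."""
    cypher = (
        "MATCH (d:Disease {name: $name})-[r]->(t) "
        "RETURN d.name AS head, type(r) AS rel, t.name AS tail LIMIT $limit"
    )
    with driver.session() as session:
        return [(rec["head"], rec["rel"], rec["tail"])
                for rec in session.run(cypher, name=entity_name, limit=limit)]

def build_prompt(question: str, triples) -> str:
    """Format the recalled facts into a grounding prompt for the LLM."""
    facts = "\n".join(f"{h} -{r}-> {t}" for h, r, t in triples)
    return (f"Known medical facts:\n{facts}\n\n"
            f"Answer the question using only the facts above: {question}")

def ask(question: str, entity: str) -> str:
    prompt = build_prompt(question, recall_triples(entity))
    # Assumes qianwen7b_server.py exposes a simple JSON endpoint (hypothetical URL).
    resp = requests.post("http://localhost:8000/chat", json={"prompt": prompt})
    return resp.json()["response"]

if __name__ == "__main__":
    print(ask("What drugs are commonly used for hypertension?", "hypertension"))
```

In practice, the entity step would rely on the project's own entity recognition rather than an exact name match, and the prompt template would be tuned to the specific LLM.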

Quick Start & Requirements

  • Install/Run: Requires a Neo4j database and the project's Python dependencies. Run python build_medicalgraph.py to import the data (this can take hours), python qianwen7b_server.py to start the LLM server, and python chat_with_llm.py to query; a post-import sanity check is sketched after this list.
  • Prerequisites: Neo4j database, Python 3.x, Qwen-7B-Chat LLM.
  • Resources: KG import can take several hours.
  • Docs: Previous Project (the KG-building code and data are inherited from the author's earlier project).
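The snippet below is a hedged example of the post-import sanity check mentioned in the Install/Run item: it counts nodes and relationships with the official neo4j Python driver. The bolt URI and credentials are assumptions; adjust them to your local setup.

```python
# Optional sanity check after build_medicalgraph.py finishes: count nodes and
# relationships in Neo4j. Connection URI and credentials are assumptions.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
with driver.session() as session:
    nodes = session.run("MATCH (n) RETURN count(n) AS c").single()["c"]
    rels = session.run("MATCH ()-[r]->() RETURN count(r) AS c").single()["c"]
    # Expect roughly 44K entities and 300K relationships once the import completes.
    print(f"entities: {nodes}, relationships: {rels}")
driver.close()
```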

Highlighted Details

  • KG Scale: ~44K entities (Diseases, Drugs, Symptoms, etc.) and ~300K relationships (e.g., common_drug, need_check, has_symptom).
  • RAG Approach: Entity linking, KG fact recall via Cypher queries, and LLM-based answer generation; example Cypher templates are sketched after this list.
  • Data Source: Vertical medical websites.
  • Core Idea: Demonstrates a "demo-level" RAG implementation for medical QA.
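To make the relation types above concrete, here are illustrative Cypher templates of the kind such fact recall might use. The Disease label and the name property are assumptions inferred from the project description, not confirmed schema.

```python
# Illustrative Cypher templates for the relation types listed above
# (common_drug, need_check, has_symptom). The Disease label and the
# `name` property are assumptions, not the project's confirmed schema.
RECALL_QUERIES = {
    "common_drug": ("MATCH (d:Disease {name: $disease})-[:common_drug]->(x) "
                    "RETURN x.name AS drug"),
    "need_check":  ("MATCH (d:Disease {name: $disease})-[:need_check]->(x) "
                    "RETURN x.name AS check_item"),
    "has_symptom": ("MATCH (d:Disease {name: $disease})-[:has_symptom]->(x) "
                    "RETURN x.name AS symptom"),
}

# A simple intent -> query lookup; a real system would classify the question
# first, then run the matching template with the linked disease name.
def query_for(intent: str) -> str:
    return RECALL_QUERIES[intent]
```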

Maintenance & Community

The project appears to be a personal initiative by liuhuanyong, with no explicit mention of community channels, active development, or partnerships in the README.

Licensing & Compatibility

The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project is explicitly described as "demo-level", with significant room for optimization in entity recognition, subgraph recall, and intent classification. The underlying LLM is Qwen-7B-Chat, so running the service locally carries the usual hardware demands of a 7B-parameter model (typically a GPU with sufficient memory, or quantization on smaller setups).

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 20 stars in the last 90 days
