KG_RAG by BaranziniLab

KG-RAG empowers LLMs using knowledge graphs for knowledge-intensive tasks

Created 2 years ago

930 stars

Top 39.3% on SourcePulse

Project Summary

This framework adapts Large Language Models (LLMs) to knowledge-intensive tasks by combining explicit knowledge from a biomedical Knowledge Graph (KG) with the implicit knowledge of the LLM. It supplies the model with "prompt-aware context" to improve accuracy, and it is aimed at researchers and developers working with biomedical data and LLMs.

How It Works

KG-RAG combines a large biomedical KG (SPOKE, with 27M nodes and 53M edges) with LLMs such as GPT and Llama. For each query it extracts "prompt-aware context": the minimal, relevant information from the KG needed to answer the user's question. Supplying only this domain-specific context lets a general-purpose LLM perform better on knowledge-intensive tasks.
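The idea of prompt-aware context can be illustrated with a minimal Python sketch. It is not the project's implementation: the toy triples, the function names (find_entities, prompt_aware_context), and the token-overlap scoring are all assumptions made for illustration, and the retrieval and ranking method used by KG-RAG itself is not described in this summary.

```python
import re

# Hand-written toy triples standing in for a KG. They are NOT taken from SPOKE.
TRIPLES = [
    ("cystic fibrosis", "associates with gene", "CFTR"),
    ("cystic fibrosis", "presents symptom", "chronic cough"),
    ("cystic fibrosis", "treated with", "ivacaftor"),
    ("type 1 diabetes", "treated with", "insulin"),
]


def tokens(text):
    """Lowercase a string and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def find_entities(prompt, triples):
    """Return KG subjects whose names appear verbatim in the prompt (assumed matcher)."""
    subjects = {subject for subject, _, _ in triples}
    lowered = prompt.lower()
    return sorted(s for s in subjects if s in lowered)


def prompt_aware_context(prompt, triples, top_k=2):
    """Keep only the top_k triples about mentioned entities that best overlap the prompt."""
    entities = find_entities(prompt, triples)
    prompt_tokens = tokens(prompt)
    candidates = [t for t in triples if t[0] in entities]

    def score(triple):
        # The subject already matched, so score on predicate and object only.
        return len(prompt_tokens & tokens(f"{triple[1]} {triple[2]}"))

    ranked = sorted(candidates, key=score, reverse=True)
    return [" ".join(t) for t in ranked[:top_k]]


question = "Which gene is associated with cystic fibrosis?"
context = prompt_aware_context(question, TRIPLES, top_k=1)

# The pruned context is placed in front of the question before calling an LLM.
llm_prompt = "Context:\n" + "\n".join(context) + "\n\nQuestion: " + question
print(llm_prompt)
```

Only the triple most relevant to the question reaches the LLM; the unrelated diabetes triple and the lower-scoring cystic fibrosis triples are dropped. A real system would likely replace the token-overlap score with something more robust, but the pruning step has the same shape.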

Quick Start & Requirements

Install: Clone the repository, create a Python 3.10.9 virtual environment, and install dependencies via pip install -r requirements.txt.
Prerequisites: Python 3.10.9 and an updated config.yaml. Downloading the Llama model is optional.
Setup: Run python -m kg_rag.run_setup to create a disease vector database and optionally download the Llama model.
Run: Execute KG-RAG via terminal commands, e.g., GPT_API_TYPE='openai' python -m kg_rag.rag_based_generation.GPT.text_generation -g for GPT or python -m kg_rag.rag_based_generation.Llama.text_generation -m <method> for Llama. Interactive modes are available (-i True).
Docs: arXiv preprint, BiomixQA Dataset.
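The commands in the steps above can also be assembled from Python, which is handy when scripting repeated runs. The sketch below only builds and prints the argument lists. The helper names (build_command, gpt_environment) are hypothetical, the module paths and flags are copied from the Run line, and <method> is left as the placeholder it is in this summary.

```python
import os
import shlex
import sys

# Module paths copied from the Run step.
SETUP_MODULE = "kg_rag.run_setup"
GPT_MODULE = "kg_rag.rag_based_generation.GPT.text_generation"
LLAMA_MODULE = "kg_rag.rag_based_generation.Llama.text_generation"


def build_command(module, flags, interactive=False):
    """Return an argv list equivalent to `python -m <module> <flags...>`."""
    argv = [sys.executable, "-m", module, *flags]
    if interactive:
        argv += ["-i", "True"]  # interactive mode, as listed in the Run step
    return argv


def gpt_environment(api_type="openai"):
    """Copy the current environment and set GPT_API_TYPE for a GPT run."""
    env = dict(os.environ)
    env["GPT_API_TYPE"] = api_type
    return env


setup_cmd = build_command(SETUP_MODULE, [])
# "-g" is copied verbatim from the Run step; any argument it expects
# is not given in this summary and is not guessed here.
gpt_cmd = build_command(GPT_MODULE, ["-g"])
# "<method>" is a placeholder that must be replaced before running.
llama_cmd = build_command(LLAMA_MODULE, ["-m", "<method>"], interactive=True)

for cmd in (setup_cmd, gpt_cmd, llama_cmd):
    print(shlex.join(cmd))
```

To launch one of these, pass the list to subprocess.run(cmd, check=True) from the repository root, adding env=gpt_environment() for the GPT command so that GPT_API_TYPE is set.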

Highlighted Details

Utilizes the SPOKE biomedical KG, which integrates over 40 repositories and contains 27 million nodes.
Supports both GPT (Azure/OpenAI) and Llama models.
Includes the BiomixQA benchmark dataset for evaluating KG-RAG performance.
Offers interactive modes for step-by-step execution and evidence display (-e).

Maintenance & Community

The project is associated with BaranziniLab. The primary citation is Soman et al., 2023.

Licensing & Compatibility

The README does not explicitly state the license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

KG-RAG is currently designed primarily for disease-related prompts; work to improve its versatility is ongoing. The setup script may download large models if they are not already present.

Explore Similar Projects

ITFormer-ICML25 by Pandalin98

RAGOnMedicalKG by liuhuanyong

visual-med-alpaca by cambridgeltl

OpenGPT by CogStack

stark by snap-stanford

DISC-MedLLM by FudanDISC

UnifiedSKG by xlang-ai

medAlpaca by kbressem

biobert-pretrained by naver

RAGQnASystem by honeyandme

PIKE-RAG by microsoft

ChatDoctor by Kent0n-Li