KG_RAG by BaranziniLab

KG-RAG empowers LLMs using knowledge graphs for knowledge-intensive tasks

created 1 year ago
870 stars

Top 42.2% on sourcepulse

Project Summary

This framework empowers Large Language Models (LLMs) for knowledge-intensive tasks by integrating explicit knowledge from a biomedical Knowledge Graph (KG) with the implicit knowledge of LLMs, offering "prompt-aware context" for improved accuracy. It is designed for researchers and developers working with biomedical data and LLMs.

How It Works

KG-RAG combines a large biomedical KG (SPOKE, with 27M nodes and 53M edges) with LLMs such as GPT and Llama. It extracts "prompt-aware context": the minimal, relevant information from the KG needed to answer a user's query. Supplying this domain-specific context to a general-purpose LLM improves its performance on knowledge-intensive tasks.
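
The idea of prompt-aware context can be illustrated with a minimal sketch. This is not the KG-RAG implementation: the toy graph, the placeholder entity names, the substring-based entity match, and the token-overlap (Jaccard) score are all assumptions chosen to keep the example self-contained. A real system would query SPOKE and would more likely use embedding similarity.

```python
import re

# Hypothetical toy knowledge graph of (subject, relation, object) triples.
# The names are placeholders, not SPOKE content.
TOY_KG = [
    ("disease_a", "associates_with", "gene_x"),
    ("disease_a", "treated_by", "compound_y"),
    ("disease_a", "presents", "symptom_z"),
    ("disease_b", "associates_with", "gene_x"),
]


def tokens(text):
    """Return lowercase alphanumeric tokens; underscores act as separators."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Token-overlap similarity, a stand-in for embedding similarity."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def prompt_aware_context(prompt, kg, top_k=2):
    """Keep the top_k triples most relevant to the prompt."""
    prompt_lower = prompt.lower()
    prompt_tokens = tokens(prompt)
    # Step 1: keep only triples whose subject entity is named in the prompt.
    candidates = [t for t in kg if t[0] in prompt_lower]
    # Step 2: rank the remaining triples by similarity to the prompt.
    ranked = sorted(
        candidates,
        key=lambda t: jaccard(prompt_tokens, tokens(" ".join(t))),
        reverse=True,
    )
    # Step 3: keep only the highest-scoring triples as context.
    return ranked[:top_k]


def context_to_text(triples):
    """Render triples as plain sentences to prepend to an LLM prompt."""
    return " ".join(f"{s} {r} {o}." for s, r, o in triples)


context = prompt_aware_context("Which gene associates with disease_a?", TOY_KG)
print(context_to_text(context))
```

Filtering by entity first and ranking second keeps the context small, which is the point of the approach: the LLM receives only the graph facts that bear on the question rather than the entity's full neighborhood.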

Quick Start & Requirements

  • Install: Clone the repository, create a Python 3.10.9 virtual environment, and install dependencies via pip install -r requirements.txt.
  • Prerequisites: Python 3.10.9 and an updated config.yaml. Downloading the Llama model is optional.
  • Setup: Run python -m kg_rag.run_setup to create a disease vector database and optionally download the Llama model.
  • Run: Execute KG-RAG from the terminal. For GPT, use GPT_API_TYPE='openai' python -m kg_rag.rag_based_generation.GPT.text_generation -g; for Llama, use python -m kg_rag.rag_based_generation.Llama.text_generation -m <method>. Add -i True to run either in interactive mode.
  • Docs: arXiv preprint, BiomixQA Dataset.
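
The run commands above can also be assembled programmatically, for example when scripting several runs. The sketch below only builds the argument lists and environment; the helper names (gpt_command, llama_command) are hypothetical, and it assumes that -g takes a model name supplied by the caller, which the text above does not spell out. It also assumes the commands are launched from the repository root inside the prepared virtual environment.

```python
import os

# Module paths and flags as listed in the quick-start steps.
SETUP_COMMAND = ["python", "-m", "kg_rag.run_setup"]
GPT_MODULE = "kg_rag.rag_based_generation.GPT.text_generation"
LLAMA_MODULE = "kg_rag.rag_based_generation.Llama.text_generation"


def gpt_command(model, api_type="openai", interactive=False):
    """Build argv and environment for a GPT run.

    Assumption: the value after -g is the model name; pass whatever
    your configuration expects.
    """
    argv = ["python", "-m", GPT_MODULE, "-g", model]
    if interactive:
        argv += ["-i", "True"]
    # GPT_API_TYPE is passed as an environment variable, as in the shell form.
    env = dict(os.environ, GPT_API_TYPE=api_type)
    return argv, env


def llama_command(method, interactive=False):
    """Build argv for a Llama run; method fills the <method> placeholder."""
    argv = ["python", "-m", LLAMA_MODULE, "-m", method]
    if interactive:
        argv += ["-i", "True"]
    return argv


# To launch a command, hand the result to subprocess, for example:
#   import subprocess
#   argv, env = gpt_command("<model>", interactive=True)
#   subprocess.run(argv, env=env, check=True)
```

Returning the argument list instead of running it keeps the helpers easy to inspect and avoids shell quoting issues, since subprocess.run accepts the list directly.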

Highlighted Details

  • Uses the SPOKE biomedical KG, which integrates more than 40 repositories and contains 27 million nodes.
  • Supports both GPT (Azure/OpenAI) and Llama models.
  • Includes the BiomixQA benchmark dataset for evaluating KG-RAG performance.
  • Offers an interactive mode (-i True) for step-by-step execution and an evidence display option (-e).

Maintenance & Community

The project is associated with BaranziniLab. The primary citation is Soman et al., 2023.

Licensing & Compatibility

The README does not explicitly state the license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

KG-RAG is currently designed primarily for disease-related prompts, with ongoing work to improve versatility. The setup script may download large models if they are not already present.

Health Check

  • Last commit: 8 months ago
  • Responsiveness: 1 week
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 46 stars in the last 90 days
