KG_RAG by BaranziniLab

KG-RAG empowers LLMs using knowledge graphs for knowledge-intensive tasks

Created 1 year ago
885 stars

Top 40.9% on SourcePulse

Project Summary

This framework empowers Large Language Models (LLMs) for knowledge-intensive tasks by integrating explicit knowledge from a biomedical Knowledge Graph (KG) with the implicit knowledge of LLMs, offering "prompt-aware context" for improved accuracy. It is designed for researchers and developers working with biomedical data and LLMs.

How It Works

KG-RAG combines a massive biomedical KG (SPOKE, with 27M nodes and 53M edges) with LLMs such as GPT and Llama. It extracts "prompt-aware context": the minimal set of relevant KG information needed to answer a user's query. Because only this domain-specific context is supplied, general-purpose LLMs perform better on knowledge-intensive tasks.
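The toy sketch below illustrates the idea of prompt-aware context. It is not KG-RAG's code. It assumes a hypothetical five-triple knowledge graph (TOY_KG), uses a plain substring match in place of the project's disease vector database, and uses bag-of-words cosine similarity in place of whatever embedding-based ranking the project actually uses. The helper names bag_of_words, cosine and prompt_aware_context are illustrative only. The pipeline has three steps: match a disease in the prompt, pull that disease's triples, and keep only the facts most similar to the prompt.

```python
import math
import re
from collections import Counter
from typing import List, Tuple

# Hypothetical, illustrative toy KG of (subject, relation, object) triples; not SPOKE data.
TOY_KG: List[Tuple[str, str, str]] = [
    ("psoriasis", "is associated with gene", "IL23R"),
    ("psoriasis", "is treated by", "ustekinumab"),
    ("psoriasis", "localizes to", "skin"),
    ("asthma", "is associated with gene", "ORMDL3"),
    ("asthma", "is treated by", "albuterol"),
]


def bag_of_words(text: str) -> Counter:
    """Lower-cased token counts: a crude stand-in for a sentence embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def prompt_aware_context(prompt: str, kg=TOY_KG, top_k: int = 2) -> List[str]:
    """Return at most top_k KG facts most relevant to the prompt (hypothetical helper)."""
    text = prompt.lower()
    # 1. Entity matching: keep diseases named in the prompt. KG-RAG builds a
    #    disease vector database for this; a substring match stands in here.
    diseases = {subj for subj, _, _ in kg if subj in text}
    # 2. Retrieve each matched disease's neighbourhood and verbalise it as sentences.
    facts = [f"{s} {r} {o}" for s, r, o in kg if s in diseases]
    # 3. Rank facts by similarity to the prompt and keep only the most relevant ones.
    query = bag_of_words(prompt)
    facts.sort(key=lambda fact: cosine(query, bag_of_words(fact)), reverse=True)
    return facts[:top_k]
```

For the prompt "Which gene is associated with psoriasis?", the sketch ranks the IL23R fact first. It never considers the asthma triples and trims the least similar psoriasis fact, which is the "minimal, relevant" behaviour described above. A prompt that names no known disease yields an empty context.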

Quick Start & Requirements

  • Install: Clone the repository, create a Python 3.10.9 virtual environment, and install dependencies via pip install -r requirements.txt.
  • Prerequisites: Python 3.10.9 and an updated config.yaml; downloading a Llama model is optional.
  • Setup: Run python -m kg_rag.run_setup to create a disease vector database and optionally download the Llama model.
  • Run: Execute KG-RAG via terminal commands, e.g., GPT_API_TYPE='openai' python -m kg_rag.rag_based_generation.GPT.text_generation -g for GPT or python -m kg_rag.rag_based_generation.Llama.text_generation -m <method> for Llama. Interactive modes are available (-i True).
  • Docs: arXiv preprint, BiomixQA Dataset.
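The run commands above can be wrapped in a small launcher. The sketch below is hypothetical: build_command, run and SETUP_ARGV are illustrative names, not part of KG-RAG. The module paths, the GPT_API_TYPE variable, and the -g, -m, -i and -e flags come from this summary. Three points are assumptions and should be checked against the project README: that -g takes a GPT model name, that -e takes "True" in the same way as -i, and that "azure" is the GPT_API_TYPE value for Azure.

```python
import os
import shlex
import subprocess
from typing import Dict, List, Optional, Tuple

# Module paths, flags, and GPT_API_TYPE come from the Quick Start.
# ASSUMPTIONS: -g takes a GPT model name, -e takes "True" like -i, and
# "azure" is the GPT_API_TYPE value for Azure. Verify against the README.
SETUP_ARGV = ["python", "-m", "kg_rag.run_setup"]
GPT_MODULE = "kg_rag.rag_based_generation.GPT.text_generation"
LLAMA_MODULE = "kg_rag.rag_based_generation.Llama.text_generation"


def build_command(
    backend: str,
    gpt_model: Optional[str] = None,
    method: Optional[str] = None,
    api_type: str = "openai",
    interactive: bool = False,
    evidence: bool = False,
) -> Tuple[Dict[str, str], List[str]]:
    """Hypothetical helper: return (extra env vars, argv) for one KG-RAG run."""
    env: Dict[str, str] = {}
    if backend == "gpt":
        if gpt_model is None:
            raise ValueError("the GPT backend needs a model name for -g")
        if api_type not in ("openai", "azure"):
            raise ValueError(f"unsupported GPT_API_TYPE: {api_type!r}")
        env["GPT_API_TYPE"] = api_type
        argv = ["python", "-m", GPT_MODULE, "-g", gpt_model]
    elif backend == "llama":
        if method is None:
            raise ValueError("the Llama backend needs a value for -m")
        argv = ["python", "-m", LLAMA_MODULE, "-m", method]
    else:
        raise ValueError(f"unknown backend: {backend!r}")
    if interactive:
        argv += ["-i", "True"]
    if evidence:
        argv += ["-e", "True"]
    return env, argv


def run(backend: str, **kwargs) -> subprocess.CompletedProcess:
    """Launch one KG-RAG run; assumes SETUP_ARGV has already been run once."""
    env, argv = build_command(backend, **kwargs)
    print("+", shlex.join(argv))  # echo the exact command for reproducibility
    return subprocess.run(argv, env={**os.environ, **env}, check=True)
```

Passing argv as a list to subprocess.run avoids shell-quoting problems. Merging the extra variables into os.environ reproduces the GPT_API_TYPE='openai' prefix used in the terminal example.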

Highlighted Details

  • Utilizes the SPOKE biomedical KG, which integrates data from over 40 repositories into a graph of 27 million nodes.
  • Supports both GPT (Azure/OpenAI) and Llama models.
  • Includes the BiomixQA benchmark dataset for evaluating KG-RAG performance.
  • Offers an interactive mode (-i) for step-by-step execution and an option to display evidence (-e).

Maintenance & Community

The project is associated with BaranziniLab. The primary citation is Soman et al., 2023.

Licensing & Compatibility

The README does not explicitly state the license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

KG-RAG is currently designed primarily for disease-related prompts, with ongoing work to improve its versatility. The setup script may download large models, such as the optional Llama model, if they are not already present.

Health Check

  • Last Commit: 10 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 13 stars in the last 30 days
