KG-LLM pipeline for multi-document question answering
Top 88.2% on sourcepulse
This repository provides code and a demo for Knowledge Graph Prompting (KGP) applied to Multi-Document Question Answering (MDQA). It enables efficient and accurate answers from multiple text sources by leveraging knowledge graphs and large language models, targeting researchers and practitioners in NLP and information retrieval.
How It Works
The project implements a pipeline that first collects and processes documents relevant to question-answering datasets. It then constructs knowledge graphs from these documents using methods like TF-IDF, KNN, and TAGME. Dense Passage Retrieval (DPR) and Multi-hop Dense Retrieval (MDR) models are fine-tuned for passage retrieval. Finally, it integrates these components with instruction-tuned LLaMA or T5 models for intelligent graph traversal and answer generation, aiming to improve QA performance through structured knowledge.
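For illustration, the sketch below links passages by TF-IDF similarity to their k nearest neighbors; it is a minimal, hypothetical example of the graph-construction idea, not the repository's own code, and the function name build_passage_graph and its parameters are invented here.

# Minimal sketch: connect each passage to its k most similar passages in TF-IDF space.
import networkx as nx
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors

def build_passage_graph(passages, k=5):
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(passages)
    knn = NearestNeighbors(n_neighbors=min(k + 1, len(passages)), metric="cosine").fit(tfidf)
    _, neighbor_ids = knn.kneighbors(tfidf)
    graph = nx.Graph()
    for i, passage in enumerate(passages):
        graph.add_node(i, text=passage)
    for i, row in enumerate(neighbor_ids):
        for j in row[1:]:              # position 0 is (usually) the passage itself
            graph.add_edge(i, int(j))
    return graph

At query time, traversal would start from the passages retrieved for the question and expand along graph edges, with the fine-tuned retriever or the instruction-tuned LLM deciding which neighbor to visit next.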
Quick Start & Requirements
Set up a Python 3.8 environment (conda install -c anaconda python=3.8), then run pip install -r requirements.txt and install the other listed packages: torch-scatter, Levenshtein, spacy with the en_core_web_lg model, openai==0.28, langchain, nltk, rank_bm25, sentence-transformers, sentencepiece, and transformers.
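As an optional post-install sanity check (not part of the repository), the snippet below verifies that the listed packages import and fetches the en_core_web_lg model if it is missing.

# Hypothetical post-install check; the package list mirrors the requirements above.
import importlib

for module in ["torch_scatter", "Levenshtein", "spacy", "openai", "langchain",
               "nltk", "rank_bm25", "sentence_transformers", "sentencepiece",
               "transformers"]:
    importlib.import_module(module)

import spacy
from spacy.cli import download

try:
    spacy.load("en_core_web_lg")
except OSError:
    download("en_core_web_lg")     # pull the large English model if absent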
Highlighted Details
Maintenance & Community
No specific community links (Discord/Slack) or details on maintainers/sponsorships are provided in the README.
Licensing & Compatibility
The README does not explicitly state a license. The use of openai==0.28
suggests potential compatibility considerations with newer OpenAI API versions.
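Concretely, openai==0.28 exposes the legacy pre-1.0 interface, so calls follow the pattern sketched below (API key, model name, and prompt are placeholders) and would need migration to run against openai>=1.0.

# Legacy call style used by openai==0.28; incompatible with the client-based openai>=1.0 API.
import openai

openai.api_key = "YOUR_API_KEY"   # placeholder

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",        # example model name
    messages=[{"role": "user", "content": "Answer the question from the retrieved passages."}],
)
print(response["choices"][0]["message"]["content"])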
Limitations & Caveats
The project relies on specific versions of dependencies (e.g., openai==0.28, CUDA 11.8), which may require careful environment management. The README notes that parallel LLM API calls can incur significant costs, advising users to adjust CPU usage to fit their budget. Access to all datasets and model checkpoints is via Dropbox links.
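As a rough sketch of that cost/parallelism trade-off (not the repository's code), capping the number of worker processes limits how many LLM calls run at once; lowering the worker count reduces spend at the cost of throughput.

# Hypothetical throttle: fewer workers means fewer simultaneous API calls and lower spend.
from multiprocessing import Pool

N_WORKERS = 4                     # tune to your budget, per the README's advice

def answer_one(question):
    # Placeholder for a single LLM API call (e.g., the openai==0.28 call shown above).
    return f"answer to: {question}"

if __name__ == "__main__":
    questions = ["q1", "q2", "q3"]
    with Pool(processes=N_WORKERS) as pool:
        answers = pool.map(answer_one, questions)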