ChatKBQA by LHRLAB

Research paper resources for knowledge base question answering

created 1 year ago
313 stars

Top 87.4% on sourcepulse

Project Summary

This repository provides the official resources for ChatKBQA, a generate-then-retrieve framework for Knowledge Base Question Answering (KBQA) using fine-tuned Large Language Models (LLMs). It addresses challenges in knowledge retrieval and semantic parsing for KBQA, offering a novel approach that first generates a logical form with LLMs and then refines it through unsupervised retrieval. The framework is designed for researchers and practitioners in NLP and KG-based QA.

How It Works

ChatKBQA employs a two-stage process: first, it fine-tunes LLMs (like Llama2) to generate logical forms (S-expressions) from natural language questions. Second, it uses an unsupervised retrieval method to identify and replace entities and relations within the generated logical form, improving accuracy and interpretability. This generate-then-retrieve strategy aims to directly enhance both generation and retrieval components, simplifying previous complex KBQA methods.
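As a rough illustration of the two stages, here is a dependency-free Python sketch. The function names, toy vocabularies, and string-similarity retriever are stand-ins: ChatKBQA fine-tunes an LLM for stage one and uses embedding-based similarity (e.g. SimCSE) over Freebase entities and relations for stage two.

```python
from difflib import SequenceMatcher

# Stand-in knowledge-base vocabularies (the real system retrieves
# against Freebase's full entity and relation inventories).
KB_ENTITIES = {"m.02mjmr": "Barack Obama", "m.0d05w3": "China"}
KB_RELATIONS = ["people.person.place_of_birth", "location.country.capital"]

def generate_logical_form(question):
    """Stage 1 (stand-in): a fine-tuned LLM would map the question to a
    draft S-expression; a fixed draft with a typo and an unnormalized
    relation is returned here for illustration."""
    return "(JOIN people.person.born_in [ Barak Obama ])"

def retrieve(mention, candidates):
    """Stage 2: unsupervised retrieval -- pick the candidate most similar
    to the generated surface form (string similarity stands in for the
    embedding similarity used by the real pipeline)."""
    return max(candidates,
               key=lambda c: SequenceMatcher(None, mention.lower(), c.lower()).ratio())

def refine(draft):
    """Replace the drafted relation and entity mention with their nearest
    knowledge-base matches, yielding an executable logical form."""
    relation = retrieve(draft.split()[1], KB_RELATIONS)
    mention = draft.split("[")[1].split("]")[0].strip()
    label = retrieve(mention, list(KB_ENTITIES.values()))
    mid = next(k for k, v in KB_ENTITIES.items() if v == label)
    return f"(JOIN {relation} {mid})"

print(refine(generate_logical_form("Where was Barack Obama born?")))
# (JOIN people.person.place_of_birth m.02mjmr)
```

The key design point the sketch preserves is that generation and retrieval are decoupled: the LLM only has to produce a plausible skeleton, and retrieval grounds its entities and relations in the actual KB vocabulary.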

Quick Start & Requirements

  • Environment Setup:
    conda create -n chatkbqa python=3.8
    conda activate chatkbqa
    pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117
    pip install -r requirement.txt
  • Prerequisites: Python 3.8, PyTorch 1.13.1 with CUDA 11.7, significant disk space (53GB+ for Freebase KG, 100GB+ RAM recommended for Virtuoso backend).
  • Data: Requires downloading Freebase KG data (53GB+), FACC1 mentions, and datasets (WebQSP, CWQ). Processed data and checkpoints are available via provided links.
  • Documentation: ACL 2024 paper.

Highlighted Details

  • Achieves state-of-the-art performance on WebQSP and CWQ benchmarks.
  • Utilizes LoRA for efficient LLM fine-tuning.
  • Supports entity and relation retrieval for KBQA refinement.
  • Provides scripts for data processing, LLM fine-tuning, and evaluation.
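To see why LoRA makes the fine-tuning step cheap, the parameter arithmetic for a single attention projection can be worked out directly. The hidden size of 4096 matches Llama2-7B; the rank r = 8 is an illustrative choice, not necessarily the configuration used in the repo's scripts.

```python
# LoRA replaces a full d x d weight update with a low-rank product B @ A,
# where B is d x r and A is r x d, and only A and B are trained while the
# base weight stays frozen.
d = 4096                  # hidden size of Llama2-7B's attention projections
r = 8                     # LoRA rank (illustrative)

full_update = d * d       # parameters in a dense weight update
lora_update = 2 * d * r   # parameters in B (d x r) plus A (r x d)

print(full_update)                 # 16777216
print(lora_update)                 # 65536
print(lora_update / full_update)   # 0.00390625, i.e. ~0.4% of the dense update
```

Applied across all adapted projection matrices, this is what lets a 7B-parameter model be fine-tuned for semantic parsing on a single GPU.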

Maintenance & Community

The project is associated with the ACL 2024 conference. Contact email: luohaoran@bupt.edu.cn. The repository acknowledges contributions from PEFT, LLaMA-Efficient-Tuning, SimCSE, GMT-KBQA, and DECAF.

Licensing & Compatibility

The repository does not explicitly state a license. The ACL 2024 paper is published by the Association for Computational Linguistics. Until a license is added, commercial use or integration with closed-source projects would require clarification from the authors.

Limitations & Caveats

The setup is complex, requiring substantial disk space and specific CUDA versions. The reliance on a Virtuoso backend for Freebase KG adds an operational overhead. The project is presented as official resources for a research paper, implying it may be primarily for academic use and experimentation.
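Answering a question end-to-end means executing the refined logical form against the Freebase KG, which in practice requires translating the S-expression into SPARQL for the Virtuoso endpoint. A minimal, hypothetical converter for the simplest (JOIN relation entity) pattern might look like the following; the real pipeline also handles nested AND, ARGMAX, and comparative forms.

```python
def s_expr_to_sparql(s_expr):
    """Convert a flat (JOIN relation entity) S-expression into a
    one-triple SPARQL query (hypothetical sketch; nested forms are
    not handled here)."""
    inner = s_expr.strip()[1:-1]          # drop the outer parentheses
    op, relation, entity = inner.split()
    if op != "JOIN":
        raise ValueError(f"unsupported operator: {op}")
    # (JOIN R E) denotes the set of ?x with a triple (?x, R, E).
    return f"SELECT DISTINCT ?x WHERE {{ ?x ns:{relation} ns:{entity} . }}"

print(s_expr_to_sparql("(JOIN people.person.place_of_birth m.02mjmr)"))
# SELECT DISTINCT ?x WHERE { ?x ns:people.person.place_of_birth ns:m.02mjmr . }
```

The resulting query string would then be sent to the locally hosted Virtuoso SPARQL endpoint, which is why the backend's disk and memory requirements are unavoidable for full reproduction.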

Health Check

  • Last commit: 4 days ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 15 stars in the last 90 days
