ChatKBQA by LHRLAB

Research paper resources for knowledge base question answering

created 1 year ago
313 stars

Top 87.4% on sourcepulse

Project Summary

This repository provides the official resources for ChatKBQA, a generate-then-retrieve framework for Knowledge Base Question Answering (KBQA) using fine-tuned Large Language Models (LLMs). It addresses challenges in knowledge retrieval and semantic parsing for KBQA, offering a novel approach that first generates a logical form with LLMs and then refines it through unsupervised retrieval. The framework is designed for researchers and practitioners in NLP and KG-based QA.

How It Works

ChatKBQA employs a two-stage process: first, it fine-tunes LLMs (like Llama2) to generate logical forms (S-expressions) from natural language questions. Second, it uses an unsupervised retrieval method to identify and replace entities and relations within the generated logical form, improving accuracy and interpretability. This generate-then-retrieve strategy aims to directly enhance both generation and retrieval components, simplifying previous complex KBQA methods.
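As a rough illustration of the two stages, here is a dependency-free Python sketch. The function names, toy vocabularies, and string-similarity retriever are stand-ins: ChatKBQA fine-tunes an LLM for stage one and uses embedding-based similarity (e.g. SimCSE) over Freebase entities and relations for stage two.

```python
from difflib import SequenceMatcher

# Stand-in knowledge-base vocabularies (the real system retrieves
# against Freebase's full entity and relation inventories).
KB_ENTITIES = {"m.02mjmr": "Barack Obama", "m.0d05w3": "China"}
KB_RELATIONS = ["people.person.place_of_birth", "location.country.capital"]

def generate_logical_form(question):
    """Stage 1 (stand-in): a fine-tuned LLM would map the question to a
    draft S-expression; a fixed draft with a typo and an unnormalized
    relation is returned here for illustration."""
    return "(JOIN people.person.born_in [ Barak Obama ])"

def retrieve(mention, candidates):
    """Stage 2: unsupervised retrieval -- pick the candidate most similar
    to the generated surface form (string similarity stands in for the
    embedding similarity used by the real pipeline)."""
    return max(candidates,
               key=lambda c: SequenceMatcher(None, mention.lower(), c.lower()).ratio())

def refine(draft):
    """Replace the drafted relation and entity mention with their nearest
    knowledge-base matches, yielding an executable logical form."""
    relation = retrieve(draft.split()[1], KB_RELATIONS)
    mention = draft.split("[")[1].split("]")[0].strip()
    label = retrieve(mention, list(KB_ENTITIES.values()))
    mid = next(k for k, v in KB_ENTITIES.items() if v == label)
    return f"(JOIN {relation} {mid})"

print(refine(generate_logical_form("Where was Barack Obama born?")))
# (JOIN people.person.place_of_birth m.02mjmr)
```

The key design point the sketch preserves is that generation and retrieval are decoupled: the LLM only has to produce a plausible skeleton, and retrieval grounds its entities and relations in the actual KB vocabulary.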

Quick Start & Requirements

  • Environment Setup:
    conda create -n chatkbqa python=3.8
    conda activate chatkbqa
    pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117
    pip install -r requirement.txt
  • Prerequisites: Python 3.8, PyTorch 1.13.1 with CUDA 11.7, significant disk space (53GB+ for Freebase KG, 100GB+ RAM recommended for Virtuoso backend).
  • Data: Requires downloading Freebase KG data (53GB+), FACC1 mentions, and datasets (WebQSP, CWQ). Processed data and checkpoints are available via provided links.
  • Documentation: ACL 2024 paper.

Highlighted Details

  • Achieves state-of-the-art performance on WebQSP and CWQ benchmarks.
  • Utilizes LoRA for efficient LLM fine-tuning.
  • Supports entity and relation retrieval for KBQA refinement.
  • Provides scripts for data processing, LLM fine-tuning, and evaluation.
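To see why LoRA makes the fine-tuning step cheap, the parameter arithmetic for a single attention projection can be worked out directly. The hidden size of 4096 matches Llama2-7B; the rank r = 8 is an illustrative choice, not necessarily the configuration used in the repo's scripts.

```python
# LoRA replaces a full d x d weight update with a low-rank product B @ A,
# where B is d x r and A is r x d, and only A and B are trained while the
# base weight stays frozen.
d = 4096                  # hidden size of Llama2-7B's attention projections
r = 8                     # LoRA rank (illustrative)

full_update = d * d       # parameters in a dense weight update
lora_update = 2 * d * r   # parameters in B (d x r) plus A (r x d)

print(full_update)                 # 16777216
print(lora_update)                 # 65536
print(lora_update / full_update)   # 0.00390625, i.e. ~0.4% of the dense update
```

Applied across all adapted projection matrices, this is what lets a 7B-parameter model be fine-tuned for semantic parsing on a single GPU.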

Maintenance & Community

The project is associated with the ACL 2024 conference. Contact email: luohaoran@bupt.edu.cn. The repository acknowledges contributions from PEFT, LLaMA-Efficient-Tuning, SimCSE, GMT-KBQA, and DECAF.

Licensing & Compatibility

The repository does not explicitly state a license. The ACL 2024 paper is published by the Association for Computational Linguistics. Until a license is added, commercial use or integration with closed-source projects would require clarification from the authors.

Limitations & Caveats

The setup is complex, requiring substantial disk space and specific CUDA versions. The reliance on a Virtuoso backend for Freebase KG adds an operational overhead. The project is presented as official resources for a research paper, implying it may be primarily for academic use and experimentation.
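Answering a question end-to-end means executing the refined logical form against the Freebase KG, which in practice requires translating the S-expression into SPARQL for the Virtuoso endpoint. A minimal, hypothetical converter for the simplest (JOIN relation entity) pattern might look like the following; the real pipeline also handles nested AND, ARGMAX, and comparative forms.

```python
def s_expr_to_sparql(s_expr):
    """Convert a flat (JOIN relation entity) S-expression into a
    one-triple SPARQL query (hypothetical sketch; nested forms are
    not handled here)."""
    inner = s_expr.strip()[1:-1]          # drop the outer parentheses
    op, relation, entity = inner.split()
    if op != "JOIN":
        raise ValueError(f"unsupported operator: {op}")
    # (JOIN R E) denotes the set of ?x with a triple (?x, R, E).
    return f"SELECT DISTINCT ?x WHERE {{ ?x ns:{relation} ns:{entity} . }}"

print(s_expr_to_sparql("(JOIN people.person.place_of_birth m.02mjmr)"))
# SELECT DISTINCT ?x WHERE { ?x ns:people.person.place_of_birth ns:m.02mjmr . }
```

The resulting query string would then be sent to the locally hosted Virtuoso SPARQL endpoint, which is why the backend's disk and memory requirements are unavoidable for full reproduction.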

Health Check

  • Last commit: 4 days ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 15 stars in the last 90 days
