Research paper code for context generation using LLMs
This repository provides the official implementation for the "Generate rather than Retrieve" (GenRead) paper, which explores using large language models (LLMs) as strong context generators for question answering. It targets researchers and practitioners in NLP and LLM applications, offering a novel approach that generates relevant context with an LLM rather than retrieving it with a traditional retrieval system.
How It Works
GenRead frames question answering as a context generation task. Instead of retrieving existing documents, it leverages LLMs to generate relevant background documents that contain the answer. The process involves two main steps: generating candidate documents using an LLM (either zero-shot or supervised with sampling/clustering) and then inferring the answer from these generated documents. This approach aims to overcome limitations of traditional retrieval systems by creating context tailored to the query.
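The two-step process above can be sketched as follows. This is an illustrative outline, not the repository's actual code: the function names and prompt templates are assumptions, and `llm` stands for any callable that maps a prompt string to generated text (e.g. a wrapper around an OpenAI completion call).

```python
# Hedged sketch of GenRead's two-step generate-then-read pipeline.
# Function names and prompt templates are illustrative, not the repo's API;
# `llm` is any callable mapping a prompt string to generated text.

def generate_documents(question, llm, num_docs=3):
    """Step 1: generate candidate background documents with an LLM."""
    prompt = f"Generate a background document to answer this question: {question}"
    return [llm(prompt) for _ in range(num_docs)]

def answer_from_documents(question, documents, llm):
    """Step 2: read the generated documents and infer the answer."""
    context = "\n\n".join(documents)
    prompt = f"{context}\n\nQuestion: {question}\nAnswer:"
    return llm(prompt).strip()

def genread(question, llm, num_docs=3):
    """Full pipeline: generate documents, then answer from them."""
    docs = generate_documents(question, llm, num_docs)
    return answer_from_documents(question, docs, llm)
```

In the paper's setup, `llm` would wrap a call to an OpenAI model; greedy decoding (the reproducible setting the repository reports) corresponds to sampling with temperature 0.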
Quick Start & Requirements
Install the OpenAI client with `pip install openai`, then run `inference.py`. Requires Python 3.x. Datasets (NQ, TriviaQA, WebQ, FM2, FEVER, Wizard) need to be downloaded and placed in the `indataset` folder.

Highlighted Details

Answer inference uses `text-davinci-002` with greedy search for reproducibility.

Maintenance & Community
The project is associated with ICLR 2023. Contact information for checkpoint requests is provided (wyu1@nd.edu).
Licensing & Compatibility
The repository does not explicitly state a license. The use of OpenAI's API implies adherence to their terms of service. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The project relies heavily on the OpenAI API, which incurs costs and is subject to API availability. Supervised generation methods may produce non-deterministic outputs. The Fusion-in-Decoder models are provided as separate checkpoints requiring separate download and integration.