LLM for long-context QA with fine-grained citations
LongCite enables Large Language Models (LLMs) to generate accurate responses with fine-grained, sentence-level citations in long-context Question Answering (QA). It targets researchers and developers who need to improve the verifiability and trustworthiness of LLM-generated content, particularly in scenarios involving extensive source material.
How It Works
LongCite utilizes a "Coarse to Fine" (CoF) pipeline for supervised fine-tuning (SFT) data construction. This pipeline automates the generation of high-quality QA instances with precise citations by first generating coarse-grained answers, then refining them with chunk-level citations, and finally achieving sentence-level citations. This approach allows for the creation of specialized datasets that train LLMs to pinpoint specific sentences within long documents that support their generated answers, enhancing factual accuracy and traceability.
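The three stages can be sketched roughly as below. This is an illustrative outline only: the function names, chunking granularity, and prompt wording are assumptions, not the repository's actual data-construction code, which calls an LLM API at each stage.

import re
from typing import Protocol

class LLM(Protocol):
    """Any chat-style completion client; the real pipeline calls an API model."""
    def complete(self, prompt: str) -> str: ...

def split_sentences(text: str) -> list[str]:
    # Naive splitter for illustration; the actual pipeline segments more carefully.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def cof_build_instance(document: str, llm: LLM) -> dict:
    # Stage 1 (coarse): generate a question and a free-form answer from the document.
    qa = llm.complete(f"Write a question about this document and answer it:\n\n{document}")

    # Stage 2 (chunk-level): number coarse chunks and ask which ones support each statement.
    sentences = split_sentences(document)
    chunks = [" ".join(sentences[i:i + 5]) for i in range(0, len(sentences), 5)]
    numbered = "\n".join(f"[{i}] {c}" for i, c in enumerate(chunks))
    chunk_cited = llm.complete(
        f"Add chunk citations [i] to each statement.\n\nChunks:\n{numbered}\n\nQA:\n{qa}"
    )

    # Stage 3 (sentence-level): within each cited chunk, keep only the exact
    # supporting sentences, yielding fine-grained citations.
    sentence_cited = llm.complete(
        "For each citation, narrow it to the exact supporting sentences:\n\n" + chunk_cited
    )
    return {"qa": qa, "answer_with_citations": sentence_cited}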
Quick Start & Requirements
Install the required transformers version, then launch the local demo (the version spec is quoted so the shell does not treat `>` as a redirect):

pip install "transformers>=4.43.0"
CUDA_VISIBLE_DEVICES=0 streamlit run demo.py --server.fileWatcherType none

Use vllm_inference.py for faster generation.
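For programmatic use, loading the model follows the standard transformers pattern. In the sketch below, the query_longcite helper and the result keys are assumptions about the checkpoint's bundled remote code; check the repository for the exact interface.

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load one of the released checkpoints; trust_remote_code pulls in the
# model's custom generation utilities.
tokenizer = AutoTokenizer.from_pretrained("THUDM/LongCite-glm4-9b", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "THUDM/LongCite-glm4-9b",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)

with open("long_document.txt") as f:  # your long source material
    context = f.read()
query = "What are the key findings?"

# query_longcite and the result fields are assumed here, not confirmed
# by this summary; consult the repo's demo code for the real call.
result = model.query_longcite(
    context, query, tokenizer=tokenizer, max_input_length=128000, max_new_tokens=1024
)
print(result["answer"])  # answer text interleaved with sentence-level citations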
Highlighted Details

- Two released models, LongCite-glm4-9b and LongCite-llama3.1-8b, supporting up to 128K context.
- LongBench-Cite, an automatic benchmark for evaluating citation quality and response correctness in long-context QA.
- LongCite-45k, an SFT dataset (constructed via the CoF pipeline) for further model training.

Maintenance & Community
The project is from THUDM, a research group known for LLM advancements. Further community engagement details are not explicitly provided in the README.
Licensing & Compatibility
The README does not explicitly state the license for the code or models. The dataset is available via Hugging Face datasets. Compatibility for commercial use is not specified.
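Since the dataset is distributed through Hugging Face, it can presumably be pulled with the datasets library. The dataset ID below is an assumption; verify it on the project's Hugging Face page.

from datasets import load_dataset

# "THUDM/LongCite-45k" is an assumed dataset ID; check the project page.
ds = load_dataset("THUDM/LongCite-45k", split="train")
print(ds[0].keys())  # inspect the fields of one QA instance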
Limitations & Caveats
The evaluation benchmark relies on GPT-4o as a judge, and the CoF data-construction pipeline likewise calls an external LLM, so both require an OpenAI API key to be configured.
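The judging setup is roughly of this shape, as a minimal sketch using the standard OpenAI Python client; the environment variable name and the judging prompt are assumptions, not the benchmark's actual code.

import os
from openai import OpenAI

# The benchmark scripts expect an OpenAI key; the exact variable name
# used by the repo may differ (assumption here).
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

judgment = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Rate the citation quality of: ..."}],
)
print(judgment.choices[0].message.content)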