LongCite by THUDM

LLM for long-context QA with fine-grained citations

Created 1 year ago
504 stars

Top 61.8% on SourcePulse

View on GitHub
Project Summary

LongCite addresses the challenge of enabling Large Language Models (LLMs) to generate accurate responses with fine-grained, sentence-level citations for long-context Question Answering (QA). It targets researchers and developers working with LLMs who need to improve the verifiability and trustworthiness of AI-generated content, particularly in scenarios involving extensive source material.

How It Works

LongCite uses a "Coarse to Fine" (CoF) pipeline to construct supervised fine-tuning (SFT) data. The pipeline automates the generation of high-quality QA instances with precise citations: it first generates a coarse-grained answer, then attaches chunk-level citations, and finally narrows those down to sentence-level citations. The resulting datasets train LLMs to pinpoint the specific sentences in a long document that support each generated answer, improving factual accuracy and traceability.
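As a rough illustration only, the three stages might be orchestrated as in the sketch below; every name here (ask_llm, build_cof_instance, the prompts, the chunking scheme) is a hypothetical stand-in, not LongCite's actual code or prompt design.

    # Hypothetical sketch of the Coarse-to-Fine (CoF) data pipeline; none of
    # these names or prompts come from the LongCite codebase.

    def ask_llm(prompt: str) -> str:
        """Placeholder for whatever long-context LLM drives each stage."""
        raise NotImplementedError("plug in an LLM call here")

    def build_cof_instance(document: str, question: str) -> dict:
        # Stage 1: generate a coarse answer from the full document.
        answer = ask_llm(f"Context:\n{document}\n\nQuestion: {question}\nAnswer:")

        # Stage 2: attach coarse, chunk-level citations by asking which
        # document chunks support each statement in the answer.
        chunks = [document[i:i + 2048] for i in range(0, len(document), 2048)]
        chunk_cited = ask_llm(
            "For each statement in the answer, cite the indices of the "
            f"supporting chunks.\nAnswer: {answer}\nChunks: {chunks}"
        )

        # Stage 3: refine each chunk citation down to the exact supporting
        # sentences, yielding the sentence-level citations used for SFT.
        sentence_cited = ask_llm(
            f"Narrow each chunk citation to its supporting sentences.\n{chunk_cited}"
        )
        return {"question": question, "answer_with_citations": sentence_cited}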

Quick Start & Requirements

  • Install: pip install "transformers>=4.43.0" (quoted so the shell does not treat > as a redirect); a loading sketch follows this list.
  • Prerequisites: Python 3.x, PyTorch. GPU with CUDA recommended for deployment.
  • Deployment:
    • Run demo: CUDA_VISIBLE_DEVICES=0 streamlit run demo.py --server.fileWatcherType none
    • vLLM inference: See vllm_inference.py for faster generation.
  • Resources: Models support up to 128K context. Specific hardware requirements depend on model size (9B or 8B).
  • Links: Hugging Face Repo, Paper, HF Space
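A minimal loading sketch using the plain transformers API, as referenced in the install step above. The model ID matches the released checkpoints, while trust_remote_code and the dtype choice are assumptions; check the Hugging Face model card for the model's own inference helpers.

    # Minimal sketch: load LongCite-glm4-9b with the standard transformers API.
    # trust_remote_code is assumed (GLM checkpoints typically ship custom code);
    # see the Hugging Face model card for the exact inference interface.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "THUDM/LongCite-glm4-9b"
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # bf16 keeps the 9B model on one large GPU
        device_map="auto",
        trust_remote_code=True,
    )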

Highlighted Details

  • Open-sources two models: LongCite-glm4-9b and LongCite-llama3.1-8b, supporting up to 128K context.
  • Provides a CoF pipeline for automated SFT data construction with fine-grained citations.
  • Introduces LongBench-Cite, an automatic benchmark for evaluating citation quality and response correctness in long-context QA.
  • Offers a dataset (LongCite-45k) for further model training; a loading sketch follows this list.
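If the dataset follows the usual Hugging Face Hub layout, loading it could look like the minimal sketch below; the THUDM/LongCite-45k dataset ID and the "train" split are inferred from the project name and should be verified against the linked Hugging Face repo.

    # Hypothetical sketch: pull the LongCite-45k SFT data from the HF Hub.
    # Dataset ID and split name are assumptions; verify them on the Hub.
    from datasets import load_dataset

    ds = load_dataset("THUDM/LongCite-45k", split="train")
    print(len(ds), ds[0].keys())  # inspect size and fields of the SFT data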

Maintenance & Community

The project is from THUDM, the Tsinghua University research group behind the GLM and ChatGLM model families. The README does not provide further community engagement details.

Licensing & Compatibility

The README does not explicitly state the license for the code or models. The dataset is available via Hugging Face datasets. Compatibility for commercial use is not specified.

Limitations & Caveats

The LongBench-Cite evaluation uses GPT-4o as a judge, so running it requires an OpenAI API key; the CoF pipeline likewise requires API key configuration for data generation (see the sketch below).
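For orientation, an LLM-as-judge call in this style might look like the sketch below; the prompt and scoring scheme are illustrative placeholders rather than LongBench-Cite's actual implementation, though the OpenAI client calls are the real SDK API.

    # Hypothetical GPT-4o judge call (not LongBench-Cite's actual prompt or
    # scoring). Requires OPENAI_API_KEY to be set in the environment.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def judge_correctness(question: str, reference: str, answer: str) -> str:
        resp = client.chat.completions.create(
            model="gpt-4o",
            messages=[{
                "role": "user",
                "content": (
                    f"Question: {question}\nReference: {reference}\n"
                    f"Answer: {answer}\nRate correctness from 1 to 10."
                ),
            }],
        )
        return resp.choices[0].message.content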

Health Check

  • Last commit: 8 months ago
  • Responsiveness: inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 1 star in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems") and Travis Fischer (founder of Agentic).

long-form-factuality by google-deepmind

  • Benchmark for long-form factuality in LLMs
  • 640 stars · Top 0.2% on SourcePulse
  • Created 1 year ago · Updated 1 month ago