LongCite by THUDM

LLM for long-context QA with fine-grained citations

created 11 months ago
503 stars

Top 62.7% on sourcepulse

Project Summary

LongCite tackles the problem of getting Large Language Models (LLMs) to generate accurate responses with fine-grained, sentence-level citations for long-context question answering (QA). It targets researchers and developers who need to improve the verifiability and trustworthiness of LLM-generated content, particularly when answers draw on extensive source material.

How It Works

LongCite uses a "Coarse to Fine" (CoF) pipeline to construct supervised fine-tuning (SFT) data. The pipeline automates the generation of high-quality QA instances with precise citations: it first generates a coarse-grained answer, then attaches chunk-level citations, and finally refines those to sentence-level citations. The resulting datasets train LLMs to pinpoint the specific sentences in a long document that support each generated answer, improving factual accuracy and traceability.
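A schematic sketch of the three CoF stages is below. The function, prompts, and chunking scheme are hypothetical illustrations of the coarse-to-fine idea, not the repo's actual API; the real pipeline lives in the project's data-construction scripts and drives an LLM through similar steps.

```python
# Hypothetical sketch of the CoF stages; names and prompts are illustrative only.
from typing import Callable, List

def coarse_to_fine(llm: Callable[[str], str], context: str, question: str) -> dict:
    # Stage 1: coarse answer generated from the full context.
    answer = llm(f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")

    # Stage 2: chunk-level citations -- ask which chunks support the answer.
    chunks = [context[i:i + 2000] for i in range(0, len(context), 2000)]
    supporting = [
        c for c in chunks
        if "yes" in llm(f"Does this chunk support the answer?\n"
                        f"Chunk: {c}\nAnswer: {answer}\nReply yes/no:").lower()
    ]

    # Stage 3: sentence-level citations -- narrow each supporting chunk
    # down to the individual sentences that back the answer.
    citations: List[str] = []
    for chunk in supporting:
        for sent in chunk.split(". "):
            if "yes" in llm(f"Does this sentence support the answer?\n"
                            f"Sentence: {sent}\nAnswer: {answer}\nReply yes/no:").lower():
                citations.append(sent)

    return {"question": question, "answer": answer, "citations": citations}
```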

Quick Start & Requirements

  • Install: pip install "transformers>=4.43.0" (quote the requirement so the shell does not treat >= as a redirect)
  • Prerequisites: Python 3.x, PyTorch. GPU with CUDA recommended for deployment.
  • Deployment:
    • Run demo: CUDA_VISIBLE_DEVICES=0 streamlit run demo.py --server.fileWatcherType none
    • vLLM inference: See vllm_inference.py for faster generation (a minimal transformers usage sketch follows this list).
  • Resources: Models support up to 128K context. Specific hardware requirements depend on model size (9B or 8B).
  • Links: Hugging Face Repo, Paper, HF Space
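
A minimal inference sketch, following the usage pattern in the project's README; the query_longcite method and result fields come from the model's custom code loaded via trust_remote_code, so verify them against the current repo.

```python
import json
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "THUDM/LongCite-glm4-9b"  # or "THUDM/LongCite-llama3.1-8b"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, trust_remote_code=True, device_map="auto"
)

context = open("long_document.txt").read()  # source material, up to 128K tokens
query = "What are the paper's main findings?"

# query_longcite is the custom method shipped with the model weights
# (loaded via trust_remote_code); check the repo for the exact signature.
result = model.query_longcite(
    context, query, tokenizer=tokenizer,
    max_input_length=128000, max_new_tokens=1024,
)
print("Answer:\n", result["answer"])
print("Statements with citations:\n",
      json.dumps(result["statements_with_citations"], indent=2, ensure_ascii=False))
```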

Highlighted Details

  • Open-sources two models: LongCite-glm4-9b and LongCite-llama3.1-8b, supporting up to 128K context.
  • Provides a CoF pipeline for automated SFT data construction with fine-grained citations.
  • Introduces LongBench-Cite, an automatic benchmark for evaluating citation quality and response correctness in long-context QA.
  • Offers a dataset (LongCite-45k) for further model training (see the loading sketch below).
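
A loading sketch for the SFT data; the Hugging Face dataset id is assumed from the project name, so confirm it on the Hugging Face repo linked above.

```python
# Assumed dataset id -- confirm on the project's Hugging Face page.
from datasets import load_dataset

ds = load_dataset("THUDM/LongCite-45k", split="train")
print(len(ds), ds.column_names)  # inspect the SFT fields before training
```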

Maintenance & Community

The project is from THUDM, a research group known for LLM advancements. Further community engagement details are not explicitly provided in the README.

Licensing & Compatibility

The README does not explicitly state the license for the code or models. The dataset is available via Hugging Face datasets. Compatibility for commercial use is not specified.

Limitations & Caveats

The evaluation benchmark relies on GPT-4o as a judge, requiring an OpenAI API key. The CoF pipeline setup requires API key configuration for data generation.
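
A minimal setup sketch, assuming the scripts read the standard OPENAI_API_KEY environment variable; the repo may instead use its own config file, so check the evaluation and CoF scripts.

```python
import os

# Assumed convention: both the GPT-4o judge (LongBench-Cite) and the CoF
# data-construction pipeline need an OpenAI API key at runtime.
os.environ["OPENAI_API_KEY"] = "sk-..."  # placeholder, not a real key
```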

Health Check

  • Last commit: 7 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 16 stars in the last 90 days
