LongCite by THUDM

LLM for long-context QA with fine-grained citations

Created 1 year ago
504 stars

Top 61.8% on SourcePulse

View on GitHub
Project Summary

LongCite addresses the challenge of enabling Large Language Models (LLMs) to generate accurate responses with fine-grained, sentence-level citations for long-context Question Answering (QA). It targets researchers and developers working with LLMs who need to improve the verifiability and trustworthiness of AI-generated content, particularly in scenarios involving extensive source material.

How It Works

LongCite uses a "Coarse to Fine" (CoF) pipeline to construct supervised fine-tuning (SFT) data. The pipeline automates the generation of high-quality QA instances with precise citations: it first generates a coarse-grained answer, then attaches chunk-level citations, and finally narrows those down to sentence-level citations. The resulting datasets train LLMs to pinpoint the specific sentences in a long document that support each generated answer, improving factual accuracy and traceability.
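As a rough illustration only, the three stages might be orchestrated as in the sketch below; every name here (ask_llm, build_cof_instance, the prompts, the chunking scheme) is a hypothetical stand-in, not LongCite's actual code or prompt design.

    # Hypothetical sketch of the Coarse-to-Fine (CoF) data pipeline; none of
    # these names or prompts come from the LongCite codebase.

    def ask_llm(prompt: str) -> str:
        """Placeholder for whatever long-context LLM drives each stage."""
        raise NotImplementedError("plug in an LLM call here")

    def build_cof_instance(document: str, question: str) -> dict:
        # Stage 1: generate a coarse answer from the full document.
        answer = ask_llm(f"Context:\n{document}\n\nQuestion: {question}\nAnswer:")

        # Stage 2: attach coarse, chunk-level citations by asking which
        # document chunks support each statement in the answer.
        chunks = [document[i:i + 2048] for i in range(0, len(document), 2048)]
        chunk_cited = ask_llm(
            "For each statement in the answer, cite the indices of the "
            f"supporting chunks.\nAnswer: {answer}\nChunks: {chunks}"
        )

        # Stage 3: refine each chunk citation down to the exact supporting
        # sentences, yielding the sentence-level citations used for SFT.
        sentence_cited = ask_llm(
            f"Narrow each chunk citation to its supporting sentences.\n{chunk_cited}"
        )
        return {"question": question, "answer_with_citations": sentence_cited}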

Quick Start & Requirements

  • Install: pip install "transformers>=4.43.0" (quoted so the shell does not treat > as a redirect); a loading sketch follows this list.
  • Prerequisites: Python 3.x, PyTorch. GPU with CUDA recommended for deployment.
  • Deployment:
    • Run demo: CUDA_VISIBLE_DEVICES=0 streamlit run demo.py --server.fileWatcherType none
    • vLLM inference: See vllm_inference.py for faster generation.
  • Resources: Models support up to 128K context. Specific hardware requirements depend on model size (9B or 8B).
  • Links: Hugging Face Repo, Paper, HF Space
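A minimal loading sketch using the plain transformers API, as referenced in the install step above. The model ID matches the released checkpoints, while trust_remote_code and the dtype choice are assumptions; check the Hugging Face model card for the model's own inference helpers.

    # Minimal sketch: load LongCite-glm4-9b with the standard transformers API.
    # trust_remote_code is assumed (GLM checkpoints typically ship custom code);
    # see the Hugging Face model card for the exact inference interface.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "THUDM/LongCite-glm4-9b"
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # bf16 keeps the 9B model on one large GPU
        device_map="auto",
        trust_remote_code=True,
    )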

Highlighted Details

  • Open-sources two models: LongCite-glm4-9b and LongCite-llama3.1-8b, supporting up to 128K context.
  • Provides a CoF pipeline for automated SFT data construction with fine-grained citations.
  • Introduces LongBench-Cite, an automatic benchmark for evaluating citation quality and response correctness in long-context QA.
  • Offers a dataset (LongCite-45k) for further model training; a loading sketch follows this list.
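If the dataset follows the usual Hugging Face Hub layout, loading it could look like the minimal sketch below; the THUDM/LongCite-45k dataset ID and the "train" split are inferred from the project name and should be verified against the linked Hugging Face repo.

    # Hypothetical sketch: pull the LongCite-45k SFT data from the HF Hub.
    # Dataset ID and split name are assumptions; verify them on the Hub.
    from datasets import load_dataset

    ds = load_dataset("THUDM/LongCite-45k", split="train")
    print(len(ds), ds[0].keys())  # inspect size and fields of the SFT data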

Maintenance & Community

The project is from THUDM, the Tsinghua University research group behind the GLM and ChatGLM model families. The README does not provide further community engagement details.

Licensing & Compatibility

The README does not explicitly state the license for the code or models. The dataset is available via Hugging Face datasets. Compatibility for commercial use is not specified.

Limitations & Caveats

The LongBench-Cite evaluation uses GPT-4o as a judge, so running it requires an OpenAI API key; the CoF pipeline likewise requires API key configuration for data generation (see the sketch below).
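For orientation, an LLM-as-judge call in this style might look like the sketch below; the prompt and scoring scheme are illustrative placeholders rather than LongBench-Cite's actual implementation, though the OpenAI client calls are the real SDK API.

    # Hypothetical GPT-4o judge call (not LongBench-Cite's actual prompt or
    # scoring). Requires OPENAI_API_KEY to be set in the environment.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def judge_correctness(question: str, reference: str, answer: str) -> str:
        resp = client.chat.completions.create(
            model="gpt-4o",
            messages=[{
                "role": "user",
                "content": (
                    f"Question: {question}\nReference: {reference}\n"
                    f"Answer: {answer}\nRate correctness from 1 to 10."
                ),
            }],
        )
        return resp.choices[0].message.content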

Health Check

  • Last commit: 8 months ago
  • Responsiveness: inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 1 star in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems") and Travis Fischer (founder of Agentic).

long-form-factuality by google-deepmind

  • Benchmark for long-form factuality in LLMs
  • 640 stars · Top 0.2% on SourcePulse
  • Created 1 year ago · Updated 1 month ago