LettuceDetect by KRLabsOrg

Hallucination detection framework for RAG applications

Created 7 months ago
494 stars

Top 62.6% on SourcePulse

View on GitHub
Project Summary

LettuceDetect is a hallucination detection framework for Retrieval-Augmented Generation (RAG) systems, designed to identify unsupported parts of an answer by comparing it against provided context. It targets developers and researchers working with RAG, offering a lightweight, efficient, and precise solution to improve the factual accuracy of AI-generated responses.

How It Works

LettuceDetect uses a token-level classification approach, inspired by encoder-based models such as Luna and built on ModernBERT for extended context processing. Classifying each answer token against the retrieved context allows precise identification of hallucinated spans. This design sidesteps the short context windows of traditional encoder models and is more computationally efficient than LLM-based detection methods.
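
To make the approach concrete, the sketch below loads one of the released checkpoints as a plain token-classification model via Hugging Face Transformers and flags tokens predicted as unsupported. This is only an illustration: whether the checkpoint loads this way, how context, question, and answer are concatenated, and which label index marks a hallucination are assumptions, not details confirmed by this summary.

  # Illustrative sketch only: load a released checkpoint as a token-
  # classification model and flag tokens predicted as unsupported. The
  # input concatenation scheme and the meaning of label 1 are assumptions.
  import torch
  from transformers import AutoTokenizer, AutoModelForTokenClassification

  model_id = "KRLabsOrg/lettucedect-base-modernbert-en-v1"
  tokenizer = AutoTokenizer.from_pretrained(model_id)
  model = AutoModelForTokenClassification.from_pretrained(model_id)

  context = "France is a country in Europe. The capital of France is Paris."
  question = "What is the capital of France?"
  answer = "The capital of France is Lyon."

  # Concatenate passage, question, and answer into one sequence (separator
  # scheme assumed, not taken from the project docs).
  text = f"{context} {question} {answer}"
  inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=4096)

  with torch.no_grad():
      logits = model(**inputs).logits  # shape: (1, seq_len, num_labels)

  predicted = logits.argmax(dim=-1)[0].tolist()
  tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
  flagged = [tok for tok, label in zip(tokens, predicted) if label == 1]
  print("Tokens flagged as unsupported:", flagged)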

Quick Start & Requirements

  • Install via pip: pip install lettucedetect or pip install -e . for development.
  • Requires Python; models are loaded through Hugging Face Transformers.
  • Models are available on Huggingface: KRLabsOrg/lettucedect-base-modernbert-en-v1 and KRLabsOrg/lettucedect-large-modernbert-en-v1.
  • Official quick-start example and demo available in the README.
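
The README carries the official quick start; the sketch below follows that pattern through the package's Python API. The class and argument names (HallucinationDetector, method, model_path, predict, output_format) are assumptions to verify against the README.

  # Sketch of the high-level Python API, following the README's quick-start
  # pattern. Class and argument names are assumptions; check the README.
  from lettucedetect.models.inference import HallucinationDetector

  detector = HallucinationDetector(
      method="transformer",
      model_path="KRLabsOrg/lettucedect-base-modernbert-en-v1",
  )

  contexts = ["France is a country in Europe. The capital of France is Paris."]
  question = "What is the capital of France?"
  answer = "The capital of France is Lyon."

  # Expected to return character-level spans of the answer that the context
  # does not support.
  spans = detector.predict(
      context=contexts,
      question=question,
      answer=answer,
      output_format="spans",
  )
  print(spans)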

Highlighted Details

  • Achieves a 79.22% F1 score on the RAGTruth dataset with its large model, outperforming GPT-4 and Luna and remaining competitive with a fine-tuned LLAMA-3-8B.
  • Provides token-level precision for identifying exact hallucinated spans (a generic span-merging sketch follows this list).
  • Optimized for inference with smaller model sizes and faster processing.
  • Supports a 4K-token context window via ModernBERT.
  • Integrates with Hugging Face Transformers for easy model loading.
  • Includes a Python API and an optional Web API.
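
To illustrate how token-level predictions become exact character spans (as referenced in the token-level precision bullet above), the generic helper below merges consecutive flagged tokens into character ranges using a fast tokenizer's offset mapping. It is a sketch of the general technique, not LettuceDetect's internal implementation.

  # Generic helper (not LettuceDetect internals): merge consecutive tokens
  # labelled 1 ("hallucinated") into character-level spans, using the offset
  # mapping a fast Hugging Face tokenizer can return.
  def merge_token_labels_to_spans(offsets, labels):
      """offsets: list of (start, end) character positions per token;
      labels: 0/1 per token. Returns merged (start, end) spans."""
      spans, current = [], None
      for (start, end), label in zip(offsets, labels):
          if label == 1:
              if current is None:
                  current = [start, end]       # open a new span
              else:
                  current[1] = end             # extend the open span
          elif current is not None:
              spans.append(tuple(current))     # close the span
              current = None
      if current is not None:
          spans.append(tuple(current))
      return spans

  # Tokens 3 and 4 are flagged, so they merge into one span over chars 10-22.
  print(merge_token_labels_to_spans(
      [(0, 4), (5, 9), (10, 16), (17, 22), (23, 30)],
      [0, 0, 1, 1, 0],
  ))  # -> [(10, 22)]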

Maintenance & Community

  • Developed by KRLabsOrg.
  • MIT-licensed code and models.
  • Citation details provided for academic use.

Licensing & Compatibility

  • Licensed under the MIT License, permitting commercial use and integration with closed-source applications.

Limitations & Caveats

  • While competitive, the large model is noted as "coming up just short" of the SOTA fine-tuned LLAMA-3-8B from the RAG-HAT paper.
  • Training requires downloading the RAGTruth dataset separately.

Health Check

  • Last commit: 1 week ago
  • Responsiveness: 1 day
  • Pull requests (30d): 3
  • Issues (30d): 1

Star History

  • 17 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), Travis Fischer (founder of Agentic), and 1 more.

HaluEval by RUCAIBox

0.8%
510
Benchmark dataset for LLM hallucination evaluation
Created 2 years ago
Updated 1 year ago
Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), Pawel Garbacki (cofounder of Fireworks AI), and 4 more.

LongLoRA by dvlab-research

0.1%
3k
LongLoRA: Efficient fine-tuning for long-context LLMs
Created 2 years ago
Updated 1 year ago