freshqa by freshllms

Dataset and code for refreshing LLMs with search

Created 1 year ago
372 stars

Top 76.1% on SourcePulse

Project Summary

This repository provides the dataset and code for FreshLLMs, a method for refreshing Large Language Models (LLMs) with search engine augmentation. It is aimed at LLM researchers and developers who want to improve model factuality and keep answers current, and it offers a structured approach to data collection and evaluation.

How It Works

The project centers around the FreshQA dataset, a continuously updated collection of questions and answers designed to evaluate LLM factuality. It also introduces FreshEval, an automatic evaluation metric that leverages few-shot in-context learning with LLMs to assess response quality, aiming to mimic human judgment for factuality.
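
As a rough illustration of the few-shot in-context evaluation idea described above, the sketch below assembles an evaluation prompt from a handful of labeled demonstrations followed by the item to judge. The prompt wording, the `Demo` fields, and the `build_eval_prompt` helper are all hypothetical; the actual FreshEval prompts are defined in the project's notebooks and may look quite different.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Demo:
    """One labeled demonstration (all fields are illustrative assumptions)."""
    question: str
    correct_answer: str
    response: str
    verdict: str  # e.g. "correct" or "incorrect"


def build_eval_prompt(demos: List[Demo], question: str,
                      correct_answer: str, response: str) -> str:
    """Concatenate labeled demonstrations, then the unlabeled item to judge."""
    parts = ["Judge whether each response is factually correct."]
    for d in demos:
        parts.append(
            f"question: {d.question}\n"
            f"correct answer: {d.correct_answer}\n"
            f"response: {d.response}\n"
            f"verdict: {d.verdict}"
        )
    # The final block leaves the verdict blank for the evaluator LLM to complete.
    parts.append(
        f"question: {question}\n"
        f"correct answer: {correct_answer}\n"
        f"response: {response}\n"
        f"verdict:"
    )
    return "\n\n".join(parts)


# Hypothetical usage with a single demonstration.
demos = [Demo("What is the capital of France?", "Paris", "It is Paris.", "correct")]
prompt = build_eval_prompt(demos, "What is 2 + 2?", "4", "5")
print(prompt)
```

The resulting string would then be sent to whichever LLM API serves as the evaluator; that call is omitted here because it depends on the provider and credentials in use.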

Quick Start & Requirements

  • FreshQA Dataset: Access via Google Sheets or download as CSV. Weekly updates are provided.
  • FreshEval: Runs in Google Colab notebooks and requires a Google Drive account for data storage plus API access to an LLM (e.g., GPT-4).
  • Dependencies: Python, Google Colab, LLM APIs.
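
One way to work with a downloaded CSV copy of the dataset is sketched below using only the Python standard library. The column names (`id`, `split`, `question`) and the sample rows are placeholders for illustration; check the header row of the actual export, which may use different columns or include extra leading rows.

```python
import csv
import io
from typing import Dict, List


def load_rows(csv_text: str) -> List[Dict[str, str]]:
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def filter_rows(rows: List[Dict[str, str]], column: str, value: str) -> List[Dict[str, str]]:
    """Keep only rows whose `column` equals `value` (missing columns never match)."""
    return [r for r in rows if r.get(column) == value]


# Placeholder data standing in for a downloaded export (not real dataset rows).
SAMPLE_CSV = """id,split,question
0,TEST,placeholder question A
1,DEV,placeholder question B
2,TEST,placeholder question C
"""

rows = load_rows(SAMPLE_CSV)
test_rows = filter_rows(rows, "split", "TEST")
print(len(rows), len(test_rows))  # 3 2
```

For a file on disk, the same `csv.DictReader` can be given an open file handle instead of an in-memory string.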

Highlighted Details

  • The FreshQA dataset has inspired or been used in major LLMs like Google Gemini and Perplexity.AI's Online LLMs.
  • FreshEval metric demonstrates high agreement with human raters for evaluating LLM factuality.
  • The project offers both "Relaxed" and "Strict" evaluation modes for FreshEval.
  • Weekly dataset updates are provided, with mechanisms for community contribution.
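
Agreement between an automatic rater and human raters can be quantified in several ways. The sketch below computes simple percent agreement over paired labels. This summary does not say which agreement statistic the authors report, so treat the choice of metric as an assumption; the label values are made up for illustration.

```python
from typing import Sequence


def percent_agreement(auto_labels: Sequence[bool],
                      human_labels: Sequence[bool]) -> float:
    """Fraction of items on which the automatic and human labels match."""
    if len(auto_labels) != len(human_labels):
        raise ValueError("label sequences must have the same length")
    if not auto_labels:
        raise ValueError("need at least one labeled item")
    matches = sum(a == h for a, h in zip(auto_labels, human_labels))
    return matches / len(auto_labels)


# Made-up labels: True means the response was judged correct.
auto = [True, False, True, True]
human = [True, False, False, True]
print(percent_agreement(auto, human))  # 0.75
```

The same function could be run separately on labels collected under the "Relaxed" and "Strict" modes to compare agreement in each.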

Maintenance & Community

The project credits several contributors, both for creating the original dataset and for its ongoing updates. SerpApi sponsors the project by providing search credits for FreshPrompt users.

Licensing & Compatibility

The repository does not explicitly state a license, and the implications for commercial use are not detailed. The only citation provided is for an arXiv paper.

Limitations & Caveats

FreshEval's accuracy depends on which LLM is used as the evaluator and on having API access to it. The README recommends gpt-4-1106-preview over gpt-4-0125-preview for FreshEval because it showed slightly better agreement with human annotations in the authors' evaluation.

Health Check

  • Last commit: 3 days ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 2
  • Star history: 5 stars in the last 30 days
