RankGPT by sunnweiwei

Re-ranking agent using LLMs for information retrieval research

created 2 years ago
624 stars

Top 53.8% on sourcepulse

Project Summary

This project provides code and methodologies for using Large Language Models (LLMs) as re-ranking agents in Information Retrieval (IR). It targets researchers and practitioners in NLP and IR seeking to leverage LLMs for improved search relevance, offering a novel approach to adapt LLMs for this task and distill their capabilities into smaller, specialized models.

How It Works

The core approach uses an LLM to generate a permutation of the search results for a given query: specific instructions prompt the LLM to rank a set of candidate documents by relevance. Because a single prompt can hold only a limited number of passages, a "sliding window" strategy re-ranks longer candidate lists by processing them in overlapping chunks. The project also details a method for "Instruction Distillation," where LLM-generated rankings are used to train smaller, specialized neural models (e.g., DeBERTa) for efficient zero-shot ranking.
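The permutation-plus-sliding-window idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the rank_gpt API: `parse_permutation`, `sliding_window_rerank`, and `rank_window` are hypothetical names; the answer format "[2] > [3] > [1]" is an assumed LLM output convention; the defaults `window_size=20`, `step=10` are illustrative; and windows are processed from the bottom of the list upward so strong passages can climb toward the top. A toy function stands in for the LLM call.

```python
import re
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def parse_permutation(response: str, n: int) -> List[int]:
    """Convert an LLM answer such as "[2] > [3] > [1]" into 0-based indices.

    Assumed output format. Duplicate or out-of-range labels are ignored, and
    any passage the model left out is appended in its original order, so the
    result is always a complete permutation of range(n).
    """
    order: List[int] = []
    for label in re.findall(r"\d+", response):
        idx = int(label) - 1
        if 0 <= idx < n and idx not in order:
            order.append(idx)
    order.extend(i for i in range(n) if i not in order)
    return order


def sliding_window_rerank(
    docs: Sequence[T],
    rank_window: Callable[[List[T]], List[int]],
    window_size: int = 20,
    step: int = 10,
) -> List[T]:
    """Re-rank docs by sorting overlapping windows, starting from the bottom.

    rank_window (hypothetical) stands in for one LLM call: it receives a window
    of passages and returns a permutation of their positions, best first.
    Consecutive windows overlap by (window_size - step) items, so a relevant
    passage can be carried upward across several windows.
    """
    if step <= 0 or step >= window_size:
        raise ValueError("need 0 < step < window_size so windows overlap")
    ranked = list(docs)
    end = len(ranked)
    start = max(0, end - window_size)
    while True:
        window = ranked[start:end]
        order = rank_window(window)
        ranked[start:end] = [window[i] for i in order]
        if start == 0:
            break
        # Slide the window up the list by `step` positions.
        end -= step
        start = max(0, start - step)
    return ranked


if __name__ == "__main__":
    print(parse_permutation("[2] > [3] > [1]", 3))  # → [1, 2, 0]

    # Toy stand-in for the LLM: sort the window's numbers in descending order.
    def toy_ranker(window: List[int]) -> List[int]:
        return sorted(range(len(window)), key=lambda i: -window[i])

    scores = [3, 1, 4, 1, 5, 9, 2, 6]
    print(sliding_window_rerank(scores, toy_ranker, window_size=4, step=2))
    # → [9, 6, 3, 1, 4, 1, 5, 2]
```

With only eight items and a window of four, the global maximum still reaches the top after three overlapping calls, which is the property that lets a bounded prompt size re-rank a longer list; items near the bottom are only partially sorted.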

Quick Start & Requirements

  • Install: pip install rank_gpt
  • Prerequisites: Python 3.8+, OpenAI API Key (for GPT models), pyserini for retrieval benchmarks. CUDA is recommended for training specialized models.
  • Setup: Basic usage requires an API key and minimal Python setup. Benchmark evaluation involves downloading datasets and pre-built indices.

Highlighted Details

  • Won EMNLP 2023 Outstanding Paper Award.
  • Supports multiple LLMs via LiteLLM (Azure, Claude, Cohere, Llama2).
  • Introduces NovelEval, a test set designed to avoid LLM contamination.
  • Demonstrates state-of-the-art ranking performance with open-source LLMs via Instruction Distillation.
  • Provides code for training specialized ranking models and evaluating them on benchmarks like TREC, BEIR, and Mr. TyDi.
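Training a smaller ranker on LLM-generated rankings needs a loss that rewards the student for reproducing the teacher's order. Pairwise losses such as RankNet are a standard choice for this; the pure-Python function below is an illustrative sketch of that idea, not the repository's training code, and `ranknet_distill_loss` and `softplus` are hypothetical helper names.

```python
import math
from typing import Sequence


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def ranknet_distill_loss(student_scores: Sequence[float], teacher_order: Sequence[int]) -> float:
    """Mean pairwise logistic (RankNet-style) loss against a teacher permutation.

    teacher_order lists document indices best-first, e.g. a ranking produced
    by the LLM. For every pair where the teacher puts a above b, the student
    pays softplus(s_b - s_a) = log(1 + exp(-(s_a - s_b))), which shrinks toward
    0 as the student scores a well above b.
    """
    total, pairs = 0.0, 0
    for i, a in enumerate(teacher_order):
        for b in teacher_order[i + 1:]:
            total += softplus(student_scores[b] - student_scores[a])
            pairs += 1
    return total / pairs if pairs else 0.0


if __name__ == "__main__":
    teacher = [2, 0, 1]          # hypothetical LLM ranking: doc 2 > doc 0 > doc 1
    agree = [1.0, 0.0, 2.0]      # student scores consistent with the teacher
    disagree = [0.0, 2.0, 1.0]   # student scores that contradict it
    print(ranknet_distill_loss(agree, teacher) < ranknet_distill_loss(disagree, teacher))  # → True
```

In a real setup the student scores would come from a neural model (e.g., a cross-encoder) and the loss would be minimized with a deep-learning framework; the stable `softplus` avoids overflow when score gaps are large.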

Maintenance & Community

The project is led by Weiwei Sun, with co-authors on the associated EMNLP 2023 paper from institutions including Shandong University and Baidu. The README's most recent updates date from late 2023, and the last commit was about a year ago, so active development appears to have slowed. Links to community channels are not explicitly provided in the README.

Licensing & Compatibility

The repository does not explicitly state a license. The code is provided for research purposes related to the published papers. Commercial use would require clarification on licensing terms.

Limitations & Caveats

The core re-ranking functionality relies on external LLM APIs, which adds per-query cost and latency. Instruction Distillation mitigates this at inference time, but producing the training rankings still requires calls to the LLM. The project is research-oriented and makes no stated guarantees about production readiness or long-term maintenance.
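To make the cost concern concrete, the sketch below counts LLM calls per query when a sliding window re-ranks a candidate list. It assumes one call per window and a window that advances by a fixed step until it reaches the top of the list; the defaults (window of 20, step of 10) are illustrative assumptions, and `llm_calls_per_query` is a hypothetical helper, not part of rank_gpt.

```python
import math


def llm_calls_per_query(num_candidates: int, window_size: int = 20, step: int = 10) -> int:
    """LLM calls needed to slide a window of window_size over num_candidates.

    Assumes one call per window and windows that move up by `step` positions
    until the top of the list is covered.
    """
    if num_candidates <= 0:
        return 0
    if num_candidates <= window_size:
        return 1
    return math.ceil((num_candidates - window_size) / step) + 1


if __name__ == "__main__":
    print(llm_calls_per_query(100))  # → 9
```

Multiplying by the number of queries gives a rough call budget for a benchmark run; actual token cost also scales with passage length and prompt overhead.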

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 30 stars in the last 90 days

Explore Similar Projects

Starred by Jason Liu (Author of Instructor) and Ross Taylor (Cofounder of General Reasoning; Creator of Papers with Code).

Search-R1 by PeterGriffinJin

Top 1.3% · 3k stars
RL framework for training LLMs to use search engines
created 5 months ago · updated 3 weeks ago