Re-ranking agent using LLMs for information retrieval research
This project provides code and methodologies for using Large Language Models (LLMs) as re-ranking agents in Information Retrieval (IR). It targets researchers and practitioners in NLP and IR seeking to leverage LLMs for improved search relevance, offering a novel approach to adapt LLMs for this task and distill their capabilities into smaller, specialized models.
How It Works
The core approach involves using LLMs to generate permutations of search results based on a given query. This is achieved by crafting specific instructions for the LLM, prompting it to rank a set of candidate documents. A "sliding window" strategy is employed to overcome LLM token limits, allowing for the re-ranking of more documents by processing them in overlapping chunks. The project also details a method for "Instruction Distillation," where LLM-generated rankings are used to train smaller, specialized neural models (e.g., DeBERTa) for efficient zero-shot ranking.
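The sliding-window strategy can be sketched as follows. This is a minimal illustration, not the project's actual implementation: `rank_fn` is a hypothetical hook standing in for the LLM call that returns a window of documents in ranked order, and the window/step sizes are illustrative defaults.

```python
def sliding_window_rerank(query, docs, rank_fn, window_size=4, step=2):
    """Re-rank `docs` for `query` by ranking overlapping windows from the
    tail of the list toward the head, so relevant documents can "bubble up"
    to the top without the whole candidate set ever fitting in one prompt.

    `rank_fn(query, window)` is a placeholder for an LLM call that returns
    the window's documents as a ranked permutation.
    """
    ranked = list(docs)
    start = max(len(ranked) - window_size, 0)
    while True:
        window = ranked[start:start + window_size]
        ranked[start:start + window_size] = rank_fn(query, window)
        if start == 0:
            break
        start = max(start - step, 0)  # slide toward the head, overlapping
    return ranked
```

Because each window overlaps the previous one, a strong document low in the initial list can move forward at every step; only the head of the final list is fully ordered, which is exactly what top-k re-ranking needs.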
Quick Start & Requirements
pip install rank_gpt

Install pyserini for retrieval benchmarks. CUDA is recommended for training the specialized models.
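A minimal sketch of the permutation-prompting approach described above, assuming a hypothetical prompt template and reply format; the function names and exact wording are illustrative and do not reflect the package's actual API.

```python
import re

def build_ranking_prompt(query, passages):
    """Assemble an instruction asking the LLM to output a permutation of
    the numbered passages (wording is illustrative, not the official
    template)."""
    lines = [
        f"I will provide {len(passages)} passages, each labeled with an "
        "identifier like [1].",
        f"Rank them by relevance to the query: {query}",
    ]
    for i, passage in enumerate(passages, 1):
        lines.append(f"[{i}] {passage}")
    lines.append(
        "Answer with the identifiers in descending order of relevance, "
        "e.g. [2] > [1], and nothing else."
    )
    return "\n".join(lines)

def parse_permutation(response, num_passages):
    """Parse a reply like '[2] > [1] > [3]' into 0-based indices,
    appending any identifiers the model omitted so the result is always
    a complete permutation."""
    order = []
    for token in re.findall(r"\[(\d+)\]", response):
        idx = int(token) - 1
        if 0 <= idx < num_passages and idx not in order:
            order.append(idx)
    order.extend(i for i in range(num_passages) if i not in order)
    return order
```

The permissive parser matters in practice: LLM replies can drop or repeat identifiers, and deduplicating then back-filling keeps the output a valid permutation either way.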
Maintenance & Community
The project is associated with Weiwei Sun and other researchers from Renmin University of China. The primary development appears active, with recent updates in late 2023. Links to community channels are not explicitly provided in the README.
Licensing & Compatibility
The repository does not explicitly state a license. The code is provided for research purposes related to the published papers. Commercial use would require clarification on licensing terms.
Limitations & Caveats
The core re-ranking functionality relies on external LLM APIs, incurring costs and potential latency. While Instruction Distillation aims to mitigate this, the initial LLM interaction is a dependency. The project is research-oriented, and production-readiness or long-term maintenance guarantees are not specified.