awesome-pretrained-models-for-information-retrieval by ict-bigdatalab

Collection of papers on pre-trained models for information retrieval

Created 5 years ago

677 stars

Top 50.0% on SourcePulse

Project Summary

This repository is a curated list of academic papers on pre-trained models for Information Retrieval (IR). It serves researchers and practitioners in the IR field by organizing key publications across sub-topics including sparse, dense, and hybrid retrieval, re-ranking, and the integration of Large Language Models (LLMs) with IR. Its primary benefit is a structured overview of the rapidly evolving landscape of pre-trained models in IR.
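To make the retrieval families named above concrete, here is a minimal sketch (not taken from any paper in the list) of how a sparse score, a dense score, and a hybrid score might be computed for one query-document pair. The term weights, the embedding vectors, the function names, and the interpolation weight alpha are illustrative assumptions. In real systems these weights and vectors are produced by pre-trained models.

```python
from typing import Dict, List


def sparse_score(query_terms: Dict[str, float], doc_terms: Dict[str, float]) -> float:
    """Sparse retrieval: sum of weight products over terms shared by query and document."""
    return sum(w * doc_terms[t] for t, w in query_terms.items() if t in doc_terms)


def dense_score(query_vec: List[float], doc_vec: List[float]) -> float:
    """Dense retrieval: inner product of two fixed-size embeddings."""
    return sum(q * d for q, d in zip(query_vec, doc_vec))


def hybrid_score(sparse: float, dense: float, alpha: float = 0.5) -> float:
    """Hybrid retrieval (one common choice): linear interpolation of the two scores."""
    return alpha * sparse + (1.0 - alpha) * dense


# Hypothetical term weights, as a learned sparse model might assign them.
query_terms = {"neural": 1.2, "retrieval": 0.8}
doc_terms = {"retrieval": 1.5, "model": 0.4}

# Hypothetical embeddings, as a dense bi-encoder might produce them.
query_vec = [0.1, 0.7, 0.2]
doc_vec = [0.3, 0.5, 0.4]

s = sparse_score(query_terms, doc_terms)  # only "retrieval" overlaps: 0.8 * 1.5
d = dense_score(query_vec, doc_vec)       # 0.03 + 0.35 + 0.08
h = hybrid_score(s, d, alpha=0.5)         # 0.5 * s + 0.5 * d
print(f"sparse={s:.2f} dense={d:.2f} hybrid={h:.2f}")
```

Sparse and dense scores usually live on different scales, so practical hybrid systems often normalize each score before interpolating them. The unnormalized sum here is kept only for brevity.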

How It Works

The list categorizes papers based on their contribution to IR, such as specific retrieval techniques (e.g., sparse representation learning, hard negative sampling for dense retrieval), architectural innovations (e.g., multi-vector representations, long document processing), and emerging trends like LLM-augmented retrieval. It provides links to papers and, where available, associated code repositories, enabling users to quickly access and explore relevant research.
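As an illustration of one of these categories, multi-vector representations, the sketch below scores documents with a late-interaction "MaxSim" rule of the kind popularized by ColBERT. Each query token embedding is matched to its most similar document token embedding, and those maxima are summed. The token vectors are made-up toy values, and the function name maxsim_score is hypothetical. This is not the implementation of any particular paper in the list.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_tokens: List[Vector], doc_tokens: List[Vector]) -> float:
    """Late interaction: for each query token, take its best-matching document token, then sum."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


# Toy 2-d token embeddings (hypothetical values).
query_tokens = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[0.9, 0.1], [0.2, 0.8]]  # has a close match for each query token
doc_b = [[0.9, 0.1], [0.8, 0.2]]  # matches only the first query token well

score_a = maxsim_score(query_tokens, doc_a)
score_b = maxsim_score(query_tokens, doc_b)
print(f"doc_a {score_a:.2f}")
print(f"doc_b {score_b:.2f}")
```

Keeping one vector per token lets each query term find its own match. A single-vector dense model compresses the whole document into one embedding before any comparison happens.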

Quick Start & Requirements

This is a curated list of papers and has no installation or execution step. Users access the linked papers and code repositories directly.

Highlighted Details

Comprehensive coverage of pre-training methods for both first-stage retrieval and re-ranking.
Dedicated sections for emerging areas like LLM-IR integration, multimodal retrieval, and efficiency improvements.
Includes links to numerous code repositories for practical implementation of discussed techniques.
Organizes papers by specific sub-tasks and methodologies within IR.

Maintenance & Community

The repository is maintained by ict-bigdatalab and welcomes contributions via Pull Requests. Feedback and suggestions are encouraged.

Licensing & Compatibility

The repository itself is a list of links and does not impose a license on the content it references. Individual papers and code repositories carry their own licenses.

Limitations & Caveats

As a curated list, its completeness depends on community contributions, and the rapid pace of research means new papers may not be included immediately. Users should consult the licenses of individual papers and code repositories for usage restrictions.


Explore Similar Projects

Semantic-Retrieval-Models by caiyinqiong

dpr-scale by facebookresearch

denser-retriever by denser-org

LLM4IR-Survey by RUC-NLPIR

stark by snap-stanford

ANCE by microsoft

RAG-Interview-Questions-and-Answers-Hub by KalyanKS-NLP

atlas by facebookresearch

pyterrier by terrier-org

ai-powered-search by treygrainger

Local_Pdf_Chat_RAG by weiwill88

pyserini by castorini