awesome-pretrained-models-for-information-retrieval by ict-bigdatalab

Collection of papers on pre-trained models for information retrieval

created 4 years ago
669 stars

Top 51.3% on sourcepulse

Project Summary

This repository is a curated list of academic papers focusing on pre-trained models for Information Retrieval (IR). It serves researchers and practitioners in the IR field by organizing key publications across various sub-topics, including sparse, dense, and hybrid retrieval, re-ranking, and the integration of Large Language Models (LLMs) with IR. The primary benefit is providing a structured overview of the rapidly evolving landscape of pre-trained models in IR.

How It Works

The list categorizes papers based on their contribution to IR, such as specific retrieval techniques (e.g., sparse representation learning, hard negative sampling for dense retrieval), architectural innovations (e.g., multi-vector representations, long document processing), and emerging trends like LLM-augmented retrieval. It provides links to papers and, where available, associated code repositories, enabling users to quickly access and explore relevant research.

Quick Start & Requirements

This is a curated list of papers, so there is nothing to install or run. Users access the linked papers and code repositories independently.

Highlighted Details

  • Comprehensive coverage of pre-training methods for both first-stage retrieval and re-ranking.
  • Dedicated sections for emerging areas like LLM-IR integration, multimodal retrieval, and efficiency improvements.
  • Includes links to numerous code repositories for practical implementations of the techniques discussed.
  • Organizes papers by specific sub-tasks and methodologies within IR.

Maintenance & Community

The repository is maintained by ict-bigdatalab and welcomes contributions via Pull Requests. Feedback and suggestions are encouraged.

Licensing & Compatibility

The repository itself is a list of links and does not impose a license on the content it references. Individual papers and code repositories will have their own respective licenses.

Limitations & Caveats

As a curated list, its completeness depends on community contributions. Because research moves quickly, new papers may not be included right away. Users must consult the licenses of individual papers and code repositories for usage restrictions.

Health Check
Last commit: 1 year ago
Responsiveness: 1 week
Pull Requests (30d): 0
Issues (30d): 0
Star History: 6 stars in the last 90 days
