Rankify by DataScienceUIBK

Python toolkit for retrieval, re-ranking, and RAG research

created 5 months ago
488 stars

Top 64.0% on sourcepulse

Project Summary

Rankify is a Python toolkit designed for unified retrieval, re-ranking, and retrieval-augmented generation (RAG) research. It offers a modular and extensible framework for researchers and practitioners to experiment with and benchmark various components of information retrieval pipelines, supporting over 40 benchmark datasets and a wide array of state-of-the-art models.

How It Works

Rankify provides a unified interface for three core stages of information retrieval: retrieval, re-ranking, and generation. It supports multiple retrieval techniques (e.g., BM25, DPR, ColBERT, BGE), over 24 re-ranking models, and integrates with generative models for RAG. The toolkit is built with modularity in mind, allowing users to easily swap components and benchmark different pipeline configurations.
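The three-stage design described above can be illustrated with a minimal, self-contained sketch. It does not use Rankify's API: `Pipeline`, `overlap_retriever`, `jaccard_reranker`, and `template_generator` are hypothetical names invented for this example, and the scoring functions are toy stand-ins for real retrievers, re-rankers, and generators. The point is only to show how a common interface lets each stage be swapped independently.

```python
from dataclasses import dataclass
from typing import Callable, List, Set

# Hypothetical stage signatures for illustration; these are NOT Rankify classes.
Retriever = Callable[[str, List[str], int], List[str]]
Reranker = Callable[[str, List[str]], List[str]]
Generator = Callable[[str, List[str]], str]


def _tokens(text: str) -> Set[str]:
    """Lowercase whitespace tokenization, enough for a toy example."""
    return set(text.lower().split())


def overlap_retriever(query: str, corpus: List[str], k: int) -> List[str]:
    """Toy retriever: keep the k passages sharing the most terms with the query."""
    q = _tokens(query)
    ranked = sorted(corpus, key=lambda p: len(q & _tokens(p)), reverse=True)
    return ranked[:k]


def jaccard_reranker(query: str, passages: List[str]) -> List[str]:
    """Toy re-ranker: reorder candidates by Jaccard similarity to the query."""
    q = _tokens(query)

    def score(passage: str) -> float:
        p = _tokens(passage)
        union = q | p
        return len(q & p) / len(union) if union else 0.0

    return sorted(passages, key=score, reverse=True)


def template_generator(query: str, passages: List[str]) -> str:
    """Toy generator: echo the top passage instead of calling a language model."""
    top = passages[0] if passages else "(no context found)"
    return f"Q: {query} | Context: {top}"


@dataclass
class Pipeline:
    """Chains the three stages; any stage can be replaced independently."""
    retriever: Retriever
    reranker: Reranker
    generator: Generator

    def run(self, query: str, corpus: List[str], k: int = 2) -> str:
        candidates = self.retriever(query, corpus, k)
        ordered = self.reranker(query, candidates)
        return self.generator(query, ordered)


corpus = [
    "Paris is the capital of France",
    "Berlin is the capital of Germany",
    "Bananas are yellow",
]
pipeline = Pipeline(overlap_retriever, jaccard_reranker, template_generator)
answer = pipeline.run("capital of France", corpus)
print(answer)
```

Benchmarking a different configuration then amounts to constructing another `Pipeline` with one stage replaced and comparing the outputs on the same queries.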

Quick Start & Requirements

  • Installation: pip install rankify for the base package, or pip install "rankify[all]" for full functionality. Installation from source is also supported.
  • Prerequisites: Python 3.10+, PyTorch 2.5.1 (CUDA 12.4/12.6 recommended for GPU use). Specific components like ColBERT require additional setup (GCC, environment variables).
  • Resources: Pre-retrieved datasets are available on Hugging Face.
  • Demo: streamlit run demo.py after installing streamlit.
  • Documentation: Rankify Docs
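Before installing, it can help to confirm that the environment meets the prerequisites listed above. The following is a small sketch using only the Python standard library; `check_prerequisites` is a hypothetical helper written for this example, not part of Rankify. It only checks the Python version and whether the `torch` and `rankify` packages can be found, and does not verify the PyTorch or CUDA versions.

```python
import importlib.util
import sys
from typing import Dict, Tuple


def check_prerequisites(min_python: Tuple[int, int] = (3, 10)) -> Dict[str, bool]:
    """Report whether the basic prerequisites appear to be satisfied.

    Hypothetical helper for illustration; it inspects the current
    interpreter and looks up packages without importing them.
    """
    return {
        # The page lists Python 3.10+ as a requirement.
        "python_ok": sys.version_info >= min_python,
        # find_spec returns None when a top-level package is not installed.
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "rankify_installed": importlib.util.find_spec("rankify") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'yes' if ok else 'no'}")
```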

Highlighted Details

  • Supports 7 retrieval techniques, 24+ state-of-the-art re-ranking models, and multiple RAG methods.
  • Includes 40+ pre-retrieved benchmark datasets, eliminating the need for manual indexing for many common tasks.
  • Offers built-in evaluation metrics for retrieval, re-ranking, and RAG performance.
  • Provides prebuilt indices for Wikipedia and MS MARCO corpora.
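To make the evaluation bullet concrete, here is a generic sketch of two common ranking metrics, hit@k and mean reciprocal rank (MRR). This page does not list which metrics Rankify implements or how its evaluation API looks, so the function names and the sample document IDs below are assumptions made purely for illustration.

```python
from typing import List, Sequence, Set, Tuple


def hit_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int) -> float:
    """Return 1.0 if any relevant document appears in the top k, else 0.0."""
    return float(any(doc in relevant_ids for doc in ranked_ids[:k]))


def reciprocal_rank(ranked_ids: Sequence[str], relevant_ids: Set[str]) -> float:
    """Return 1/rank of the first relevant document, or 0.0 if none is found."""
    for rank, doc in enumerate(ranked_ids, start=1):
        if doc in relevant_ids:
            return 1.0 / rank
    return 0.0


def mean(values: List[float]) -> float:
    """Arithmetic mean that returns 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


# Made-up example data: (ranked result IDs, set of relevant IDs) per query.
runs: List[Tuple[List[str], Set[str]]] = [
    (["d3", "d1", "d7"], {"d1"}),  # first relevant result at rank 2
    (["d2", "d5", "d9"], {"d9"}),  # first relevant result at rank 3
]

hit1 = mean([hit_at_k(ranked, rel, 1) for ranked, rel in runs])
hit3 = mean([hit_at_k(ranked, rel, 3) for ranked, rel in runs])
mrr = mean([reciprocal_rank(ranked, rel) for ranked, rel in runs])
print(f"hit@1={hit1:.2f} hit@3={hit3:.2f} MRR={mrr:.3f}")
```

Computing the same metrics before and after a re-ranking step is one simple way to see whether the re-ranker moved relevant documents toward the top.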

Maintenance & Community

  • The project is under active development (v0.1.0 released).
  • Community contributions are encouraged via pull requests.
  • Chinese community resources and blog posts are available.

Licensing & Compatibility

  • Licensed under the Apache-2.0 License.
  • Permissive license suitable for commercial use and integration with closed-source projects.

Limitations & Caveats

  • The project is in its early stages (v0.1.0), with many planned improvements and features still under development.
  • Some datasets and retrieval methods are marked as "Part Completed" or "Pending."
  • RAG integration is currently focused on specific models like Llama, T5, and FiD.
Health Check

  • Last commit: 1 day ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 2
  • Issues (30d): 1

Star History

  • 54 stars in the last 90 days
