Rankify by DataScienceUIBK

Python toolkit for retrieval, re-ranking, and RAG research

Created 11 months ago

578 stars

Top 55.9% on SourcePulse

1 Expert Loves This Project

okhat

Coauthor of DSPy, ColBERT; Professor at MIT

Project Summary

Rankify is a Python toolkit designed for unified retrieval, re-ranking, and retrieval-augmented generation (RAG) research. It offers a modular and extensible framework for researchers and practitioners to experiment with and benchmark various components of information retrieval pipelines, supporting over 40 benchmark datasets and a wide array of state-of-the-art models.

How It Works

Rankify provides a unified interface for three core stages of information retrieval: retrieval, re-ranking, and generation. It supports multiple retrieval techniques (e.g., BM25, DPR, ColBERT, BGE), over 24 re-ranking models, and integrates with generative models for RAG. The toolkit is built with modularity in mind, allowing users to easily swap components and benchmark different pipeline configurations.
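The three-stage design described above can be illustrated with a small, self-contained sketch. This is not Rankify's API: the names Passage, overlap_retriever, length_normalized_reranker, echo_generator, and run_pipeline are hypothetical, and the scoring is a toy word-overlap heuristic standing in for real models such as BM25 or a neural re-ranker. It only shows how a pipeline with swappable stages might be wired together.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Passage:
    text: str
    score: float = 0.0


# Each stage is just a callable, so any stage can be swapped independently.
Retriever = Callable[[str, List[str]], List[Passage]]
Reranker = Callable[[str, List[Passage]], List[Passage]]
Generator = Callable[[str, List[Passage]], str]


def _tokens(text: str) -> set:
    """Lowercased whitespace tokens (toy tokenizer)."""
    return set(text.lower().split())


def overlap_retriever(query: str, corpus: List[str]) -> List[Passage]:
    """Toy retriever: keep documents sharing at least one word with the query."""
    q = _tokens(query)
    hits = [Passage(doc, float(len(q & _tokens(doc)))) for doc in corpus]
    return [p for p in hits if p.score > 0]


def length_normalized_reranker(query: str, passages: List[Passage]) -> List[Passage]:
    """Toy re-ranker: rescore by overlap divided by passage vocabulary size."""
    q = _tokens(query)
    rescored = [
        Passage(p.text, len(q & _tokens(p.text)) / len(_tokens(p.text)))
        for p in passages
    ]
    return sorted(rescored, key=lambda p: p.score, reverse=True)


def echo_generator(query: str, passages: List[Passage]) -> str:
    """Toy generator: return the top passage instead of calling a language model."""
    return passages[0].text if passages else ""


def run_pipeline(query: str, corpus: List[str], retriever: Retriever,
                 reranker: Reranker, generator: Generator, top_k: int = 2) -> str:
    """Run retrieval, then re-ranking, then generation over the top_k passages."""
    candidates = retriever(query, corpus)
    ranked = reranker(query, candidates)[:top_k]
    return generator(query, ranked)
```

Because every stage is passed in as an argument, benchmarking a different configuration in this sketch amounts to calling run_pipeline again with another retriever or re-ranker function.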

Quick Start & Requirements

Installation: pip install rankify for the base package, or pip install "rankify[all]" for full functionality. Installation from source is also supported.
Prerequisites: Python 3.10+, PyTorch 2.5.1 (CUDA 12.4/12.6 recommended for GPU use). Specific components like ColBERT require additional setup (GCC, environment variables).
Resources: Pre-retrieved datasets are available on Hugging Face.
Demo: install streamlit, then run streamlit run demo.py.
Documentation: Rankify Docs
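Before installing, it may help to confirm that the environment meets the prerequisites listed above. The following is a minimal sketch using only the Python standard library; check_environment is a hypothetical helper, and the distribution names "torch" and "rankify" are assumed to match the names used with pip.

```python
import sys
from importlib import metadata


def check_environment(min_python=(3, 10), packages=("torch", "rankify")):
    """Report whether Python is new enough and which packages are installed.

    Returns a dict with "python_ok" (bool) and, for each package name,
    its installed version string or None if it is not installed.
    """
    report = {"python_ok": sys.version_info >= min_python}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

A None entry for a package means it still needs to be installed; the reported torch version can then be compared by hand against the 2.5.1 requirement.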

Highlighted Details

Supports 7 retrieval techniques, 24+ state-of-the-art re-ranking models, and multiple RAG methods.
Includes 40+ pre-retrieved benchmark datasets, eliminating the need for manual indexing for many common tasks.
Offers built-in evaluation metrics for retrieval, re-ranking, and RAG performance.
Provides prebuilt indices for Wikipedia and MS MARCO corpora.
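To make the role of evaluation metrics concrete, here is a small sketch of two common ranking metrics, hit@k and mean reciprocal rank (MRR), applied to a ranking before and after re-ranking. These functions are illustrative only and are not Rankify's built-in metric implementations; the document IDs are made up for the example.

```python
from typing import Iterable, List, Set


def hit_at_k(ranked_ids: List[str], relevant_ids: Set[str], k: int) -> float:
    """1.0 if any relevant document appears in the top k results, else 0.0."""
    return 1.0 if any(doc in relevant_ids for doc in ranked_ids[:k]) else 0.0


def reciprocal_rank(ranked_ids: List[str], relevant_ids: Set[str]) -> float:
    """1 / rank of the first relevant document, or 0.0 if none is retrieved."""
    for rank, doc in enumerate(ranked_ids, start=1):
        if doc in relevant_ids:
            return 1.0 / rank
    return 0.0


def mean(values: Iterable[float]) -> float:
    """Arithmetic mean; returns 0.0 for an empty input."""
    values = list(values)
    return sum(values) / len(values) if values else 0.0


# Hypothetical rankings for one query, before and after re-ranking.
relevant = {"d1"}
before = ["d3", "d1", "d2"]
after = ["d1", "d3", "d2"]

print("hit@1 before/after:", hit_at_k(before, relevant, 1), hit_at_k(after, relevant, 1))
print("RR before/after:", reciprocal_rank(before, relevant), reciprocal_rank(after, relevant))
print("MRR over both rankings:", mean([reciprocal_rank(r, relevant) for r in (before, after)]))
```

Comparing the same metric on the retriever's output and on the re-ranked output is one simple way to quantify what a re-ranking stage contributes.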

Maintenance & Community

The project is under active development (v0.1.0 released).
Community contributions are encouraged via pull requests.
Chinese community resources and blog posts are available.

Licensing & Compatibility

Licensed under the Apache-2.0 License.
The license is permissive, allowing commercial use and integration with closed-source projects.

Limitations & Caveats

The project is in its early stages (v0.1.0), with many planned improvements and features still under development.
Some datasets and retrieval methods are marked as "Part Completed" or "Pending."
RAG integration is currently focused on specific models like Llama, T5, and FiD.

Health Check

Last Commit: 2 months ago
Responsiveness: 1 day
Pull Requests (30d): 0
Issues (30d): 0

Star History

52 stars in the last 30 days

Explore Similar Projects

embedding_rerank_retrieval by percent4

RAG evaluation for retrieval algorithms, using LlamaIndex

Created 2 years ago

Updated 5 months ago

Starred by Jeff Hammerbacher (Cofounder of Cloudera).

stark by snap-stanford

LLM retrieval benchmark on textual/relational knowledge bases (NeurIPS 2024)

Created 1 year ago

Updated 1 week ago

Awesome-RAG by Danielskry

Awesome list of RAG resources

Created 1 year ago

Updated 2 months ago

Starred by Luca Soldaini (Research Scientist at Ai2).

pyterrier by terrier-org

Python framework for information retrieval and RAG

Created 5 years ago

Updated 3 weeks ago

HiRAG by hhy-huang

Retrieval-Augmented Generation with Hierarchical Knowledge

Created 10 months ago

Updated 1 month ago

Starred by Jeff Huber (Cofounder of Chroma) and Casper Hansen (Author of AutoAWQ).

rank_llm by castorini

Python toolkit for reproducible information retrieval research

Created 2 years ago

Updated 2 weeks ago

Starred by Li Jiang (Coauthor of AutoGen; Engineer at Microsoft).

TrustRAG by gomate-community

RAG framework for reliable input, trusted output

Created 1 year ago

Updated 4 days ago

Starred by Elvis Saravia (Founder of DAIR.AI), Yaowei Zheng (Author of LLaMA-Factory), and 1 more.

RAG-Survey by Tongji-KGLLM

RAG survey and knowledge base

Created 2 years ago

Updated 1 year ago

Local_Pdf_Chat_RAG by weiwill88

RAG system for local PDF Q&A, aiding RAG beginners

Created 11 months ago

Updated 2 months ago

FlashRAG by RUC-NLPIR

Python toolkit for efficient RAG research

Created 1 year ago

Updated 1 month ago

Starred by Zack Li (Cofounder of Nexa AI), Xiaofan Luan (VP Engineering at Zilliz), and 1 more.

pyserini by castorini

Python toolkit for reproducible information retrieval research

Created 6 years ago

Updated 5 days ago

Starred by Paras Jain (Cofounder of Genmo), Malte Pietsch (Cofounder of deepset), and 2 more.

anserini by castorini

Lucene toolkit for reproducible information retrieval research

Created 10 years ago

Updated 10 hours ago
