pyterrier by terrier-org

Python framework for information retrieval and RAG

Created 5 years ago
486 stars

Top 63.3% on SourcePulse

Project Summary

PyTerrier is a Python framework for building and experimenting with information retrieval (IR) and retrieval-augmented generation (RAG) pipelines. Engineers and researchers can use it to construct multi-stage search systems, integrate neural models, and evaluate their effectiveness on standard datasets.

How It Works

The framework supports indexing and retrieval pipelines over sparse, learned sparse, and dense representations. Neural rerankers (e.g., MonoT5) and LLMs for RAG can be integrated into these pipelines. The declarative pt.Experiment function compares the effectiveness of several pipelines on standard IR datasets.
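The retrieve-then-rerank idea described above can be sketched without any library. This is a toy illustration only: the `Stage` class, its `>>` chaining, the three documents, and both scoring functions are assumptions made for this example, and none of it is PyTerrier's API or implementation.

```python
# Toy corpus (hypothetical documents, for illustration only).
DOCS = {
    "d1": "terrier search engine toolkit for research",
    "d2": "dense retrieval with neural embeddings",
    "d3": "search search search results page",
}


class Stage:
    """One pipeline stage: maps (query, ranking) to a list of (doc_id, score)."""

    def __init__(self, fn):
        self.fn = fn

    def __call__(self, query, ranking=None):
        return self.fn(query, ranking)

    def __rshift__(self, nxt):
        # a >> b: run a, then pass its ranking to b.
        return Stage(lambda query, ranking=None: nxt(query, self(query, ranking)))


def tf_retrieve(query, _ranking=None):
    """First stage: score every document by raw query-term frequency."""
    terms = query.lower().split()
    scored = []
    for doc_id, text in DOCS.items():
        tokens = text.lower().split()
        score = sum(tokens.count(t) for t in terms)
        if score > 0:
            scored.append((doc_id, float(score)))
    return sorted(scored, key=lambda p: (-p[1], p[0]))


def coverage_rerank(query, ranking):
    """Second stage: rescore candidates by the fraction of query terms they contain."""
    terms = set(query.lower().split())
    rescored = []
    for doc_id, _ in ranking:
        tokens = set(DOCS[doc_id].lower().split())
        rescored.append((doc_id, len(terms & tokens) / len(terms)))
    return sorted(rescored, key=lambda p: (-p[1], p[0]))


pipeline = Stage(tf_retrieve) >> Stage(coverage_rerank)

first_stage = Stage(tf_retrieve)("terrier search")
reranked = pipeline("terrier search")
print(first_stage)  # d3 leads on raw term frequency
print(reranked)     # d1 leads once query-term coverage is considered
```

The second stage only rescores what the first stage returned, which is why a cheap retriever followed by a more expensive reranker is a common arrangement.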

Quick Start & Requirements

Install with pip install 'pyterrier[all]'. You may need to set the JAVA_HOME environment variable. Colab notebooks are recommended for the quickest setup, and official quick-start examples are available through Colab badges and a tutorial.
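Before installing, it can help to check whether a Java runtime is discoverable. The following is a minimal sketch using only the Python standard library; the helper name `find_java` and its lookup order (JAVA_HOME first, then PATH) are assumptions for illustration and do not reflect how PyTerrier itself locates Java.

```python
import os
import shutil


def find_java(env=None):
    """Return a path to a `java` executable, or None if none is found.

    Looks under JAVA_HOME/bin first, then searches PATH.
    `env` defaults to the current process environment.
    """
    env = os.environ if env is None else env
    java_home = env.get("JAVA_HOME")
    if java_home:
        for name in ("java", "java.exe"):
            candidate = os.path.join(java_home, "bin", name)
            if os.path.isfile(candidate):
                return candidate
    # If `env` has no PATH entry, shutil.which falls back to the process PATH.
    return shutil.which("java", path=env.get("PATH"))


java_path = find_java()
if java_path is None:
    print("No Java runtime found; consider setting JAVA_HOME.")
else:
    print(f"Java found at: {java_path}")
```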

Highlighted Details

  • Supports building sparse, learned sparse, and dense indexing/retrieval pipelines.
  • Integrates neural rerankers (MonoT5, DuoT5) and LLMs for RAG.
  • Compares retrieval effectiveness declaratively with pt.Experiment.
  • Offers an extensive ecosystem of plugins for dense retrieval, RAG, and neural indexing (e.g., Pyterrier_DR, Pyterrier_RAG, PyTerrier_SPLADE).
  • Integrates directly with the ir_datasets package for easy access to numerous standard IR datasets.
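The kind of side-by-side comparison that an experiment function produces can be illustrated with a small library-free sketch. Everything here is hypothetical: the `compare` helper, the precomputed rankings standing in for pipelines, and the relevance judgments are made up for illustration, and this is not how pt.Experiment is implemented or called.

```python
def reciprocal_rank(ranked_ids, relevant):
    """1/rank of the first relevant document, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def precision_at_k(ranked_ids, relevant, k):
    """Fraction of the top-k documents that are relevant."""
    return sum(1 for d in ranked_ids[:k] if d in relevant) / k


def compare(runs, qrels, k=2):
    """Average each metric over all judged queries, one result row per system."""
    rows = []
    for name, run in runs.items():
        rr = [reciprocal_rank(run[qid], rel) for qid, rel in qrels.items()]
        pk = [precision_at_k(run[qid], rel, k) for qid, rel in qrels.items()]
        rows.append({
            "name": name,
            "MRR": sum(rr) / len(rr),
            f"P@{k}": sum(pk) / len(pk),
        })
    return rows


# Hypothetical relevance judgments: query id -> set of relevant doc ids.
qrels = {"q1": {"d1"}, "q2": {"d4", "d6"}}

# Hypothetical precomputed rankings standing in for two retrieval pipelines.
runs = {
    "system_a": {"q1": ["d1", "d2", "d3"], "q2": ["d5", "d4", "d6"]},
    "system_b": {"q1": ["d1", "d3", "d2"], "q2": ["d4", "d6", "d5"]},
}

rows = compare(runs, qrels, k=2)
for row in rows:
    print(row)
```

Holding the queries, judgments, and metrics fixed while varying only the system is what makes the resulting rows directly comparable.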

Maintenance & Community

The project is actively developed by a team of researchers from various universities. While specific community channels like Discord/Slack are not detailed, a comprehensive tutorial is available for guidance.

Licensing & Compatibility

PyTerrier is distributed under the Mozilla Public License Version 2.0 (MPL 2.0). Users must adhere to a citation license, requiring acknowledgment of the project's foundational paper in any derivative work or material where PyTerrier was used for search or experimentation.

Limitations & Caveats

The provided README does not explicitly detail known limitations, unsupported platforms, or alpha/beta status.

Health Check
Last Commit: 2 days ago
Responsiveness: Inactive
Pull Requests (30d): 11
Issues (30d): 2
Star History: 12 stars in the last 30 days
