pyversity by Pringled

Fast library for retrieval result diversification

Created 3 weeks ago

380 stars

Top 74.8% on SourcePulse

Project Summary

Pyversity is a fast, lightweight library for diversifying retrieval results and reducing redundancy in search output. It offers a unified API for popular strategies (MMR, MSD, DPP, and COVER) so that users can surface relevant yet less redundant items. It targets applications ranging from e-commerce and news search to RAG/LLM contexts, where balancing relevance with variety improves user experience and exploration.

How It Works

Pyversity re-ranks retrieval results using one of several diversification strategies. These algorithms select items based not only on their relevance to the query but also on their novelty relative to items already selected, which encourages broader coverage of topics or styles within the results. The library's core advantages are its efficiency, a single lightweight dependency (NumPy), and a clear, unified interface across these distinct diversification techniques.
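The selection principle described above (relevance to the query, penalized by similarity to items already chosen) can be illustrated with a minimal greedy MMR sketch. This is not pyversity's implementation or API: the names `cosine` and `mmr_rerank` are hypothetical, the scoring formula is one common MMR formulation, and the sketch assumes that a diversity of 0.0 means pure relevance and 1.0 means pure novelty. It uses only the Python standard library for portability, whereas the library itself builds on NumPy.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_rerank(embeddings, scores, k, diversity=0.5):
    """Hypothetical greedy MMR: return up to k item indices in selection order.

    Assumption: diversity=0.0 ranks by relevance only, diversity=1.0 by novelty only.
    """
    selected = []
    remaining = list(range(len(scores)))
    # Highest similarity of each candidate to any item selected so far.
    max_sim = [0.0] * len(scores)
    while remaining and len(selected) < k:
        # Trade relevance off against redundancy with the current selection.
        # Ties go to the lowest index because max() keeps the first maximum.
        best = max(
            remaining,
            key=lambda i: (1.0 - diversity) * scores[i] - diversity * max_sim[i],
        )
        selected.append(best)
        remaining.remove(best)
        # One O(n·d) update per pick, so O(k·n·d) overall.
        for i in remaining:
            max_sim[i] = max(max_sim[i], cosine(embeddings[i], embeddings[best]))
    return selected


# Items 0 and 1 are near-duplicates; item 2 is less relevant but distinct.
embeddings = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]
scores = [0.9, 0.85, 0.6]

relevance_only = mmr_rerank(embeddings, scores, k=2, diversity=0.0)  # [0, 1]
balanced = mmr_rerank(embeddings, scores, k=2, diversity=0.5)        # [0, 2]
```

With no diversity weight the two near-duplicates both make the cut; at 0.5 the second slot goes to the distinct item instead, which is the behavior the paragraph above describes.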

Quick Start & Requirements

  • Install: pip install pyversity
  • Prerequisites: NumPy.
  • Resource Footprint: Examples run in milliseconds.
  • Documentation: Quickstart and Supported Strategies sections are available within the README.

Highlighted Details

  • Supported Strategies: Implements Maximal Marginal Relevance (MMR), Max Sum of Distances (MSD), Determinantal Point Processes (DPP), and COVER (Facility-Location).
  • MMR/MSD: O(k·n·d) time complexity, where k is the number of items selected, n the number of candidates, and d the embedding dimension. Both suit general-purpose diversification: MMR focuses on avoiding near-duplicates, while MSD produces a stronger spread.
  • DPP: O(k·n·d + n·k²) time complexity. Suited to cases where diversity should be built into the selection itself and redundancy eliminated.
  • COVER: O(k·n²) time complexity. Ideal for topic coverage and clustering, but potentially slow for large n.
  • Parameterization: A diversity parameter (0.0 to 1.0) allows tuning the trade-off between relevance and diversity.
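To make the complexity figures in the list above concrete, the following sketch plugs sample sizes into the stated big-O expressions. The function name `estimated_ops` is hypothetical, and the results are order-of-magnitude operation counts that ignore constant factors, not measured runtimes. It assumes k is the number of selected items, n the number of candidates, and d the embedding dimension.

```python
def estimated_ops(strategy, n, k, d):
    """Evaluate the stated asymptotic cost of a strategy (constants ignored)."""
    name = strategy.lower()
    if name in ("mmr", "msd"):
        return k * n * d              # O(k·n·d)
    if name == "dpp":
        return k * n * d + n * k * k  # O(k·n·d + n·k²)
    if name == "cover":
        return k * n * n              # O(k·n²)
    raise ValueError(f"unknown strategy: {strategy!r}")


# Example sizes: 10,000 candidates, 10 results, 256-dimensional embeddings.
n, k, d = 10_000, 10, 256
mmr_ops = estimated_ops("mmr", n, k, d)      # 25,600,000
dpp_ops = estimated_ops("dpp", n, k, d)      # 26,600,000
cover_ops = estimated_ops("cover", n, k, d)  # 1,000,000,000
```

At these sizes the COVER estimate is roughly 39 times the MMR/MSD estimate (the ratio is n/d), which is why the quadratic term matters once n grows well beyond d.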

Maintenance & Community

  • Author: Thomas van Dongen.
  • No community channels (e.g., Discord, Slack) or project sponsorships are detailed in the provided information.

Licensing & Compatibility

  • License: No license information is specified in the provided README.
  • Compatibility: The lack of a specified license may impact commercial use or integration into closed-source projects.

Limitations & Caveats

  • The provided README does not specify a license, which is a potential adoption blocker.
  • The COVER strategy's O(k·n²) time complexity may limit its applicability for extremely large datasets.
  • While supporting several key strategies, the library does not claim exhaustive coverage of all possible diversification methods.

Health Check

  • Last Commit: 3 weeks ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 15
  • Issues (30d): 2

Star History

  • 380 stars in the last 27 days
