mauve  by krishnap25

PyTorch library for evaluating neural text generation quality

created 4 years ago
294 stars

Top 90.9% on sourcepulse

Project Summary

MAUVE is a Python package that quantifies the distributional gap between machine-generated text and human-written text. It is aimed at NLP researchers and practitioners who need to compare language model output against human text. Because it compares whole distributions of text rather than individual outputs, it can surface differences that simpler metrics miss.

How It Works

MAUVE scores similarity using Kullback–Leibler (KL) divergences computed in a quantized embedding space. Texts are embedded with a large language model (LLM), typically GPT-2, and the embeddings are quantized with k-means clustering, whose hyperparameters are chosen adaptively by default. Inputs can be raw text, tokenized text, or pre-computed features (e.g., LLM hidden states), so the metric fits a range of workflows and extends to other modalities such as images or audio.
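The divergence step described above can be sketched in plain Python. This is an illustrative sketch, not the library's implementation: it assumes the k-means step has already assigned each sample a cluster id, and the scaling constant, the mixture-weight grid, and all function names are assumptions chosen for the example.

```python
import math

def histogram(cluster_ids, num_clusters):
    """Normalized cluster histogram for one set of samples."""
    counts = [0] * num_clusters
    for c in cluster_ids:
        counts[c] += 1
    return [c / len(cluster_ids) for c in counts]

def kl_divergence(p, r):
    """KL(p || r) for discrete distributions given as equal-length lists."""
    return sum(pi * math.log(pi / ri) for pi, ri in zip(p, r) if pi > 0)

def divergence_curve(p, q, scaling=5.0, num_points=25):
    """Points (exp(-c*KL(q||r)), exp(-c*KL(p||r))) for mixtures r = w*p + (1-w)*q."""
    eps = 1e-6  # keep w strictly inside (0, 1) so r > 0 wherever p or q is
    weights = [eps + (1 - 2 * eps) * i / (num_points - 1) for i in range(num_points)]
    points = [(1.0, 0.0)]  # limiting endpoint as w -> 0
    for w in weights:
        r = [w * pi + (1 - w) * qi for pi, qi in zip(p, q)]
        points.append((math.exp(-scaling * kl_divergence(q, r)),
                       math.exp(-scaling * kl_divergence(p, r))))
    points.append((0.0, 1.0))  # limiting endpoint as w -> 1
    return points

def area_under_curve(points):
    """Trapezoidal area; x decreases along the curve, so use |dx|."""
    return sum(abs(x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

def mauve_like_score(p, q):
    """Area under the divergence curve: near 1 for matching histograms, near 0 for disjoint ones."""
    return area_under_curve(divergence_curve(p, q))
```

Identical histograms give a score close to 1, and histograms with no shared clusters give a score close to 0. The absolute value depends on the assumed scaling constant, so scores from this sketch should only be compared with each other.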

Quick Start & Requirements

  • Install via pip: pip install mauve-text
  • For featurization: torch>=1.1.0 and transformers>=3.2.0 are required.
  • For full functionality and examples, cloning the repository is recommended.
  • Official Documentation: https://huggingface.co/spaces/mauve/mauve
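After installation, scoring raw text might look like the sketch below. It assumes `mauve-text`, `torch`, and `transformers` are installed. The wrapper name `score_texts` is hypothetical, and the argument values are illustrative rather than recommended settings.

```python
def score_texts(human_texts, generated_texts, device_id=0):
    """Return the MAUVE score between two lists of strings.

    `score_texts` is a hypothetical wrapper; only `mauve.compute_mauve`
    comes from the package itself.
    """
    # Lazy import so this module loads even where mauve-text is not installed.
    import mauve

    out = mauve.compute_mauve(
        p_text=human_texts,      # reference (human-written) samples
        q_text=generated_texts,  # model-generated samples
        device_id=device_id,     # GPU id used for featurization
        max_text_length=256,     # truncate each text to this many tokens
        verbose=False,
    )
    return out.mauve
```

Both lists should hold thousands of samples for a stable score, as noted under Limitations & Caveats.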

Highlighted Details

  • Computes MAUVE score, frontier integral, and their smoothed variants (MAUVE*, Frontier Integral*).
  • Supports pre-computed features, which bypass the need for PyTorch/Transformers, as well as tokenized inputs.
  • Offers flexibility in featurization models (GPT-2 variants) and quantization parameters.
  • Can be extended to other modalities by providing pre-computed feature embeddings.
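The pre-computed feature path mentioned above can be sketched as follows. The toy feature generator is a stand-in for real embeddings (text, image, or audio) and is purely an assumption for illustration. `score_features` is a hypothetical wrapper that needs `mauve-text` and `numpy` installed when it is called.

```python
import random

def make_toy_features(num_samples, dim, shift=0.0, seed=0):
    """Gaussian vectors standing in for real embeddings (illustrative only)."""
    rng = random.Random(seed)
    return [[rng.gauss(shift, 1.0) for _ in range(dim)] for _ in range(num_samples)]

def score_features(p_features, q_features):
    """Return the MAUVE score for two sets of pre-computed feature vectors."""
    # Lazy imports: this path needs numpy and mauve-text, but not torch/transformers.
    import numpy as np
    import mauve

    out = mauve.compute_mauve(
        p_features=np.asarray(p_features, dtype=np.float32),
        q_features=np.asarray(q_features, dtype=np.float32),
        verbose=False,
    )
    return out.mauve
```

A call such as `score_features(make_toy_features(2000, 32), make_toy_features(2000, 32, shift=0.5, seed=1))` exercises the pipeline end to end. The resulting number says nothing about real data.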

Maintenance & Community

Contributions are encouraged via GitHub issues and pull requests, and GitHub issues are the primary contact channel. Recent repository activity is low: the last commit was about a year ago.

Licensing & Compatibility

The README does not explicitly state a license. Verify the licensing terms before commercial use or closed-source linking.

Limitations & Caveats

MAUVE is best suited for relative comparisons; absolute scores can vary with hyperparameters. The metric requires a substantial number of samples (thousands recommended) for reliable results, as fewer samples can lead to optimistic and high-variance scores. The runtime can be significant, especially with a large number of clusters, and can be mitigated by adjusting clustering hyperparameters at the cost of accuracy.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 6 stars in the last 90 days
