mauve  by krishnap25

PyTorch library for evaluating neural text generation quality

Created 4 years ago
299 stars

Top 88.8% on SourcePulse

Project Summary

MAUVE is a Python package that quantifies the distributional divergence between generated text and human text, addressing the need for robust evaluation metrics in natural language generation. It is aimed at NLP researchers and practitioners who need to compare language model output against human-written text. Because it compares whole distributions of text rather than individual samples, it can capture differences that simpler metrics miss.

How It Works

MAUVE measures similarity using Kullback–Leibler (KL) divergences computed in a quantized embedding space. Text is embedded with a large language model (LLM), typically GPT-2, and the embeddings are quantized with k-means clustering, whose hyperparameters are selected adaptively. Inputs can be raw text, tokenized text, or pre-computed features (e.g., LLM hidden states), so the method fits a range of workflows and extends to other modalities such as images or audio when feature embeddings are supplied.
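
The divergence-curve idea can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the library's implementation: it starts from two histograms that are already quantized (the embedding and k-means steps are omitted), and the function names, the scaling constant of 5, and the 25 mixture weights are choices made for this example only.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as lists.

    Bins where p is zero contribute nothing to the sum.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def divergence_curve(p, q, scale=5.0, num_weights=25):
    """Trace points (exp(-c*KL(q||r)), exp(-c*KL(p||r))) for mixtures r.

    r = lam * p + (1 - lam) * q, with lam strictly inside (0, 1) so that
    r is positive wherever p or q is. `scale` (c) and `num_weights` are
    illustrative assumptions.
    """
    curve = [(0.0, 1.0)]  # extreme point on the p side
    # Walk lam from near 1 down to near 0 so the x coordinate increases.
    for i in range(num_weights, 0, -1):
        lam = i / (num_weights + 1)
        r = [lam * pi + (1 - lam) * qi for pi, qi in zip(p, q)]
        x = math.exp(-scale * kl_divergence(q, r))
        y = math.exp(-scale * kl_divergence(p, r))
        curve.append((x, y))
    curve.append((1.0, 0.0))  # extreme point on the q side
    return curve


def area_under_curve(curve):
    """Trapezoidal area under a curve given as ordered (x, y) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(curve, curve[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


if __name__ == "__main__":
    human_hist = [0.5, 0.3, 0.2]   # toy cluster histogram for human text
    model_hist = [0.2, 0.3, 0.5]   # toy cluster histogram for model text
    print(area_under_curve(divergence_curve(human_hist, model_hist)))
```

In this sketch, identical histograms give an area close to 1 and increasingly different histograms push the area toward 0, which matches the intuition that a higher score means the generated distribution is closer to the human one.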

Quick Start & Requirements

  • Install via pip: pip install mauve-text
  • For featurization: torch>=1.1.0 and transformers>=3.2.0 are required.
  • For full functionality and examples, cloning the repository is recommended.
  • Official Documentation: https://huggingface.co/spaces/mauve/mauve
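
Once the package is installed, a typical call might look like the sketch below. The wrapper name `mauve_score` is hypothetical; `compute_mauve` and its keyword arguments follow the usage shown in the project README and should be checked against the installed version.

```python
def mauve_score(human_texts, model_texts):
    """Return the MAUVE score between two lists of strings.

    Needs `mauve-text`, `torch`, and `transformers`, because raw text
    is featurized with a GPT-2 model before scoring.
    """
    # Imported lazily so this file can be loaded before the package is installed.
    import mauve

    out = mauve.compute_mauve(
        p_text=human_texts,    # reference (human-written) samples
        q_text=model_texts,    # generated samples
        device_id=0,           # GPU id used for featurization
        max_text_length=256,   # truncate each sample to this many tokens
        verbose=False,
    )
    return out.mauve  # between 0 and 1; larger means closer distributions


# Usage (placeholder data; real evaluations need thousands of samples):
#   score = mauve_score(list_of_human_strings, list_of_generated_strings)
```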

Highlighted Details

  • Computes MAUVE score, frontier integral, and their smoothed variants (MAUVE*, Frontier Integral*).
  • Supports pre-computed features, which bypass the need for PyTorch/Transformers, as well as tokenized inputs.
  • Offers flexibility in featurization models (GPT-2 variants) and quantization parameters.
  • Can be extended to other modalities by providing pre-computed feature embeddings.
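
The pre-computed feature path can be sketched as follows. The wrapper name `mauve_from_features` is hypothetical, and the `p_features`, `q_features`, and `num_buckets` arguments and the `frontier_integral` output field are assumed from the project README, so verify them against the installed version. The features can come from any encoder (text, image, or audio), as long as both sets share the same embedding dimension.

```python
def mauve_from_features(p_features, q_features, num_buckets="auto"):
    """Score two sets of pre-computed embeddings.

    Each argument is expected to be a 2-D array of shape
    (num_samples, embedding_dim). No featurization model is loaded
    on this path.
    """
    # Imported lazily so this file can be loaded before the package is installed.
    import mauve

    out = mauve.compute_mauve(
        p_features=p_features,
        q_features=q_features,
        num_buckets=num_buckets,  # number of k-means clusters, or "auto"
    )
    return {
        "mauve": out.mauve,
        "frontier_integral": out.frontier_integral,
    }
```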

Maintenance & Community

Contributions are welcome through GitHub issues and pull requests, and GitHub issues are the primary contact channel. Recent activity is low: the last commit was about a year ago.

Licensing & Compatibility

The README does not explicitly state a license. Confirm the licensing terms before commercial use or closed-source linking.

Limitations & Caveats

MAUVE is best suited for relative comparisons; absolute scores can vary with hyperparameters. The metric requires a substantial number of samples (thousands recommended) for reliable results, as fewer samples can lead to optimistic and high-variance scores. The runtime can be significant, especially with a large number of clusters, and can be mitigated by adjusting clustering hyperparameters at the cost of accuracy.
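
Because scores are most meaningful relative to each other, one reasonable workflow is to score every model against the same human reference with identical settings and to repeat the run over several seeds to see the spread. The sketch below assumes a user-supplied `score_fn(human_samples, model_samples, seed)`; that signature, like the name `rank_models`, is hypothetical and not part of the MAUVE package.

```python
from statistics import mean, stdev


def rank_models(human_samples, model_samples, score_fn, seeds=(0, 1, 2)):
    """Rank models by mean score across seeds, highest first.

    model_samples maps a model name to its generated samples.
    score_fn is any callable returning a float; in practice it could
    wrap a MAUVE computation with fixed hyperparameters.
    Returns a list of (name, mean_score, std_dev) tuples.
    """
    summary = []
    for name, samples in model_samples.items():
        scores = [score_fn(human_samples, samples, seed) for seed in seeds]
        spread = stdev(scores) if len(scores) > 1 else 0.0
        summary.append((name, mean(scores), spread))
    # Sort by mean score so the closest-to-human model comes first.
    return sorted(summary, key=lambda row: row[1], reverse=True)
```

A large standard deviation relative to the gap between two models suggests the sample size is too small to separate them.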

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 1 star in the last 30 days
