PyTorch library for evaluating neural text generation quality
MAUVE is a Python package designed to quantify the distributional divergence between generated text and human text, addressing the need for robust evaluation metrics in natural language generation. It is particularly useful for researchers and practitioners in NLP who need to assess the quality and similarity of text produced by language models compared to human-written text. The primary benefit is its ability to capture nuanced differences that simpler metrics might miss, offering a more comprehensive understanding of model performance.
How It Works
MAUVE computes similarity by leveraging Kullback–Leibler (KL) divergences within a quantized embedding space derived from a large language model (LLM), typically GPT-2. It quantizes text representations using k-means clustering, with adaptive hyperparameter selection for this process. The approach allows for flexibility by accepting raw text, pre-computed features (e.g., LLM hidden states), or tokenized inputs, making it adaptable to various workflows and even other modalities like images or audio.
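The quantize-then-compare pipeline above can be sketched end to end in plain NumPy. This is an illustrative toy, not the mauve package's actual code: the tiny k-means routine, the bucket count, the scaling constant c=5, and the one-sided area computation are all simplifications chosen for clarity.

```python
import numpy as np

def kmeans_labels(X, k, iters=10, seed=0):
    """Toy k-means quantizer (stand-in for MAUVE's adaptive clustering)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return labels

def kl(p, q):
    """KL(p || q) over histogram buckets, restricted to buckets where p > 0."""
    m = p > 0
    return float(np.sum(p[m] * np.log(p[m] / q[m])))

def mauve_sketch(p_feats, q_feats, k=8, c=5.0, n_mix=25):
    """Jointly quantize two feature sets, then score a KL divergence curve."""
    X = np.vstack([p_feats, q_feats])
    labels = kmeans_labels(X, k)
    lp, lq = labels[: len(p_feats)], labels[len(p_feats):]
    p = np.bincount(lp, minlength=k) / len(lp)  # histogram of model text
    q = np.bincount(lq, minlength=k) / len(lq)  # histogram of human text
    xs, ys = [0.0, 1.0], [1.0, 0.0]             # corner points closing the curve
    for lam in np.linspace(0, 1, n_mix + 2)[1:-1]:  # interior mixtures only
        r = lam * p + (1 - lam) * q             # mixture has full support, so KL is finite
        xs.append(np.exp(-c * kl(q, r)))
        ys.append(np.exp(-c * kl(p, r)))
    env = {}                                    # outer envelope: max y per distinct x
    for x, y in zip(xs, ys):
        env[x] = max(env.get(x, 0.0), y)
    ex = np.array(sorted(env))
    ey = np.array([env[x] for x in ex])
    # trapezoidal area under the divergence curve: 1.0 when the histograms match
    return float(0.5 * np.sum((ex[1:] - ex[:-1]) * (ey[1:] + ey[:-1])))
```

In the package itself this is wrapped behind `mauve.compute_mauve`, which accepts raw text (`p_text`/`q_text`) or precomputed features, as described above.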
Quick Start & Requirements
pip install mauve-text
torch>=1.1.0 and transformers>=3.2.0 are required.
Maintenance & Community
Contributions are encouraged via GitHub issues and pull requests; GitHub issues are also the primary contact channel.
Licensing & Compatibility
The README does not explicitly state a license; verify the repository's licensing before commercial use or closed-source linking.
Limitations & Caveats
MAUVE is best suited for relative comparisons; absolute scores can vary with hyperparameters. The metric requires a substantial number of samples (thousands recommended) for reliable results, as fewer samples can lead to optimistic and high-variance scores. The runtime can be significant, especially with a large number of clusters, and can be mitigated by adjusting clustering hyperparameters at the cost of accuracy.
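The sample-size caveat is easy to see with a toy histogram experiment: estimate a smoothed KL divergence between two samples drawn from the same distribution, whose true divergence is zero. This sketch is unrelated to the mauve package's code, and the bucket count and sample sizes are arbitrary assumptions; it simply shows the estimate is both more biased away from zero and higher-variance when samples are small.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 50                              # number of histogram buckets
probs = np.ones(K) / K              # both samples come from this SAME distribution

def smoothed_kl(n):
    """KL(a || (a+b)/2) between histograms of two same-distribution samples."""
    a = np.bincount(rng.choice(K, size=n, p=probs), minlength=K) / n
    b = np.bincount(rng.choice(K, size=n, p=probs), minlength=K) / n
    r = 0.5 * (a + b)               # mixture smoothing avoids log(0)
    m = a > 0
    return float(np.sum(a[m] * np.log(a[m] / r[m])))

small = np.array([smoothed_kl(50) for _ in range(20)])    # few samples per estimate
large = np.array([smoothed_kl(5000) for _ in range(20)])  # many samples per estimate
# true divergence is 0; small samples inflate both the mean and the spread
```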