Evaluation framework for generated text (research paper)
GPTScore is a framework for evaluating generated text using pre-trained language models (PLMs) as evaluators. It allows for customizable, multifaceted, and training-free assessments of text quality, making it suitable for researchers and developers working with generative AI models.
How It Works
GPTScore leverages the emergent instruction-following and zero-shot capabilities of large PLMs: a generated text is scored by the conditional log-probability the evaluator model assigns to it, given a prompt that encodes the evaluation aspect, the task description, and optional demonstrations. Evaluations can be run with or without custom instructions and demonstrations, offering flexibility in defining aspects such as "quality" or "fluency." The framework supports a wide range of PLMs, from smaller models like FLAN-T5-Small to large ones like GPT-3 (175B parameters).
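In essence, the score reduces to an average token log-likelihood under the evaluator model. The sketch below illustrates this idea with Hugging Face transformers; the model choice (gpt2) and the prompt wording are illustrative assumptions, not the repository's actual interface.

```python
# Minimal sketch of the GPTScore idea: score a hypothesis by the average
# log-probability its tokens receive from a PLM, conditioned on an
# evaluation prompt. Model and prompt are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # any causal PLM works here
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def gptscore(prompt: str, hypothesis: str) -> float:
    """Average log p(hypothesis token | prompt, preceding hypothesis tokens)."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    hyp_ids = tokenizer(hypothesis, return_tensors="pt").input_ids
    input_ids = torch.cat([prompt_ids, hyp_ids], dim=1)
    with torch.no_grad():
        logits = model(input_ids).logits  # (1, seq_len, vocab)
    log_probs = torch.log_softmax(logits, dim=-1)
    # Position i predicts token i+1, so hypothesis tokens are predicted
    # by the positions starting one step before the hypothesis begins.
    start = prompt_ids.shape[1]
    token_lps = log_probs[0, start - 1 : -1, :].gather(
        1, hyp_ids[0].unsqueeze(-1)
    ).squeeze(-1)
    return token_lps.mean().item()

prompt = ("Generate a fluent summary for the following text: "
          "The cat sat on the mat. Summary:")
print(gptscore(prompt, " A cat sat on a mat."))
```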
Quick Start & Requirements
The repository is organized around task-specific entry scripts such as score_d2t.py (data-to-text evaluation). Local evaluators (e.g., FLAN-T5, OPT) run through standard model libraries, while GPT-3-based scoring requires an OpenAI API key; see the sketch below.
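The following is a hedged sketch of how GPT-3-based log-probability scoring can be obtained via the legacy OpenAI Completions API (openai < 1.0), which can echo prompt tokens together with their log-probabilities. The model name, prompt handling, and averaging are assumptions for illustration, not score_d2t.py's documented interface.

```python
# Sketch of GPT-3-based scoring using the legacy OpenAI Completions API
# (openai < 1.0). With max_tokens=0, echo=True, and logprobs set, the API
# returns log-probabilities for the prompt's own tokens instead of
# generating new text. In practice one would restrict the average to the
# hypothesis tokens using the returned text offsets.
import openai

def gpt3_logprob(text: str, model: str = "text-davinci-003") -> float:
    resp = openai.Completion.create(
        model=model,
        prompt=text,
        max_tokens=0,  # generate nothing; just score the given text
        echo=True,     # return the prompt tokens back...
        logprobs=0,    # ...annotated with their log-probabilities
    )
    lps = resp["choices"][0]["logprobs"]["token_logprobs"]
    valid = [lp for lp in lps if lp is not None]  # first token has no logprob
    return sum(valid) / len(valid)
```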
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The framework's effectiveness depends on the capabilities of the chosen evaluator PLM. The README does not report performance benchmarks or discuss biases that the evaluator models may introduce. Running the largest supported evaluators (e.g., GPT-3 175B) requires substantial computational resources or paid API access.