Evaluate foundation models for various NLP tasks
fmeval is an open-source Python library designed for evaluating Large Language Models (LLMs) across various tasks like open-ended generation, summarization, question answering, and classification. It provides algorithms to assess LLMs for accuracy, toxicity, semantic robustness, and prompt stereotyping, enabling users to select the best LLM for their specific use cases.
How It Works
fmeval employs a modular design built on `Transform` and `TransformPipeline` objects. A `Transform` encapsulates record-level data manipulation logic, allowing users to create custom evaluation metrics. A `TransformPipeline` chains `Transform` objects to define a sequence of operations, including prompt generation, model invocation via a `ModelRunner`, and metric computation. This design facilitates extensibility and the creation of custom evaluation workflows.
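The snippet below is a simplified model of this design, not fmeval's actual class definitions: it shows how record-level `Transform`-style objects can be chained into a pipeline that builds a prompt, invokes a (stubbed) model, and computes a metric. All names and signatures here are illustrative assumptions.

```python
from typing import Any, Callable, Dict, List


class Transform:
    """Record-level operation: reads one key from a record dict, writes a new key."""

    def __init__(self, input_key: str, output_key: str, fn: Callable[[Any], Any]):
        self.input_key = input_key
        self.output_key = output_key
        self.fn = fn

    def __call__(self, record: Dict[str, Any]) -> Dict[str, Any]:
        record[self.output_key] = self.fn(record[self.input_key])
        return record


class TransformPipeline:
    """Applies a sequence of Transforms to a record, in order."""

    def __init__(self, transforms: List[Transform]):
        self.transforms = transforms

    def execute_record(self, record: Dict[str, Any]) -> Dict[str, Any]:
        for transform in self.transforms:
            record = transform(record)
        return record


def stub_model(prompt: str) -> str:
    """Stand-in for a ModelRunner invocation; always answers 'Paris'."""
    return "Paris"


pipeline = TransformPipeline([
    Transform("model_input", "prompt", lambda q: f"Answer concisely: {q}"),  # prompt generation
    Transform("prompt", "model_output", stub_model),                         # model invocation
    Transform("model_output", "exact_match", lambda o: int(o == "Paris")),   # metric computation
])

record = pipeline.execute_record({"model_input": "What is the capital of France?"})
print(record["exact_match"])  # 1
```

In this pattern, a custom metric is just another `Transform` appended to the pipeline.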
Quick Start & Requirements
- Install: `pip install fmeval`
- Custom `ModelRunner` implementations are supported; a quick-start sketch follows below.
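As a quick-start illustration, the snippet below scores a single model response with the library's factual-knowledge evaluator. It follows the pattern in fmeval's documentation, but treat the module path and parameter names (`FactualKnowledgeConfig`, `target_output_delimiter`, `evaluate_sample`) as details to verify against the current docs.

```python
from fmeval.eval_algorithms.factual_knowledge import (
    FactualKnowledge,
    FactualKnowledgeConfig,
)

# Score one (target, output) pair directly, without wiring up a ModelRunner.
eval_algo = FactualKnowledge(FactualKnowledgeConfig(target_output_delimiter="<OR>"))
scores = eval_algo.evaluate_sample(
    target_output="London<OR>UK",
    model_output="London is the capital of the UK.",
)
print(scores)  # a list of per-metric scores
```

For full-dataset runs, the evaluator's `evaluate` method accepts a `ModelRunner` and a dataset configuration instead of a single sample.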
Highlighted Details
- Custom datasets can be supplied via `DataConfig` (see the sketch after this list).
- Built-in `ModelRunner` implementations for AWS services.
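A sketch of pointing an evaluation at a custom JSON Lines dataset is below. The `DataConfig` fields mirror fmeval's documented usage, but the module paths and the `MIME_TYPE_JSONLINES` constant are assumptions to check against the library's docs; the file path and column names are hypothetical.

```python
from fmeval.constants import MIME_TYPE_JSONLINES
from fmeval.data_loaders.data_config import DataConfig

# Hypothetical dataset: ./custom_qa.jsonl with "question" and "answer" fields per line.
config = DataConfig(
    dataset_name="custom_qa",
    dataset_uri="./custom_qa.jsonl",
    dataset_mime_type=MIME_TYPE_JSONLINES,
    model_input_location="question",  # column holding the model input
    target_output_location="answer",  # column holding the reference answer
)
```

The resulting config is then passed to an evaluator's `evaluate` call alongside a `ModelRunner` for the model under test.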
Maintenance & Community
Contribution guidelines are available in `CONTRIBUTING`.

Licensing & Compatibility
fmeval is distributed under the Apache-2.0 license.
Limitations & Caveats
- On Windows, `OSError: [Errno 0] AssignProcessToJobObject()` can occur due to Ray integration; installing Python from the official website is recommended.
- To tune parallelism, `PARALLELIZATION_FACTOR` can be adjusted.
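If `PARALLELIZATION_FACTOR` is exposed as an environment variable (an assumption worth verifying in the docs), tuning it could look like:

```python
import os

# Assumption: fmeval reads PARALLELIZATION_FACTOR from the environment to size
# its Ray-based parallelism; set it before running an evaluation.
os.environ["PARALLELIZATION_FACTOR"] = "2"
```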