aws / fmeval: Evaluate foundation models for various NLP tasks
fmeval is an open-source Python library designed for evaluating Large Language Models (LLMs) across various tasks like open-ended generation, summarization, question answering, and classification. It provides algorithms to assess LLMs for accuracy, toxicity, semantic robustness, and prompt stereotyping, enabling users to select the best LLM for their specific use cases.
How It Works
fmeval employs a modular approach using Transform and TransformPipeline objects. Transform encapsulates record-level data manipulation logic, allowing users to create custom evaluation metrics. TransformPipeline chains these Transform objects to define a sequence of operations, including prompt generation, model invocation via ModelRunner, and metric computation. This design facilitates extensibility and the creation of custom evaluation workflows.
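The Transform/TransformPipeline design described above can be sketched in plain Python. The classes below are illustrative stand-ins, not fmeval's actual API: they show how record-level transforms (prompt generation, model invocation, metric computation) chain into a pipeline, with a toy echo "model" in place of a real ModelRunner.

```python
# Minimal sketch of the Transform/TransformPipeline pattern described above.
# These classes are hypothetical illustrations, not fmeval's real classes.

class Transform:
    """Encapsulates record-level data manipulation logic."""
    def __call__(self, record: dict) -> dict:
        raise NotImplementedError

class BuildPrompt(Transform):
    """Formats a raw input into a model prompt."""
    def __call__(self, record):
        record["prompt"] = f"Summarize: {record['model_input']}"
        return record

class InvokeModel(Transform):
    """Stands in for model invocation via a ModelRunner; here a toy echo model."""
    def __call__(self, record):
        record["model_output"] = record["prompt"].upper()  # placeholder "model"
        return record

class ExactMatchScore(Transform):
    """Computes a toy record-level metric."""
    def __call__(self, record):
        record["score"] = float(record["model_output"] == record.get("target", ""))
        return record

class TransformPipeline:
    """Chains Transform objects to define a sequence of operations."""
    def __init__(self, transforms):
        self.transforms = transforms

    def execute(self, records):
        results = []
        for record in records:
            out = dict(record)  # keep the input record unmodified
            for transform in self.transforms:
                out = transform(out)
            results.append(out)
        return results

pipeline = TransformPipeline([BuildPrompt(), InvokeModel(), ExactMatchScore()])
results = pipeline.execute([{"model_input": "a long article", "target": ""}])
print(results[0]["score"])
```

Because each step is an independent Transform, a custom metric is just another class appended to the list, which is the extensibility point the design aims for.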
Quick Start & Requirements
Install with pip install fmeval. ModelRunner implementations are supported for invoking the models under evaluation.
Highlighted Details
Custom datasets can be configured via DataConfig. Built-in ModelRunner implementations are provided for AWS services.
Maintenance & Community
Contribution guidelines are available in CONTRIBUTING.
Licensing & Compatibility
Limitations & Caveats
On Windows, users may encounter OSError: [Errno 0] AssignProcessToJobObject() due to the Ray integration; installing Python from the official python.org installer is recommended as a workaround. The PARALLELIZATION_FACTOR setting can be adjusted to tune evaluation parallelism.
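Assuming PARALLELIZATION_FACTOR is read from the environment (its upper-case name suggests so, but confirm against the fmeval documentation), it would be set before the evaluation runs, e.g.:

```python
import os

# Hedged example: set PARALLELIZATION_FACTOR before running an evaluation.
# The setting's name comes from the text above; how fmeval consumes it
# (and which values are valid) should be confirmed in its documentation.
os.environ["PARALLELIZATION_FACTOR"] = "2"

# A library reading this setting would typically parse it like so:
factor = int(os.environ.get("PARALLELIZATION_FACTOR", "1"))
print(factor)
```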