shield by pegasi-ai

AI testing framework for LLM output validation

Created 2 years ago · 338 stars · Top 82.6% on sourcepulse

Project Summary

Feather is a lightweight framework for statistical testing and validation of LLM outputs and behaviors, designed for AI developers and researchers. It enables the creation of comprehensive test suites, automated evaluations, and behavioral checks to ensure AI application reliability and adherence to requirements.

How It Works

Feather focuses on statistical testing, evaluation with quantitative and qualitative metrics, and simple safety validations, so that model behavior and output quality can be assessed for consistency and correctness rather than judged from one-off runs.
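
This page does not show Feather's own API, so the following plain-Python sketch only illustrates the underlying idea: treat repeated LLM runs as Bernoulli trials and apply a one-sided binomial test. scipy stands in for Feather here, and the counts are invented for illustration.

    from scipy.stats import binomtest

    # Toy numbers: a behavioral check passed on 92 of 100 sampled
    # generations; the requirement is a true pass rate above 85%.
    n_runs, n_passes, required_rate = 100, 92, 0.85

    result = binomtest(n_passes, n_runs, p=required_rate, alternative="greater")
    if result.pvalue < 0.05:
        print("PASS: pass rate is significantly above the 85% requirement")
    else:
        print("FAIL: cannot conclude the model meets the requirement")

Aggregating over many runs like this is what distinguishes statistical testing from a single pass/fail prompt check.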

Quick Start & Requirements

  • Install via pip install pegasi-ai.
  • Requires an API key from app.pegasi.ai.
  • See the Evals notebook for a quick start; a rough setup sketch follows this list.
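
The import path, client class, and environment variable below are assumptions for illustration, not the package's documented API; the Evals notebook is the authoritative reference.

    import os

    # Assumed import path and client name; the real ones may differ,
    # so consult the Evals notebook for the actual API.
    from pegasi import Shield

    # Assumption: the API key obtained from app.pegasi.ai is supplied
    # via an environment variable rather than hard-coded.
    client = Shield(api_key=os.environ["PEGASI_API_KEY"])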

Highlighted Details

  • Provides a comprehensive testing suite for model behavior validation.
  • Supports quantitative and qualitative metrics for performance measurement.
  • Includes simple safety checks and output validation capabilities (see the sketch after this list).
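
As a concrete, if toy, picture of what an output validation step does, the snippet below checks that a model response is well-formed JSON and flags a likely email address. It is a conceptual illustration in plain Python, not Feather's actual validator API.

    import json
    import re

    def validate_output(text: str) -> list[str]:
        """Toy checks in the spirit of output validation; not Feather's API."""
        problems = []
        # Structural check: the response must be valid JSON with an "answer" key.
        try:
            payload = json.loads(text)
            if "answer" not in payload:
                problems.append("missing 'answer' key")
        except json.JSONDecodeError:
            problems.append("not valid JSON")
        # Minimal safety check: flag anything resembling an email address (PII).
        if re.search(r"[\w.+-]+@[\w-]+\.[\w.]+", text):
            problems.append("possible PII (email address)")
        return problems

    print(validate_output('{"answer": "42", "contact": "a@b.com"}'))
    # -> ['possible PII (email address)']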

Maintenance & Community

The project already ships AI validators and out-of-the-box judges. Planned work includes distribution-based testing, expanded statistical validation tools, improved test-result visualization, custom test case creation, and community-driven test suites.

Licensing & Compatibility

The README does not specify a license, so suitability for commercial use or closed-source linking cannot be determined.

Limitations & Caveats

The framework is under active development; distribution-based testing, advanced statistical validation, and custom test case creation remain roadmap items rather than shipped features.

Health Check

  • Last commit: 3 days ago
  • Responsiveness: 1 day
  • Pull requests (30d): 4
  • Issues (30d): 7
  • Star history: 25 stars in the last 90 days
