moonshot  by aiverify-foundation

LLM app evaluation and red-teaming tool

created 1 year ago
261 stars

Top 98.0% on sourcepulse

GitHubView on GitHub
Project Summary

Moonshot is a modular tool designed for evaluating and red-teaming LLM applications, targeting AI developers, compliance teams, and system owners. It aims to simplify the complex processes of benchmarking LLM performance and identifying vulnerabilities through adversarial testing, offering a unified platform for comprehensive AI system assessment.

How It Works

Moonshot integrates benchmarking and red-teaming capabilities, allowing users to test LLMs against predefined competency metrics and probe for vulnerabilities using adversarial prompts. It supports various interfaces, including a web UI, CLI, and library APIs for MLOps integration. The tool utilizes "recipes" – collections of datasets and metrics – which can be curated or customized, and supports prompt templates and optional grading scales for standardized evaluation.

Quick Start & Requirements

  • Installation: pip install "aiverify-moonshot[all]" and python -m moonshot -i moonshot-data -i moonshot-ui.
  • Prerequisites: Python 3.11, Git, virtual environment recommended. For Web UI: Node.js 20.11.1 LTS+. Requires test assets from moonshot-data and moonshot-ui.
  • Running: Web UI: python -m moonshot web. CLI: python -m moonshot cli interactive.
  • Documentation: Installation Guide

Highlighted Details

  • Supports integration with popular LLM providers (OpenAI, Anthropic, HuggingFace) via API keys, with custom connector options.
  • Includes a range of benchmarks covering capability, quality, and trust & safety, incorporating community standards like BigBench and MLCommons safety benchmarks.
  • Features "cookbooks" for selecting relevant tests and allows custom recipe creation for unique use cases.
  • Offers automated red-teaming capabilities using research-backed attack modules to scale adversarial testing.

Maintenance & Community

Developed by the AI Verify Foundation. Links to user guides and documentation are provided.

Licensing & Compatibility

Licensed under the Apache Software License 2.0. This license is permissive and generally compatible with commercial and closed-source applications.

Limitations & Caveats

The project is at version 0.6.2, indicating potential for ongoing development and changes. While Python 3.11 is specified, compatibility with later releases is not guaranteed. The tool requires external data and UI components for full functionality.

Health Check
Last commit

4 days ago

Responsiveness

1 week

Pull Requests (30d)
5
Issues (30d)
4
Star History
33 stars in the last 90 days

Explore Similar Projects

Starred by Omar Sanseviero Omar Sanseviero(DevRel at Google DeepMind), Chip Huyen Chip Huyen(Author of AI Engineering, Designing Machine Learning Systems), and
4 more.

argilla by argilla-io

0.3%
5k
Collaboration tool for building high-quality AI datasets
created 4 years ago
updated 1 week ago
Starred by Jeff Hammerbacher Jeff Hammerbacher(Cofounder of Cloudera), Chip Huyen Chip Huyen(Author of AI Engineering, Designing Machine Learning Systems), and
2 more.

serve by pytorch

0.1%
4k
Serve, optimize, and scale PyTorch models in production
created 5 years ago
updated 3 weeks ago
Feedback? Help us improve.