Research paper for comprehensive GPT model trustworthiness assessment
DecodingTrust provides a comprehensive framework for assessing the trustworthiness of Large Language Models (LLMs), focusing on eight key areas: toxicity, bias, adversarial robustness, out-of-distribution robustness, privacy, robustness to adversarial demonstrations, machine ethics, and fairness. It is designed for researchers and practitioners aiming to understand and mitigate risks associated with LLM deployment.
How It Works
The project is structured into eight distinct subdirectories, each dedicated to a specific trustworthiness dimension. Within these subdirectories, researchers will find curated datasets, evaluation scripts, and detailed READMEs. The framework supports evaluating various LLMs, including OpenAI's GPT models and open-source alternatives hosted on Hugging Face or locally.
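As a quick orientation, the sketch below clones the repository and inspects one dimension's materials. The GitHub URL points to the project's public repository; the toxicity path is illustrative and the actual directory layout may differ.
git clone https://github.com/AI-secure/DecodingTrust.git
cd DecodingTrust
ls                        # one subdirectory per trustworthiness dimension
cat toxicity/README.md    # per-dimension datasets, scripts, and usage notes (path is illustrative)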
Quick Start & Requirements
Installation is via pip (pip install -e .). For PyTorch with specific CUDA versions, use Conda to create an environment first. Additional dependencies include spacy, scipy, fairlearn, scikit-learn, pandas, and pyarrow.
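A minimal setup sketch, assuming a Python 3.9 environment named dt and a CUDA 11.8 build of PyTorch (the environment name, Python version, and CUDA version are illustrative; adjust to your system):
conda create -n dt python=3.9
conda activate dt
conda install pytorch pytorch-cuda=11.8 -c pytorch -c nvidia    # pick the CUDA variant matching your driver
pip install -e .                                                # installs DecodingTrust and its Python dependencies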
Docker and Singularity images are also available.
Highlighted Details
Maintenance & Community
The project is associated with researchers from multiple institutions, including the University of Illinois Urbana-Champaign. Questions and contributions can be raised via GitHub issues and pull requests, or by email to boxinw2@illinois.edu.
Licensing & Compatibility
Limitations & Caveats
The primary focus is on specific OpenAI model versions for benchmark consistency, though other models are supported. The project contains model outputs that may be considered offensive.