DecodingTrust by AI-secure

Research codebase for comprehensive GPT model trustworthiness assessment

created 2 years ago
299 stars

Top 90.0% on sourcepulse

Project Summary

DecodingTrust provides a comprehensive framework for assessing the trustworthiness of Large Language Models (LLMs), focusing on eight key areas: toxicity, bias, adversarial robustness, out-of-distribution robustness, privacy, robustness to adversarial demonstrations, machine ethics, and fairness. It is designed for researchers and practitioners aiming to understand and mitigate risks associated with LLM deployment.

How It Works

The project is structured into eight distinct subdirectories, each dedicated to a specific trustworthiness dimension. Within these subdirectories, researchers will find curated datasets, evaluation scripts, and detailed READMEs. The framework supports evaluating various LLMs, including OpenAI's GPT models and open-source alternatives hosted on Hugging Face or locally.
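The per-dimension layout described above can be sketched as follows. This is an illustrative sketch only: the directory names below are stand-ins derived from the eight dimensions listed in the summary, not the repository's actual folder names, and `list_perspectives` is a hypothetical helper.

```python
# Illustrative sketch of the eight-subdirectory layout. Directory names are
# stand-ins for the eight trustworthiness dimensions; check the repo's READMEs
# for the real paths.
from pathlib import Path

PERSPECTIVES = [
    "toxicity", "stereotype_bias", "adversarial_robustness",
    "ood_robustness", "adv_demonstration", "privacy",
    "machine_ethics", "fairness",
]

def list_perspectives(repo_root: str) -> list[str]:
    """Return the trustworthiness-dimension subdirectories present in a checkout."""
    root = Path(repo_root)
    return [p for p in PERSPECTIVES if (root / p).is_dir()]
```

Each such subdirectory bundles its own curated datasets, evaluation scripts, and a README, so a dimension can be evaluated in isolation.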

Quick Start & Requirements

  • Installation: Clone the repository and install in editable mode (pip install -e .). For PyTorch with specific CUDA versions, use Conda to create an environment first.
  • Prerequisites: Python 3.9+, PyTorch (CUDA 12.1 recommended), spacy, scipy, fairlearn, scikit-learn, pandas, pyarrow. Docker and Singularity images are available.
  • Resources: Requires significant disk space for datasets and computational resources for model evaluation.
  • Documentation: Tutorial.md
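The installation steps above might look like the following. The repository URL is inferred from the "AI-secure" org name, and the Conda step is only needed when a specific CUDA build of PyTorch is required; adjust to your hardware.

```shell
# Sketch of a typical setup (repo URL inferred; verify against the GitHub page).
# Optional, for a specific CUDA build of PyTorch (CUDA 12.1 recommended):
#   conda create -n decodingtrust python=3.9 -y && conda activate decodingtrust

git clone https://github.com/AI-secure/DecodingTrust.git
cd DecodingTrust
pip install -e .   # editable install, as described in Quick Start
```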

Highlighted Details

  • Comprehensive evaluation across eight trustworthiness dimensions.
  • Supports both OpenAI proprietary models (GPT-3.5-turbo-0301, GPT-4-0314) and Hugging Face/local LLMs (e.g., Llama-v2, Vicuna).
  • Includes generated datasets and scripts for reproducible research.
  • Offers Docker and Singularity images for easier deployment, including support for ppc64le architecture.

Maintenance & Community

The project is associated with researchers from multiple institutions, including the University of Illinois Urbana-Champaign. Contact is available via GitHub issues/pull requests or email to boxinw2@illinois.edu.

Licensing & Compatibility

  • License: CC BY-SA 4.0.
  • Compatibility: Allows both research and commercial use, but requires attribution, and modified versions must be distributed under the same license (share-alike).

Limitations & Caveats

The primary focus is on specific OpenAI model versions for benchmark consistency, though other models are supported. The project contains model outputs that may be considered offensive.

Health Check

  • Last commit: 10 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 12 stars in the last 90 days

Explore Similar Projects

Starred by Ross Taylor (Cofounder of General Reasoning; Creator of Papers with Code), Daniel Han (Cofounder of Unsloth), and 4 more.

open-instruct by allenai

Training codebase for instruction-following language models

  • 0.2% · 3k stars
  • created 2 years ago · updated 21 hours ago