DecodingTrust by AI-secure

Research framework for comprehensive GPT model trustworthiness assessment

Created 2 years ago
302 stars

Top 88.4% on SourcePulse

View on GitHub
Project Summary

DecodingTrust provides a comprehensive framework for assessing the trustworthiness of Large Language Models (LLMs), focusing on eight key areas: toxicity, stereotype bias, adversarial robustness, out-of-distribution robustness, privacy, robustness to adversarial demonstrations, machine ethics, and fairness. It is designed for researchers and practitioners aiming to understand and mitigate the risks associated with deploying LLMs.

How It Works

The project is structured into eight distinct subdirectories, each dedicated to a specific trustworthiness dimension. Within these subdirectories, researchers will find curated datasets, evaluation scripts, and detailed READMEs. The framework supports evaluating various LLMs, including OpenAI's GPT models and open-source alternatives hosted on Hugging Face or locally.
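
As an illustration of the evaluation flow only (this is not the repository's actual entry point; the prompt-file format, scoring rule, and function name below are assumptions), a per-dimension run boils down to querying a model on curated prompts and scoring its outputs. The sketch assumes the official OpenAI Python client (v1 API):

    # Minimal, illustrative sketch of a per-dimension evaluation loop.
    # The JSONL prompt format and keyword-based scoring are placeholders
    # standing in for DecodingTrust's dimension-specific datasets/metrics.
    import json
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def evaluate_dimension(prompts_path: str, model: str = "gpt-3.5-turbo-0301") -> float:
        """Return the fraction of model outputs flagged by a placeholder check."""
        with open(prompts_path) as f:
            records = [json.loads(line) for line in f]
        flagged = 0
        for record in records:
            response = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": record["prompt"]}],
                temperature=0,
            )
            output = (response.choices[0].message.content or "").lower()
            # A real harness applies a dimension-specific metric here
            # (e.g., a toxicity classifier); keyword matching stands in.
            flagged += any(term in output for term in record.get("keywords", []))
        return flagged / max(len(records), 1)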

Quick Start & Requirements

  • Installation: Clone the repository and install in editable mode (pip install -e .). For PyTorch with specific CUDA versions, use Conda to create an environment first.
  • Prerequisites: Python 3.9+, PyTorch (CUDA 12.1 recommended), spacy, scipy, fairlearn, scikit-learn, pandas, pyarrow; a quick environment check is sketched after this list. Docker and Singularity images are available.
  • Resources: Requires significant disk space for datasets and computational resources for model evaluation.
  • Documentation: Tutorial.md
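
Before running anything heavy, a short check like the one below can confirm the prerequisites above are in place (Python 3.9+ and a CUDA-enabled PyTorch build); the exact CUDA version string depends on the build you installed:

    # Sanity-check the stated prerequisites: Python 3.9+ and a CUDA-enabled
    # PyTorch build (CUDA 12.1 is the recommended target).
    import sys
    import torch

    assert sys.version_info >= (3, 9), "DecodingTrust expects Python 3.9+"
    print("PyTorch:", torch.__version__)
    print("CUDA build:", torch.version.cuda)        # e.g. "12.1", or None for CPU-only
    print("GPU available:", torch.cuda.is_available())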

Highlighted Details

  • Comprehensive evaluation across eight trustworthiness dimensions.
  • Supports both OpenAI proprietary models (GPT-3.5-turbo-0301, GPT-4-0314) and Hugging Face/local LLMs (e.g., Llama-v2, Vicuna); a loading sketch follows this list.
  • Includes generated datasets and scripts for reproducible research.
  • Offers Docker and Singularity images for easier deployment, including support for ppc64le architecture.
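
As a sketch of the open-weights path referenced above: pointing an evaluation at a Hugging Face or local model reduces to loading it with the standard transformers API. The model ID below is an example only, not a project requirement; any Hub-hosted or locally cached causal LM can be substituted (Llama 2 checkpoints are gated and require license acceptance on the Hub).

    # Illustrative sketch: load an open-source chat model instead of calling
    # the OpenAI API. The model ID is an example; swap in any causal LM.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "lmsys/vicuna-7b-v1.5"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    prompt = "Briefly, what makes a language model trustworthy?"
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=64)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))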

Maintenance & Community

The project is associated with researchers from multiple institutions, including the University of Illinois Urbana-Champaign. Contact is available via GitHub issues/pull requests or email to boxinw2@illinois.edu.

Licensing & Compatibility

  • License: CC BY-SA 4.0.
  • Compatibility: Research and commercial use are allowed, but attribution is required and modified versions must be distributed under the same license (share-alike); note that CC BY-SA is a copyleft-style license, not a permissive one.

Limitations & Caveats

The benchmark primarily targets specific OpenAI model snapshots (GPT-3.5-turbo-0301 and GPT-4-0314) for consistency of results, though other models are supported. The generated datasets include model outputs that some readers may find offensive.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 2 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Luca Soldaini (Research Scientist at Ai2), and 7 more.

hh-rlhf by anthropics (Top 0.2% on SourcePulse, 2k stars)
RLHF dataset for training safe AI assistants
Created 3 years ago · Updated 3 months ago

Starred by Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), Pawel Garbacki (Cofounder of Fireworks AI), and 3 more.

promptbench by microsoft (Top 0.1% on SourcePulse, 3k stars)
LLM evaluation framework
Created 2 years ago · Updated 1 month ago

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Travis Addair (Cofounder of Predibase), and 4 more.

alibi by SeldonIO (Top 0.1% on SourcePulse, 3k stars)
Python library for ML model inspection and interpretation
Created 6 years ago · Updated 18 hours ago