Research paper for comprehensive GPT model trustworthiness assessment
DecodingTrust provides a comprehensive framework for assessing the trustworthiness of Large Language Models (LLMs), focusing on eight key areas: toxicity, bias, adversarial robustness, out-of-distribution robustness, privacy, robustness to adversarial demonstrations, machine ethics, and fairness. It is designed for researchers and practitioners aiming to understand and mitigate risks associated with LLM deployment.
How It Works
The project is structured into eight distinct subdirectories, each dedicated to a specific trustworthiness dimension. Within these subdirectories, researchers will find curated datasets, evaluation scripts, and detailed READMEs. The framework supports evaluating various LLMs, including OpenAI's GPT models and open-source alternatives hosted on Hugging Face or locally.
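As a quick orientation, the sketch below clones the repository and inspects one dimension's materials. The GitHub URL points to the project's public repository; the toxicity path is illustrative and the actual directory layout may differ.
git clone https://github.com/AI-secure/DecodingTrust.git
cd DecodingTrust
ls                        # one subdirectory per trustworthiness dimension
cat toxicity/README.md    # per-dimension datasets, scripts, and usage notes (path is illustrative)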
Quick Start & Requirements
Installation is via pip (pip install -e .). For PyTorch with specific CUDA versions, use Conda to create an environment first. Additional dependencies include spacy, scipy, fairlearn, scikit-learn, pandas, and pyarrow.
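A minimal setup sketch, assuming a Python 3.9 environment named dt and a CUDA 11.8 build of PyTorch (the environment name, Python version, and CUDA version are illustrative; adjust to your system):
conda create -n dt python=3.9
conda activate dt
conda install pytorch pytorch-cuda=11.8 -c pytorch -c nvidia    # pick the CUDA variant matching your driver
pip install -e .                                                # installs DecodingTrust and its Python dependencies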
Docker and Singularity images are also available.
Highlighted Details
Maintenance & Community
The project is associated with researchers from multiple institutions, including the University of Illinois Urbana-Champaign. Questions and contributions can be raised via GitHub issues and pull requests, or by email to boxinw2@illinois.edu.
Licensing & Compatibility
Limitations & Caveats
The primary focus is on specific OpenAI model versions for benchmark consistency, though other models are supported. The project contains model outputs that may be considered offensive.