Trustworthiness benchmark for large language models (ICML 2024)
Top 56.2% on sourcepulse
TrustLLM is a comprehensive framework for evaluating the trustworthiness of Large Language Models (LLMs), aimed at researchers and developers. It provides a standardized benchmark, an evaluation toolkit, and curated datasets organized around trustworthiness principles spanning eight dimensions, six of which are operationalized in the benchmark, enabling systematic assessment and comparison of LLM performance.
How It Works
TrustLLM establishes a benchmark across six key trustworthiness dimensions: truthfulness, safety, fairness, robustness, privacy, and machine ethics. It utilizes a curated collection of over 30 datasets, many of which are introduced in this work, and employs a mix of automatic and human-in-the-loop evaluation methods. The toolkit facilitates easy integration and evaluation of various LLMs, including those accessible via APIs like Azure OpenAI, Replicate, and DeepInfra.
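As a rough illustration, the sketch below follows the response-generation pattern shown in the project's README: pick a model, a benchmark section, and a dataset path, then generate outputs for later scoring. The model name, data path, and flag values are placeholders, and parameter names may vary between toolkit versions.

# Minimal generation sketch (arguments mirror the README's LLMGeneration example; values are placeholders).
from trustllm.generation.generation import LLMGeneration

run = LLMGeneration(
    model_path="meta-llama/Llama-2-7b-chat-hf",  # placeholder local model; API-hosted models are also supported
    test_type="safety",                          # which benchmark section to generate responses for
    data_path="TrustLLM_dataset/",               # placeholder path to the downloaded benchmark data
    online_model=False,                          # assumption: set True (with the relevant API key) for Replicate/DeepInfra models
    max_new_tokens=512,
    device="cuda:0",
)
run.generation_results()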
Quick Start & Requirements
Install from source by cloning the repository (git clone git@github.com:HowieHwong/TrustLLM.git) and running pip install . from the trustllm_pkg directory. Installation via pip (pip install trustllm) is deprecated.
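Once the package is installed, previously generated outputs can be scored per dimension. The sketch below follows the evaluation pattern in the project's documentation, using safety as an example; module and function names mirror the documented trustllm package, but exact signatures and file paths are assumptions and may differ across versions.

# Minimal evaluation sketch (names follow the documented trustllm usage; treat as illustrative).
from trustllm import config
from trustllm.task import safety
from trustllm.utils import file_process

config.openai_key = "sk-..."  # assumption: some metrics call an OpenAI model as an automatic judge

evaluator = safety.SafetyEval()
jailbreak_data = file_process.load_json("generation_results/jailbreak.json")  # hypothetical output path
print(evaluator.jailbreak_eval(jailbreak_data, eval_type="total"))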
Maintenance & Community
The project accompanies an ICML 2024 paper and is under active development: the v0.3.0 release (April 2024) added new models and features. Contributions are welcomed via pull requests.
Licensing & Compatibility
The code is released under the MIT license, permitting commercial use and integration with closed-source projects.
Limitations & Caveats
The README notes ongoing work on "Chinese output evaluation" and "Downstream application evaluation," suggesting these areas may be incomplete or less mature. Several datasets are marked as "first proposed in our benchmark," implying they may still be refined.