ethics by hendrycks

ICLR 2021 research paper on aligning AI with human values

Created 6 years ago

324 stars

Top 83.6% on SourcePulse

Project Summary

This repository provides the ETHICS benchmark dataset and fine-tuning scripts for evaluating how well AI models align with human values across five areas: justice, deontology, virtue ethics, utilitarianism, and commonsense morality. It is aimed at AI researchers and developers who want to measure and improve the ethical reasoning of their models.

How It Works

The project provides a benchmark dataset that tests AI models on ethical scenarios, together with fine-tuning scripts that adapt popular transformer models (e.g., BERT, RoBERTa, ALBERT) to the benchmark tasks. Models are scored separately on each ethical dimension, which allows comparison between models and shows where ethical alignment is weakest.
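The per-dimension scoring described above can be illustrated with a minimal sketch. It is not the repository's evaluation code: the function names, the record layout (task, gold label, predicted label), and the use of plain accuracy for every task are assumptions made for illustration, and the benchmark's own metrics may differ per task.

```python
from collections import defaultdict


def per_task_accuracy(records):
    """Return accuracy per task from (task, gold_label, predicted_label) tuples.

    Hypothetical helper: the record layout is an assumption, not the
    repository's data format.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for task, gold, pred in records:
        total[task] += 1
        correct[task] += int(gold == pred)
    return {task: correct[task] / total[task] for task in total}


def macro_average(accuracies):
    """Unweighted mean over tasks, so each task counts equally."""
    return sum(accuracies.values()) / len(accuracies)


# Toy predictions, invented for this example only.
records = [
    ("commonsense", 1, 1),
    ("commonsense", 0, 1),
    ("justice", 1, 1),
    ("justice", 0, 0),
]

scores = per_task_accuracy(records)
average = macro_average(scores)
print(scores)
print(average)
```

Averaging over tasks rather than over individual examples keeps a large task from dominating the headline number, which is one common way to report a single score across several sub-benchmarks.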

Quick Start & Requirements

Install: pip install -r requirements.txt (specific installation commands for fine-tuning scripts are within subfolders).
Prerequisites: Python 3.x, PyTorch, Hugging Face Transformers library. GPU recommended for fine-tuning.
Dataset: Available at https://github.com/hendrycks/ethics/blob/main/ethics/data/ethics.json
Model Weights: Available at https://github.com/hendrycks/ethics/blob/main/models.md
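Before running the fine-tuning scripts, it can help to confirm that the prerequisites listed above are present. The following is a small sketch of such a check, assuming the packages are importable under the names torch and transformers; the helper name missing_packages is hypothetical and not part of the repository.

```python
import importlib.util
import sys

# Import names assumed for PyTorch and Hugging Face Transformers.
REQUIRED = ("torch", "transformers")


def missing_packages(names=REQUIRED):
    """Return the subset of top-level package names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# The listing only asks for "Python 3.x", so check the major version.
python_ok = sys.version_info >= (3,)
missing = missing_packages()

print("Python 3.x:", python_ok)
print("Missing packages:", missing if missing else "none")
```

An empty list means both libraries were found; the check does not verify versions or GPU availability.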

Highlighted Details

Comprehensive benchmark covering five distinct ethical frameworks.
Leaderboard for tracking model performance on the ETHICS dataset.
Fine-tuning scripts for popular transformer architectures.
Interactive scripts to probe commonsense and utilitarianism models.
Benchmarks show ALBERT-xxlarge achieving 71.0% average on the test set.

Maintenance & Community

The accompanying paper was published at ICLR 2021, and its authors are prominent researchers in AI safety and ethics. The repository does not mention ongoing maintenance or community channels such as Discord or Slack.

Licensing & Compatibility

The repository does not explicitly state a license. The dataset is available for research purposes.

Limitations & Caveats

Because no license is specified, commercial use and integration into closed-source projects carry legal uncertainty. The repository also gives no details on ongoing maintenance or community support.
