ethics by hendrycks

ICLR 2021 research paper on aligning AI with human values

Created 5 years ago
297 stars

Top 89.4% on SourcePulse

Project Summary

This repository provides the ETHICS benchmark dataset and fine-tuning scripts for evaluating how well AI models align with human values across five areas of ethics: justice, deontology, virtue ethics, utilitarianism, and commonsense morality. It targets AI researchers and developers who want to measure and improve the ethical reasoning of their models.

How It Works

The benchmark dataset tests AI models on a range of ethical scenarios. The repository includes fine-tuning scripts that adapt popular transformer models (e.g., BERT, RoBERTa, ALBERT) to the benchmark tasks. Each model is scored separately on each of the five tasks, which allows models to be compared and shows where their ethical alignment is weakest.
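As an illustration of per-task scoring, the sketch below computes an accuracy for each of the five tasks and then their unweighted average. It is not the repository's evaluation code: the task keys, the `accuracy` and `benchmark_report` helpers, and the input format are all hypothetical, and plain per-example accuracy is an assumption (the benchmark itself may score some tasks with different metrics).

```python
from statistics import mean

# Hypothetical task keys, one per area of ethics named in the summary.
TASKS = ("justice", "deontology", "virtue", "utilitarianism", "commonsense")


def accuracy(predictions, labels):
    """Fraction of predictions that exactly match their labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot score an empty task")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def benchmark_report(results):
    """Score every task and add an unweighted average over the five tasks.

    `results` maps each task name to a (predictions, labels) pair.
    """
    report = {task: accuracy(*results[task]) for task in TASKS}
    report["average"] = mean(report[task] for task in TASKS)
    return report


# Toy data only; these are not real model outputs or benchmark labels.
toy_results = {task: ([1, 0, 1, 1], [1, 0, 0, 1]) for task in TASKS}
toy_report = benchmark_report(toy_results)
for name, score in toy_report.items():
    print(f"{name}: {score:.3f}")
```

Averaging the per-task scores with equal weight keeps a large task from dominating the headline number, which is one plausible reading of an "average" across tasks.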

Quick Start & Requirements

Highlighted Details

  • Comprehensive benchmark covering five distinct ethical frameworks.
  • Leaderboard for tracking model performance on the ETHICS dataset.
  • Fine-tuning scripts for popular transformer architectures.
  • Interactive scripts to probe commonsense and utilitarianism models.
  • Benchmarks show ALBERT-xxlarge achieving 71.0% average on the test set.

Maintenance & Community

The project accompanies an ICLR 2021 paper, and its authors are prominent researchers in AI safety and ethics. The repository does not mention ongoing maintenance or community channels such as Discord or Slack.

Licensing & Compatibility

The repository does not explicitly state a license. The dataset is available for research purposes.

Limitations & Caveats

Without a stated license, the terms for commercial use or integration into closed-source projects are unclear. The repository also does not describe ongoing maintenance or community support.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 2 stars in the last 30 days
