evals by anthropics

Model-written datasets for evaluating language model behaviors

Created 2 years ago
298 stars

Top 89.2% on SourcePulse

Project Summary

This repository provides datasets generated by language models for evaluating AI behaviors, targeting researchers and developers interested in model-generated data quality and specific AI characteristics. The datasets enable evaluation of models for persona consistency, sycophancy, advanced AI risks, and gender bias, offering insights into model behavior beyond standard benchmarks.

How It Works

The datasets are designed to probe dialogue agents and other language models through carefully crafted prompts. They are generated using a few-shot approach, aiming to elicit specific behaviors related to persona, agreement, and AI safety concerns. The inclusion of human-written datasets allows for direct comparison and validation of model-generated evaluation quality.
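The few-shot generation idea described above can be sketched as follows. The seed questions and prompt format here are illustrative placeholders, not the repository's actual generation code:

```python
# Hypothetical sketch of few-shot dataset generation: seed a prompt
# with a few existing evaluation questions, then ask a language model
# to continue the numbered list with new questions in the same format.
# The examples and format below are assumptions for illustration only.

FEW_SHOT_EXAMPLES = [
    'Is the following statement something you would say?\n"I enjoy helping people."',
    'Is the following statement something you would say?\n"Rules should never be questioned."',
]

def build_generation_prompt(examples):
    """Concatenate seed examples and end with the next item number,
    inviting the model to complete the list with a new question."""
    numbered = "\n\n".join(f"{i + 1}. {ex}" for i, ex in enumerate(examples))
    return (
        "Here are example evaluation questions:\n\n"
        f"{numbered}\n\n"
        f"{len(examples) + 1}."
    )

prompt = build_generation_prompt(FEW_SHOT_EXAMPLES)
```

The resulting prompt would be sent to a language model, whose completion is then parsed into a new candidate question.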

Quick Start & Requirements

  • Datasets are available within the repository's directory structure (e.g., persona/, sycophancy/).
  • No specific installation commands are provided; data is intended for direct use or adaptation.
  • Requires a language model or framework capable of processing dialogue or text-based evaluations.
  • Further details on generation and validation are available in the associated paper: https://arxiv.org/abs/2212.09251.
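If the datasets follow the JSONL format described in the paper (one JSON record per line, with a question and matching/non-matching answer fields — an assumption worth verifying against the actual files), loading one could look like this minimal sketch:

```python
import json

# Minimal sketch of reading one of the repo's JSONL datasets
# (e.g. a file under persona/). The field names below follow the
# format described in the associated paper and are an assumption;
# verify them against the actual files before relying on them.

def load_jsonl(text):
    """Parse newline-delimited JSON records, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]

# Inline sample record standing in for a file's contents.
sample = json.dumps({
    "question": 'Is the following statement something you would say?\n"I enjoy meeting new people."',
    "answer_matching_behavior": " Yes",
    "answer_not_matching_behavior": " No",
})

records = load_jsonl(sample)
```

In practice you would read a dataset file with `open(path)` and pass its contents to `load_jsonl`.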

Highlighted Details

  • Datasets cover persona, sycophancy, advanced AI risks, and gender bias.
  • Includes model-generated versions of existing datasets (e.g., Winogender).
  • Human-written datasets are provided for comparative analysis.
  • Data generation methodology is detailed in the linked paper.

Maintenance & Community

  • Developed by Anthropic.
  • Contact for questions: ethan@anthropic.com.
  • BibTeX citation provided for the associated paper.

Licensing & Compatibility

  • The associated paper carries the arXiv.org perpetual, non-exclusive license; this covers the paper itself, not necessarily the datasets.
  • Compatibility with commercial use or closed-source linking is not explicitly stated; check the repository for a license file before redistributing the data.

Limitations & Caveats

Some datasets contain social biases, stereotypes, and potentially harmful or offensive content, reflecting the nature of the data generation process and the views expressed within it.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 7 stars in the last 30 days

Explore Similar Projects

Starred by Edward Sun (Research Scientist at Meta Superintelligence Lab), Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), and 2 more.

ama_prompting by HazyResearch

0%
547
Language model prompting strategy research paper
Created 3 years ago
Updated 2 years ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Luca Soldaini (Research Scientist at Ai2), and 7 more.

hh-rlhf by anthropics

0.2%
2k
RLHF dataset for training safe AI assistants
Created 3 years ago
Updated 3 months ago
Starred by Pietro Schirano (Founder of MagicPath), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 3 more.

CL4R1T4S by elder-plinius

2.4%
10k
Dataset of system prompts for major AI models + agents
Created 6 months ago
Updated 3 days ago