Model-written datasets for evaluating language model behaviors
Top 92.5% on sourcepulse
This repository provides datasets generated by language models for evaluating AI behaviors, targeting researchers and developers interested in model-generated data quality and specific AI characteristics. The datasets enable evaluation of models for persona consistency, sycophancy, advanced AI risks, and gender bias, offering insights into model behavior beyond standard benchmarks.
How It Works
The datasets are designed to probe dialogue agents and other language models through carefully crafted prompts. They are generated using a few-shot approach, aiming to elicit specific behaviors related to persona, agreement, and AI safety concerns. The inclusion of human-written datasets allows for direct comparison and validation of model-generated evaluation quality.
Quick Start & Requirements
persona/
, sycophancy/
).Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
Some datasets contain social biases, stereotypes, and potentially harmful or offensive content, reflecting the nature of the data generation process and the views expressed within it.
1 year ago
Inactive