Preference dataset for training reward/critique models
UltraFeedback provides a large-scale, fine-grained, and diverse preference dataset for training reward and critique models, aimed at researchers working on Reinforcement Learning from Human Feedback (RLHF). It seeks to improve LLM alignment by supplying detailed annotations and enabling the development of more capable feedback models.
How It Works
The dataset comprises 64k prompts drawn from diverse sources, paired with 256k responses sampled from a pool of 17 different LLMs. Each response is annotated by GPT-4 across four dimensions: instruction-following, truthfulness, honesty, and helpfulness, with both a numerical rating and a textual rationale. This fine-grained approach allows for more nuanced reward model training than simpler preference datasets, as in the sketch below.
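A minimal sketch of how the records described above might be inspected with the Hugging Face `datasets` library. The hub ID (`openbmb/UltraFeedback`) and the field names used here (`instruction`, `completions`, `annotations`, `Rating`, `Rationale`) are assumptions based on the description in this section and may differ from the released schema.

```python
# Sketch: load UltraFeedback and inspect per-response GPT-4 annotations.
# Assumed hub ID and field names -- verify against the actual dataset card.
from datasets import load_dataset

ds = load_dataset("openbmb/UltraFeedback", split="train")  # assumed dataset ID

example = ds[0]
print(example["instruction"])                # the prompt (assumed field name)

for completion in example["completions"]:    # one entry per sampled model response
    print(completion["model"])               # which of the 17 LLMs produced it
    # Each response is assumed to carry one annotation per dimension
    # (instruction_following, truthfulness, honesty, helpfulness),
    # with a numeric rating and a textual rationale.
    for aspect, ann in completion["annotations"].items():
        print(aspect, ann["Rating"], ann["Rationale"])
```

Pairs of responses to the same prompt with differing ratings can then be turned into chosen/rejected pairs for reward-model training.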
Quick Start & Requirements
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
Because annotations come from GPT-4 rather than humans, the project acknowledges that GPT-4 can make mistakes, which may affect data quality. The dataset is primarily focused on single-turn interactions, with multi-round dialogue extensions planned.