Datasets for community-driven AI model training and evaluation
This initiative empowers the open-source community to collaboratively build impactful datasets for AI models. It targets researchers and developers seeking high-quality, community-vetted datasets, offering curated resources and tools to facilitate data creation and annotation.
How It Works
The project has two main components: community efforts and cookbook efforts. Community efforts are hands-on projects guided by Hugging Face, such as prompt ranking and image preference annotation, that rely on community participation to create large-scale datasets. Cookbook efforts are standalone guides and tools that let users independently build domain-specific datasets or preference datasets for alignment methods such as DPO, ORPO, and KTO.
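To make the preference-dataset idea concrete, here is a minimal sketch of turning community ratings into the two record shapes commonly used for these methods: paired prompt/chosen/rejected records (the style typically used for DPO and ORPO) and unpaired prompt/completion/label records (the style typically used for KTO). The input schema, the function names, and the rating threshold are all assumptions made for illustration; the project's own cookbooks may structure things differently.

```python
from itertools import combinations
from typing import TypedDict


class Rated(TypedDict):
    """One community-rated completion (hypothetical schema for illustration)."""
    prompt: str
    completion: str
    rating: float  # e.g. a mean annotator score; higher is better


def to_pairwise(rows: list[Rated]) -> list[dict]:
    """Build prompt/chosen/rejected records from ratings of the same prompt."""
    by_prompt: dict[str, list[Rated]] = {}
    for row in rows:
        by_prompt.setdefault(row["prompt"], []).append(row)

    pairs = []
    for prompt, group in by_prompt.items():
        for a, b in combinations(group, 2):
            if a["rating"] == b["rating"]:
                continue  # a tie carries no preference signal
            chosen, rejected = (a, b) if a["rating"] > b["rating"] else (b, a)
            pairs.append({
                "prompt": prompt,
                "chosen": chosen["completion"],
                "rejected": rejected["completion"],
            })
    return pairs


def to_binary(rows: list[Rated], threshold: float = 3.0) -> list[dict]:
    """Build unpaired prompt/completion/label records.

    The threshold is an assumed cut-off: ratings at or above it are
    labeled desirable (True), all others undesirable (False).
    """
    return [
        {
            "prompt": r["prompt"],
            "completion": r["completion"],
            "label": r["rating"] >= threshold,
        }
        for r in rows
    ]
```

The pairwise form needs at least two differently rated completions per prompt, while the binary form works with a single rated completion, which is one reason a project might choose one format over the other.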
Quick Start & Requirements
Datasets: data-is-better-together/10k_prompts_ranked, data-is-better-together/open-image-preferences-v1-binarized
Cookbook guide: cookbook-efforts/domain-specific-datasets/README.md
Highlighted Details
data-is-better-together/10k_prompts_ranked, with over 385 contributors.
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
7 months ago
Inactive