Curated list of data-centric AI resources
This repository is a curated, though incomplete, list of resources for Data-Centric AI (DCAI), targeting researchers and practitioners interested in improving AI systems by focusing on data quality and engineering. It provides a structured overview of key concepts, techniques, and relevant papers across various stages of the AI lifecycle, from data collection and labeling to inference and maintenance.
How It Works
The project categorizes DCAI efforts into three main goals: training data development, inference data development, and data maintenance. Each goal is further broken down into sub-goals, with extensive lists of academic papers and code repositories linked for each. This structure allows users to explore specific areas of DCAI, such as data augmentation, prompt engineering, or out-of-distribution evaluation, and discover relevant research and tools.
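The goal-to-sub-goal hierarchy can be pictured as a simple nested mapping. The sketch below is illustrative only. `DCAI_TAXONOMY` and `find_goal` are hypothetical names. It includes only the sub-goals named in this summary, and placing each one under a particular goal is an assumption. The repository's own headings are the authority.

```python
from typing import Dict, List, Optional

# Hypothetical, partial encoding of the three DCAI goals described above.
# Only sub-goals named in this summary are included; their placement under
# each goal is an illustrative assumption, not copied from the repository.
DCAI_TAXONOMY: Dict[str, List[str]] = {
    "training data development": [
        "data collection",
        "data labeling",
        "data augmentation",
    ],
    "inference data development": [
        "prompt engineering",
        "out-of-distribution evaluation",
    ],
    "data maintenance": [],  # sub-goals are not named in this summary
}


def find_goal(sub_goal: str) -> Optional[str]:
    """Return the top-level goal that lists `sub_goal`, or None if it is absent."""
    needle = sub_goal.strip().lower()
    for goal, sub_goals in DCAI_TAXONOMY.items():
        if needle in sub_goals:
            return goal
    return None


if __name__ == "__main__":
    # Look up a few topics, including one that this sketch does not list.
    for topic in ("data augmentation", "prompt engineering", "data pruning"):
        print(f"{topic!r} -> {find_goal(topic)}")
```

A flat lookup like this answers one question: which section of the list covers a given topic. A topic that is missing returns `None`. That result does not mean the repository lacks the topic. It may simply be absent from this partial sketch.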
Quick Start & Requirements
This repository is a curated list of resources, not a software package. No installation or execution is required. The primary purpose is to serve as a reference and starting point for exploring the field of Data-Centric AI.
Maintenance & Community
The list is actively curated, with contributions welcomed via pull requests. Contact information for the primary author is provided for direct contributions or inquiries. Community discussion is encouraged through Slack, QQ, and WeChat groups.
Licensing & Compatibility
The repository itself is not licensed as software. The linked papers and code repositories are subject to their respective licenses. Compatibility for commercial use or closed-source linking depends entirely on the licenses of the individual linked resources.
Limitations & Caveats
The README explicitly states that the list is "incomplete" and that it is "unfeasible to encompass every paper," which signals a selective curation process. Although the list is extensive, users should be aware that it may not cover every emerging or niche area within DCAI.
1 year ago
Inactive