data-centric-AI by daochenzha

Curated list of data-centric AI resources

created 2 years ago
1,117 stars

Top 35.0% on sourcepulse

Project Summary

This repository is a curated, though incomplete, list of resources for Data-Centric AI (DCAI), targeting researchers and practitioners interested in improving AI systems by focusing on data quality and engineering. It provides a structured overview of key concepts, techniques, and relevant papers across various stages of the AI lifecycle, from data collection and labeling to inference and maintenance.

How It Works

The project categorizes DCAI efforts into three main goals: training data development, inference data development, and data maintenance. Each goal is further broken down into sub-goals, with extensive lists of academic papers and code repositories linked for each. This structure allows users to explore specific areas of DCAI, such as data augmentation, prompt engineering, or out-of-distribution evaluation, and discover relevant research and tools.
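The goal and sub-goal layout described above can be pictured as a small nested mapping. The following is only a minimal sketch: the three goal names come from the summary, but the placement of each sub-goal under a goal is an assumption made for illustration, the lists are not exhaustive, and `TAXONOMY` and `find_goal` are hypothetical names, not anything the repository provides.

```python
from typing import Dict, List, Optional

# Hypothetical sketch of the goal -> sub-goal structure.
# Goal names are taken from the summary; which sub-goal sits under which goal
# is an assumption, and the sub-goal lists are deliberately incomplete.
TAXONOMY: Dict[str, List[str]] = {
    "training data development": [
        "data collection",
        "data labeling",
        "data augmentation",
    ],
    "inference data development": [
        "out-of-distribution evaluation",
        "prompt engineering",
    ],
    "data maintenance": [],  # sub-goals are not named in this summary
}


def find_goal(sub_goal: str) -> Optional[str]:
    """Return the goal whose list contains sub_goal (case-insensitive), or None."""
    wanted = sub_goal.strip().lower()
    for goal, sub_goals in TAXONOMY.items():
        if wanted in sub_goals:
            return goal
    return None


if __name__ == "__main__":
    print(find_goal("prompt engineering"))
```

A reader browsing the list would do this lookup by hand: pick a sub-topic, find the goal it belongs to, then follow the papers and repositories linked under it.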

Quick Start & Requirements

This repository is a curated list of resources, not a software package. No installation or execution is required. The primary purpose is to serve as a reference and starting point for exploring the field of Data-Centric AI.

Highlighted Details

  • Comprehensive categorization of DCAI techniques, from data collection and labeling to inference and maintenance.
  • Extensive lists of linked academic papers and code repositories for each sub-topic.
  • Includes links to survey papers, tutorials, and blog posts for a broader understanding of DCAI.
  • Highlights specific benchmarks and frameworks like OpenGSL and DataPerf.

Maintenance & Community

Contributions are welcomed via pull requests, and contact information for the primary author is provided for direct contributions or inquiries. Community discussion is encouraged through Slack, QQ, and WeChat groups.

Licensing & Compatibility

The repository itself is not licensed as software. The linked papers and code repositories are subject to their respective licenses. Compatibility for commercial use or closed-source linking depends entirely on the licenses of the individual linked resources.

Limitations & Caveats

The README explicitly states that the list is "incomplete" and that it is "unfeasible to encompass every paper," which indicates a selective curation process. Although the list is extensive, it may not cover all emerging or niche areas within DCAI.

Health Check

Last commit: 1 year ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 0
Star History: 16 stars in the last 90 days
