dcai-lab by dcai-course

DCAI course labs

created 2 years ago
469 stars

Top 65.7% on sourcepulse

Project Summary

This repository provides lab assignments for the Introduction to Data-Centric AI course at MIT. It offers practical exercises for students and researchers to explore techniques for improving machine learning model performance by focusing on data quality, labeling, and curation, rather than solely on model architecture.

How It Works

The labs cover a range of data-centric AI methodologies, each through a focused practical exercise:

  • Comparing data-centric and model-centric approaches
  • Identifying label errors using Confident Learning
  • Dataset curation with multiple annotators
  • Data-centric evaluation
  • Handling class imbalance and outliers
  • Active learning for dataset growth
  • Interpretability for data analysis
  • Prompt engineering for LLMs
  • Data privacy via membership inference attacks
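
To make the Confident Learning idea above more concrete, here is a minimal sketch of one simplified variant of the rule. It is not taken from the labs: the function name `find_label_issues` and the toy data are assumptions made for illustration, and the code is written in plain Python to stay dependency-free. Each class gets a threshold equal to the average predicted probability of that class among examples carrying that label; an example is flagged when its most probable "confident" class differs from its given label.

```python
def find_label_issues(labels, pred_probs):
    """Flag likely label errors using a simplified Confident Learning rule.

    labels: list of integer class labels (possibly noisy).
    pred_probs: list of per-class probability lists, ideally out-of-sample.
    Assumes every class appears at least once in `labels`.
    """
    n_classes = len(pred_probs[0])

    # Per-class threshold: mean predicted probability of class k
    # over the examples whose given label is k.
    thresholds = []
    for k in range(n_classes):
        probs_k = [p[k] for p, y in zip(pred_probs, labels) if y == k]
        thresholds.append(sum(probs_k) / len(probs_k))

    issues = []
    for p, y in zip(pred_probs, labels):
        # Classes the model is "confident" about for this example.
        confident = [k for k in range(n_classes) if p[k] >= thresholds[k]]
        if not confident:
            issues.append(False)  # no confident class: do not flag
            continue
        best = max(confident, key=lambda k: p[k])
        issues.append(best != y)  # flag when confident class disagrees
    return issues


# Toy data (hypothetical): examples 2 and 5 look mislabeled.
labels = [0, 0, 0, 1, 1, 1]
pred_probs = [
    [0.9, 0.1], [0.8, 0.2], [0.1, 0.9],
    [0.2, 0.8], [0.1, 0.9], [0.9, 0.1],
]
issues = find_label_issues(labels, pred_probs)
print([i for i, flagged in enumerate(issues) if flagged])
```

In practice the predicted probabilities would come from cross-validation so that no example is scored by a model trained on it; the sketch leaves that step out.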

Quick Start & Requirements

  • Labs are designed to be run in a Python environment. Setup varies by lab, but generally involves cloning the repository and installing that lab's dependencies.
  • Some labs, like Lab 8, are available on Google Colab for easier execution.
  • Prerequisites typically include Python and standard data science libraries (e.g., NumPy, pandas, scikit-learn). Some labs may require additional libraries or pre-trained models.
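
Before opening a lab, it can help to confirm that the libraries named above are importable. The following is a small sketch using only the standard library; the helper name `missing_packages` is hypothetical, and the package list reflects only the examples given here, not a requirements file from the repository.

```python
import importlib.util


def missing_packages(names):
    """Return the import names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names for NumPy, pandas, and scikit-learn.
# Note that scikit-learn is imported as "sklearn".
required = ["numpy", "pandas", "sklearn"]
missing = missing_packages(required)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All listed packages are importable.")
```

Individual labs may need more than these three packages, so treat a clean result as a starting point rather than a guarantee.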

Highlighted Details

  • Covers a comprehensive curriculum of data-centric AI techniques.
  • Includes practical implementation of concepts like Confident Learning and membership inference attacks.
  • Offers labs focused on modern AI challenges such as prompt engineering for LLMs and data privacy.
  • Provides a 2023 version of the labs for historical comparison or alternative learning paths.
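
As a rough illustration of the membership inference idea mentioned above, the sketch below shows a loss-threshold attack, a common baseline rather than necessarily the method used in the labs. The function names, loss values, and threshold are all assumptions chosen for illustration. The intuition is that a model tends to have lower loss on examples it was trained on, so a low loss is treated as evidence of membership.

```python
def loss_threshold_attack(losses, threshold):
    """Predict 'member' (True) when the per-example loss is below threshold."""
    return [loss < threshold for loss in losses]


def attack_accuracy(member_losses, nonmember_losses, threshold):
    """Fraction of examples whose membership the attack guesses correctly."""
    member_preds = loss_threshold_attack(member_losses, threshold)
    nonmember_preds = loss_threshold_attack(nonmember_losses, threshold)
    correct = sum(member_preds) + sum(not p for p in nonmember_preds)
    return correct / (len(member_losses) + len(nonmember_losses))


# Hypothetical per-example losses from some trained model.
member_losses = [0.05, 0.10, 0.20, 0.90]      # examples seen in training
nonmember_losses = [0.60, 0.80, 1.20, 0.15]   # held-out examples

accuracy = attack_accuracy(member_losses, nonmember_losses, threshold=0.5)
print(f"Attack accuracy: {accuracy:.2f}")
```

An accuracy well above 0.5 on balanced member and non-member sets suggests the model leaks information about its training data.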

Maintenance & Community

The repository is maintained by the instructors of the Introduction to Data-Centric AI course. Contributions are welcomed via issues and pull requests.

Licensing & Compatibility

Licensed under the GNU Affero General Public License v3.0 or later. This is a strong copyleft license: modified versions and derivative works that are distributed, or offered to users over a network, must also be released under the AGPL with their source code available. The license does not prohibit commercial use, but these obligations can make the code difficult to combine with closed-source software.

Limitations & Caveats

The labs are educational materials and may not be production-ready. Specific dependencies and execution environments might require careful setup. The AGPL license imposes significant obligations on redistribution and modification, which could be a barrier for some commercial applications.

Health Check

  • Last commit: 5 months ago
  • Responsiveness: 1+ week
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

18 stars in the last 90 days
