dcai-lab by dcai-course

DCAI course labs

Created 3 years ago

480 stars

Top 63.8% on SourcePulse


2 Experts Love This Project

Jeff Hammerbacher

Cofounder of Cloudera

Elvis Saravia

Founder of DAIR.AI

Project Summary

This repository provides lab assignments for the Introduction to Data-Centric AI course at MIT. It offers practical exercises for students and researchers to explore techniques for improving machine learning model performance by focusing on data quality, labeling, and curation, rather than solely on model architecture.

How It Works

The labs cover a range of data-centric AI methods: comparing data-centric and model-centric approaches, identifying label errors with Confident Learning, curating datasets labeled by multiple annotators, data-centric evaluation, handling class imbalance and outliers, growing datasets with active learning, interpretability for data analysis, prompt engineering for LLMs, and probing data privacy with membership inference attacks. Each lab is a focused, practical exercise on one of these topics.
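To make the label-error idea concrete, here is a minimal sketch of the thresholding step behind Confident Learning. It is a simplified illustration, not the code used in the labs: the function names are hypothetical, plain Python lists stand in for arrays, and it assumes `pred_probs` holds out-of-sample predicted probabilities and that every class has at least one labeled example.

```python
def class_thresholds(labels, pred_probs, n_classes):
    """Per-class threshold: mean predicted probability of class j
    over the examples whose given label is j."""
    thresholds = []
    for j in range(n_classes):
        probs_j = [p[j] for y, p in zip(labels, pred_probs) if y == j]
        # Assumption: every class appears at least once in `labels`.
        thresholds.append(sum(probs_j) / len(probs_j))
    return thresholds


def find_label_issues(labels, pred_probs):
    """Return indices of examples whose confidently predicted class
    disagrees with the given label (simplified sketch)."""
    n_classes = len(pred_probs[0])
    thresholds = class_thresholds(labels, pred_probs, n_classes)
    issues = []
    for i, (y, p) in enumerate(zip(labels, pred_probs)):
        # Classes whose predicted probability clears that class's threshold.
        confident = [j for j in range(n_classes) if p[j] >= thresholds[j]]
        if not confident:
            continue  # the model is not confident about any class
        likely = max(confident, key=lambda j: p[j])
        if likely != y:
            issues.append(i)
    return issues


# Toy data: example 2 is labeled 0, but the model strongly predicts class 1.
labels = [0, 0, 0, 1, 1, 1]
pred_probs = [
    [0.90, 0.10],
    [0.80, 0.20],
    [0.10, 0.90],
    [0.15, 0.85],
    [0.10, 0.90],
    [0.30, 0.70],
]
print(find_label_issues(labels, pred_probs))  # prints [2]
```

Using per-class thresholds rather than a single global cutoff keeps the check from over-flagging classes on which the model is systematically less confident.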

Quick Start & Requirements

The labs run in a Python environment. Setup varies by lab but generally involves cloning the repository and installing that lab's dependencies.
Some labs, like Lab 8, are available on Google Colab for easier execution.
Prerequisites typically include Python and standard data science libraries (e.g., NumPy, Pandas, Scikit-learn). Specific labs might require additional libraries or pre-trained models.
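Before opening a lab, it can help to confirm that the common prerequisites are importable. The sketch below is one way to do that with the standard library. The module list is an assumption based on the libraries named above (`sklearn` is the import name of Scikit-learn), and individual labs may need more.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


# Assumed baseline for the labs; adjust per lab.
required = ["numpy", "pandas", "sklearn"]
missing = missing_modules(required)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All baseline modules are available.")
```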

Highlighted Details

Covers a broad curriculum of data-centric AI techniques, from label-error detection to data privacy.
Includes practical implementation of concepts like Confident Learning and membership inference attacks.
Offers labs focused on modern AI challenges such as prompt engineering for LLMs and data privacy.
Provides a 2023 version of the labs for historical comparison or alternative learning paths.
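As an illustration of the membership inference idea mentioned above, the following sketch implements a loss-threshold attack, a common baseline: an example is guessed to be a training-set member when the model's loss on it is unusually low. This is not necessarily the attack used in the lab; the function names and the threshold value are hypothetical choices for illustration.

```python
import math


def cross_entropy(prob_true_class):
    """Cross-entropy loss given the probability the model assigns to the true class."""
    return -math.log(max(prob_true_class, 1e-12))  # clamp to avoid log(0)


def infer_membership(true_class_probs, threshold):
    """Guess True (member) when the per-example loss is below the threshold."""
    return [cross_entropy(p) < threshold for p in true_class_probs]


# Probabilities a model assigns to the correct label for three examples.
probs = [0.99, 0.50, 0.95]
print(infer_membership(probs, threshold=0.1))  # prints [True, False, True]
```

In practice the threshold would be calibrated, for example on examples whose membership status is known, rather than fixed by hand.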

Maintenance & Community

The repository is maintained by the instructors of the Introduction to Data-Centric AI course. Contributions are welcomed via issues and pull requests.

Licensing & Compatibility

Licensed under the GNU Affero General Public License v3.0 or later. This is a strong copyleft license: modified versions and derivative works that are distributed, or made available to users over a network, must also be released under the AGPL with their source code. Commercial use is permitted, but combining the code with closed-source software may be impractical because of these requirements.

Limitations & Caveats

The labs are educational materials and may not be production-ready. Specific dependencies and execution environments might require careful setup. The AGPL license imposes significant obligations on redistribution and modification, which could be a barrier for some commercial applications.

Health Check

Last Commit

10 months ago

Responsiveness

Inactive

Pull Requests (30d)

Issues (30d)

Star History

6 stars in the last 30 days