Awesome-Dataset-Distillation by Guang000

Curated list of dataset distillation papers and applications

created 3 years ago · 1,744 stars · Top 25.1% on sourcepulse

Project Summary

This repository serves as a comprehensive, curated list of research papers, code, and applications related to dataset distillation. It targets researchers and practitioners in machine learning who are interested in synthesizing smaller, representative datasets from larger ones to train models efficiently, with benefits for privacy, continual learning, and more.

How It Works

Dataset distillation creates a small synthetic dataset such that models trained on it perform comparably to models trained on the original, much larger dataset. Techniques include gradient matching, trajectory matching, surrogate objectives, generative models, and parameterization of the distilled dataset. All of these try to capture the essential information in the original data distribution so that the compact synthetic set still supports effective model training.
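To make the gradient-matching idea concrete, here is a deliberately tiny, pure-Python sketch. It is not taken from any paper in the list: the model (a 1-D linear regressor with squared loss), the data, and all names are illustrative assumptions. A single learnable synthetic point `(sx, sy)` is optimized so that the training gradient it induces matches the gradient computed on the full "real" dataset at the same model weight.

```python
# Toy gradient-matching sketch (illustrative only, not a specific method
# from the list). Model: y_hat = w * x with squared loss; we learn one
# synthetic point (sx, sy) whose loss gradient w.r.t. w matches the
# gradient of the real dataset at a fixed weight w.

real = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]  # "real" data, roughly y = 2x
w = 0.5                                       # model weight held fixed here

def grad_real(w):
    # dL/dw of the squared loss, averaged over the real data
    return sum(2 * (w * x - y) * x for x, y in real) / len(real)

def grad_syn(w, sx, sy):
    # dL/dw of the squared loss on the single synthetic point
    return 2 * (w * sx - sy) * sx

sx, sy, lr = 1.0, 1.0, 0.01
for _ in range(2000):
    diff = grad_syn(w, sx, sy) - grad_real(w)
    # Gradient descent on the matching loss diff**2, using its analytic
    # derivatives with respect to sx and sy.
    sx -= lr * 2 * diff * (4 * w * sx - 2 * sy)
    sy -= lr * 2 * diff * (-2 * sx)

# After optimization, the synthetic gradient closely matches the real one.
```

Real methods extend this pattern across many weights sampled along training trajectories, many classes, and deep networks, but the core loop of "compute real gradient, compute synthetic gradient, minimize their distance with respect to the synthetic data" is the same.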

Quick Start & Requirements

This repository is a curated list and does not have a direct installation or execution command. It links to various research papers, many of which include code repositories with their own specific requirements (e.g., Python, PyTorch, TensorFlow, specific hardware like GPUs).

Highlighted Details

  • Extensive categorization of papers across core techniques (gradient matching, generative distillation, etc.) and diverse applications (continual learning, privacy, medical imaging, GNNs, etc.).
  • Regularly updated with recent research, including papers from top-tier conferences like CVPR, NeurIPS, ICML, and ICLR.
  • Includes links to code repositories for many papers, enabling practical experimentation.
  • Features dedicated sections for benchmarks, surveys, challenges, and Ph.D. theses in the field.

Maintenance & Community

The project is curated and maintained by Guang Li, Bo Zhao, and Tongzhou Wang, with significant contributions acknowledged from numerous researchers in the field. It provides links to GitHub repositories for code and encourages pull requests for new submissions.

Licensing & Compatibility

The repository's own license is not stated in this summary; awesome-style lists are commonly released under permissive terms (e.g., MIT), but the license file on GitHub should be checked directly. The licensing of the individual linked code repositories varies and must be verified separately before reuse.

Limitations & Caveats

As a curated list, this repository does not provide a unified framework or implementation. Users must refer to individual linked papers and their associated code for specific functionalities, dependencies, and potential limitations. The field is rapidly evolving, with new methods and challenges emerging frequently.

Health Check

  • Last commit: 4 days ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 1
  • Issues (30d): 0
  • Star History: 126 stars in the last 90 days

Explore Similar Projects

Starred by Stas Bekman (author of the Machine Learning Engineering Open Book; Research Engineer at Snowflake) and Andrey Vasnetsov (cofounder of Qdrant).

awesome-knowledge-distillation by dkozlov

Collection of knowledge distillation resources

created 8 years ago · updated 1 month ago · 4k stars · Top 0.1% on sourcepulse