Curated list of dataset distillation papers and applications
This repository serves as a comprehensive, curated list of research papers, code, and applications related to dataset distillation. It targets researchers and practitioners in machine learning who are interested in synthesizing smaller, representative datasets from larger ones to train models efficiently, with benefits for privacy, continual learning, and more.
How It Works
Dataset distillation aims to create a small synthetic dataset such that models trained on it perform comparably to models trained on the original, large dataset. This is achieved through a range of techniques, including gradient matching, trajectory matching, surrogate objectives, generative models, and parameterization of the distilled dataset, all of which try to capture the essential information in the original data distribution so that it still supports effective model training. A minimal sketch of the gradient-matching idea follows.
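Of the techniques above, gradient matching is a common entry point: the synthetic images are updated so that the gradients they induce in a network match the gradients induced by real data. The sketch below is illustrative only; the names (`model`, `syn_data`, `syn_opt`, and so on) are assumptions for this example and do not come from this repository, which instead links to the papers' full implementations.

```python
# Minimal sketch of one gradient-matching step (illustrative; see the
# linked papers and their code repositories for complete implementations).
import torch
import torch.nn.functional as F

def gradient_matching_step(model, syn_data, syn_labels,
                           real_data, real_labels, syn_opt):
    """Update a learnable synthetic batch so its loss gradients match a real batch's.

    Assumptions (hypothetical setup, not from this repository):
      - `syn_data` is a tensor created with requires_grad=True,
      - `syn_opt` is an optimizer over [syn_data],
      - `model` is any differentiable classifier.
    """
    # Target gradients: task loss on a real batch, detached from the graph.
    real_loss = F.cross_entropy(model(real_data), real_labels)
    real_grads = [g.detach()
                  for g in torch.autograd.grad(real_loss, model.parameters())]

    # Gradients of the same loss on the synthetic batch; create_graph=True
    # keeps them differentiable with respect to the synthetic images.
    syn_loss = F.cross_entropy(model(syn_data), syn_labels)
    syn_grads = torch.autograd.grad(syn_loss, model.parameters(),
                                    create_graph=True)

    # Squared-error matching loss between the two gradient sets; the update
    # moves the synthetic images, not the model weights.
    match_loss = sum(((gs - gr) ** 2).sum()
                     for gs, gr in zip(syn_grads, real_grads))
    syn_opt.zero_grad()
    match_loss.backward()
    syn_opt.step()
    return match_loss.item()
```

Published methods typically add more to this loop, such as class-wise batching, layer-wise distance functions other than squared error, and periodic re-initialization of the network; consult the individual papers linked in the list for the exact formulations.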
Quick Start & Requirements
This repository is a curated list and does not have a direct installation or execution command. It links to various research papers, many of which include code repositories with their own specific requirements (e.g., Python, PyTorch, TensorFlow, specific hardware like GPUs).
Maintenance & Community
The project is curated and maintained by Guang Li, Bo Zhao, and Tongzhou Wang, with significant contributions acknowledged from numerous researchers in the field. It provides links to GitHub repositories for code and encourages pull requests for new submissions.
Licensing & Compatibility
The license of the repository itself is not stated in this summary and should be confirmed on its GitHub page. The licensing of individual linked code repositories varies and must be checked separately.
Limitations & Caveats
As a curated list, this repository does not provide a unified framework or implementation. Users must refer to individual linked papers and their associated code for specific functionalities, dependencies, and potential limitations. The field is rapidly evolving, with new methods and challenges emerging frequently.