Survey of LLMs for data annotation and synthesis
Top 54.3% on SourcePulse
This repository is a curated survey of research papers focused on leveraging Large Language Models (LLMs) for data annotation and synthesis. It serves as a comprehensive resource for researchers and practitioners in Natural Language Processing (NLP) and Machine Learning (ML) interested in automated data generation, dataset creation, and improving model performance through synthetic data.
How It Works
The repository functions as a community-driven bibliography. It compiles and categorizes academic papers and links to their sources (primarily arXiv preprints). The list is updated regularly to track the rapid progress in LLM applications for data annotation and synthesis, with a focus on techniques such as Chain-of-Thought, self-training, and preference optimization.
Quick Start & Requirements
This repository is a collection of research papers and does not involve direct code execution or installation. Users can access the compiled list of papers and datasets via the provided links.
Maintenance & Community
The repository is maintained by Dawei Li and welcomes contributions via Pull Requests. Users who find it useful are encouraged to cite the associated survey paper.
Licensing & Compatibility
No license is specified for the repository itself, which is a curated list of external research papers. The licensing of each individual paper depends on its publication venue.
Limitations & Caveats
This repository is a bibliography and does not provide code or tools for implementing the discussed techniques. Users must refer to the individual papers for implementation details and potential limitations of specific methods.