LLM4Annotation by Zhen-Tan-dmml

Survey of LLMs for data annotation and synthesis

Created 1 year ago
602 stars

Top 54.3% on SourcePulse

Project Summary

This repository is a curated survey of research papers on using Large Language Models (LLMs) for data annotation and synthesis. It is aimed at researchers and practitioners in Natural Language Processing (NLP) and Machine Learning (ML) who are interested in automated data generation, dataset creation, and improving model performance with synthetic data.

How It Works

The repository functions as a community-maintained bibliography. It compiles and categorizes academic papers and links to their sources (primarily arXiv preprints). The list is updated regularly to track new work on LLM-based data annotation and synthesis, including techniques such as Chain-of-Thought, self-training, and preference optimization.

Quick Start & Requirements

This repository is a collection of research papers and does not involve direct code execution or installation. Users can access the compiled list of papers and datasets via the provided links.

Highlighted Details

  • Extensive collection of papers on LLM-based data annotation and synthesis, updated frequently.
  • Categorization of papers by specific techniques (e.g., Long-CoT Synthesis & Distillation, LLM-as-a-Judge).
  • Includes links to relevant datasets used in the research.
  • Complements an EMNLP 2024 oral survey paper on the same topic.

Maintenance & Community

The repository is maintained by Dawei Li and welcomes contributions via Pull Requests. Users who find it useful are asked to cite the associated survey paper.

Licensing & Compatibility

The repository does not specify a license of its own; it is a curated list of external research papers. The licensing of each paper depends on its publication venue.

Limitations & Caveats

This repository is a bibliography and does not provide code or tools for implementing the discussed techniques. Users must refer to the individual papers for implementation details and potential limitations of specific methods.

Health Check

  • Last Commit: 1 month ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 7 stars in the last 30 days
