DataAug4NLP by styfeng

NLP data augmentation paper collection

Created 5 years ago

834 stars

Top 41.9% on SourcePulse

Project Summary

This repository is a curated collection of research papers and resources on data augmentation for Natural Language Processing (NLP). It is aimed at researchers and practitioners who want to improve model performance and robustness through data augmentation, and it covers a wide range of NLP applications.

How It Works

The repository organizes papers by NLP task, such as text classification, machine translation, summarization, and question answering. Each entry typically links to the paper and the datasets used, and often to a code repository. This structure lets users quickly find augmentation strategies relevant to a given task, along with their empirical validation.

Quick Start & Requirements

This repository is a collection of links and citations, not a runnable software package. No installation or specific requirements are needed to browse its contents.

Highlighted Details

Comprehensive categorization of data augmentation techniques across 15+ NLP task areas.
Links to over 100 research papers, many with associated code implementations.
Includes the foundational survey paper "A Survey of Data Augmentation Approaches for NLP" (Findings of ACL '21), which gives a structured overview of the field.
Highlights related popular resources like nlpaug, TextAttack, AugLy, and NL-Augmenter.

Maintenance & Community

The repository accompanies the Findings of ACL '21 survey paper and is marked as a work in progress (WIP), with plans to add more papers. Questions can be sent by email or raised as issues. Talks and podcast episodes related to the work are also linked.

Licensing & Compatibility

No license is stated for the repository itself; the linked research papers carry their own licenses. Whether a technique can be used commercially depends on the licenses of the individual papers and their associated code.

Limitations & Caveats

As a curated list of papers, this repository does not provide direct implementations or tools for data augmentation. Users will need to refer to the linked papers and their respective codebases to utilize the described techniques.
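For readers new to the topic, the sketch below illustrates the general kind of token-level augmentation that papers in such a list describe. It is not taken from the repository or from any specific linked codebase; the function names (`random_swap`, `random_delete`, `augment`) and all default parameters are hypothetical, chosen only for illustration, and it uses nothing beyond the Python standard library.

```python
import random


def random_swap(tokens, n_swaps, rng):
    """Return a copy of tokens with n_swaps random pairs of positions swapped."""
    out = list(tokens)
    if len(out) < 2:
        return out  # nothing to swap
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(out)), 2)  # two distinct positions
        out[i], out[j] = out[j], out[i]
    return out


def random_delete(tokens, p, rng):
    """Drop each token independently with probability p, keeping at least one."""
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() >= p]
    # Never return an empty sentence: fall back to one randomly chosen token.
    return kept if kept else [rng.choice(tokens)]


def augment(sentence, n_copies=2, p_delete=0.1, n_swaps=1, seed=0):
    """Generate n_copies perturbed variants of a whitespace-tokenized sentence.

    Hypothetical helper: whitespace tokenization and the default values
    are illustrative assumptions, not settings recommended by any paper.
    """
    rng = random.Random(seed)  # seeded so results are reproducible
    tokens = sentence.split()
    variants = []
    for _ in range(n_copies):
        perturbed = random_delete(random_swap(tokens, n_swaps, rng), p_delete, rng)
        variants.append(" ".join(perturbed))
    return variants
```

Calling `augment("the cat sat on the mat", n_copies=3)` returns three perturbed copies of the sentence that could be added to a training set alongside the original. For real projects, the maintained libraries named under Highlighted Details are likely a better starting point than a hand-rolled sketch like this one.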

Health Check

Last Commit: 3 years ago

Responsiveness: Inactive

Star History: 2 stars in the last 30 days