NL-Augmenter by GEM-benchmark

Framework for natural language dataset augmentation

created 4 years ago
787 stars

Top 45.4% on sourcepulse

Project Summary

NL-Augmenter is a collaborative repository providing a framework for augmenting text datasets with diverse natural language transformations. It targets researchers and practitioners in NLP who need to enhance datasets for tasks like style transfer, paraphrasing, and data randomization. The project aims to foster community contributions of novel augmentation techniques.

How It Works

The framework is built around a Python library that allows users to define and apply text transformations. Users can create new transformations by copying existing examples, implementing a generate method within a transformation.py file, and defining test cases in test.json. The project encourages contributions via pull requests, with a focus on novel and creative augmentation methods.
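The workflow above can be sketched in a few lines of Python. This is a minimal, self-contained illustration rather than the repository's actual code: `SentenceOperation` here is a simplified local stand-in for the framework's base class, `RandomUppercase` is a hypothetical transformation invented for this example, and the `inputs`/`outputs` layout of `TEST_CASE` is an assumed shape for a test.json entry, not a confirmed schema.

```python
import random
from typing import Dict, List


class SentenceOperation:
    """Simplified stand-in for the framework's base class (hypothetical)."""

    def __init__(self, seed: int = 0, max_outputs: int = 1) -> None:
        self.seed = seed
        self.max_outputs = max_outputs

    def generate(self, sentence: str) -> List[str]:
        raise NotImplementedError


class RandomUppercase(SentenceOperation):
    """Illustrative transformation: uppercase each word with probability `prob`."""

    def __init__(self, seed: int = 0, max_outputs: int = 1, prob: float = 0.3) -> None:
        super().__init__(seed=seed, max_outputs=max_outputs)
        self.prob = prob

    def generate(self, sentence: str) -> List[str]:
        # A local RNG means the same seed always yields the same output.
        rng = random.Random(self.seed)
        outputs = []
        for _ in range(self.max_outputs):
            words = [w.upper() if rng.random() < self.prob else w
                     for w in sentence.split()]
            outputs.append(" ".join(words))
        return outputs


# Assumed shape of one test.json entry, written as a Python dict.
TEST_CASE: Dict = {
    "inputs": {"sentence": "augment this sentence"},
    "outputs": [{"sentence": "AUGMENT THIS SENTENCE"}],
}


def passes(op: SentenceOperation, case: Dict) -> bool:
    """Return True if the operation reproduces the expected sentences."""
    expected = [o["sentence"] for o in case["outputs"]]
    return op.generate(case["inputs"]["sentence"]) == expected
```

Calling `passes(RandomUppercase(prob=1.0), TEST_CASE)` returns `True`, because every word is uppercased when `prob` is 1.0. Seeding a local random generator inside `generate` keeps expected outputs stable, which is what makes a fixed test file practical for a randomized transformation.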

Quick Start & Requirements

  • Install:
    git clone https://github.com/GEM-benchmark/NL-Augmenter.git
    cd NL-Augmenter
    python setup.py sdist
    pip install -e .
    pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.0.0/en_core_web_sm-3.0.0.tar.gz
    
  • Requirements: Python 3.7.
  • Demo: A Colab notebook is available for quick experimentation.
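After running the install steps, a short check like the one below can confirm that the environment matches the stated requirements. It is only a sketch using the Python standard library: the helper name `check_environment` is made up for this example, and it assumes the spaCy model installs as an importable package named `en_core_web_sm`.

```python
import importlib.util
import sys
from typing import Dict


def check_environment() -> Dict[str, bool]:
    """Report whether the interpreter and spaCy pieces look as expected."""
    return {
        # The summary lists Python 3.7 as the requirement.
        "python_3_7": sys.version_info[:2] == (3, 7),
        # find_spec returns None when a top-level package is not installed.
        "spacy": importlib.util.find_spec("spacy") is not None,
        "en_core_web_sm": importlib.util.find_spec("en_core_web_sm") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'missing or different'}")
```

A `False` entry does not necessarily mean the library will fail; it only flags a difference from the setup described above.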

Highlighted Details

  • Supports a wide range of text augmentation techniques, including randomization, style/syntax changes, and paraphrasing.
  • Encourages community contributions through a structured pull request process.
  • Features a code styling standard enforced by black and pre-commit hooks.
  • Recognizes creative implementations with featured spots on the README and webpage.

Maintenance & Community

The project is a collaborative effort with a public Google Groups email for contact. It is associated with the GEM benchmark initiative.

Licensing & Compatibility

The README text summarized here does not state the primary license explicitly. It does note that "Some transformations include components released under a different (permissive, open source) license." Check the individual transformation directories for specific license details.

Limitations & Caveats

The project requires Python 3.7, which reached end-of-life in June 2023 and no longer receives security fixes. The license of the core framework is not clear from the README, so licensing has to be confirmed per transformation directory before reuse.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 7 stars in the last 90 days
