nlp-notebook by jasoncao11

NLP toolkit for common tasks, implemented in PyTorch

created 4 years ago
534 stars

Top 60.1% on sourcepulse

Project Summary

This repository implements common Natural Language Processing (NLP) tasks in PyTorch for researchers and practitioners. It covers new word discovery, word embeddings, text classification, named entity recognition, text summarization, sentence similarity, relation extraction, and pre-trained models.

How It Works

The project builds its deep learning models in PyTorch and integrates torchtext, Optuna for hyperparameter tuning, and transformers for transformer-based architectures. It spans established methods such as Word2Vec and FastText as well as BERT-based approaches to classification, NER, and summarization. The text classification models use Optuna to search their hyperparameters rather than relying on hand-picked values.
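To make the idea of automated hyperparameter search concrete, here is a minimal sketch of a plain random search in standard-library Python. It is not Optuna and not the repository's code: Optuna's samplers choose candidates more cleverly, but the outer loop (sample a configuration, score it, keep the best) has the same shape. The search space, the parameter names, and toy_objective are all hypothetical stand-ins; a real objective would train a classifier and return its validation score.

```python
import math
import random

# Hypothetical search space for a text classifier. The names and ranges
# are illustrative and are not taken from the repository.
SEARCH_SPACE = {
    "learning_rate": (1e-4, 1e-2),   # sampled log-uniformly
    "dropout": (0.1, 0.5),           # sampled uniformly
    "hidden_size": [64, 128, 256],   # categorical choice
}


def sample_params(rng):
    """Draw one candidate configuration from SEARCH_SPACE."""
    lr_lo, lr_hi = SEARCH_SPACE["learning_rate"]
    d_lo, d_hi = SEARCH_SPACE["dropout"]
    return {
        "learning_rate": math.exp(rng.uniform(math.log(lr_lo), math.log(lr_hi))),
        "dropout": rng.uniform(d_lo, d_hi),
        "hidden_size": rng.choice(SEARCH_SPACE["hidden_size"]),
    }


def toy_objective(params):
    """Stand-in for 'train the model and return validation accuracy'.

    The score peaks at learning_rate=1e-3, dropout=0.3, hidden_size=128.
    """
    score = 1.0
    score -= 0.1 * abs(math.log10(params["learning_rate"]) + 3)
    score -= abs(params["dropout"] - 0.3)
    score -= 0.0 if params["hidden_size"] == 128 else 0.05
    return score


def random_search(objective, n_trials=50, seed=0):
    """Evaluate n_trials random configurations and return the best one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = sample_params(rng)
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


if __name__ == "__main__":
    params, score = random_search(toy_objective)
    print(params, round(score, 3))
```

Fixing the seed makes the search reproducible, which helps when comparing search budgets.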

Quick Start & Requirements

  • Install: pip install -r requirements.txt (implied; the README does not state an install command)
  • Prerequisites: Python 3.7, PyTorch 1.8.0, Torchtext 0.9.1, Optuna 2.6.0, Transformers 3.0.2.
  • Dataset: A binary sentiment analysis dataset is included.

Highlighted Details

  • Implements 9 distinct text classification models, with internal hyperparameter tuning via Optuna.
  • Features multiple approaches for Named Entity Recognition (NER), including BERT-MRC, BERT-CRF, and BERT-MLM.
  • Offers both generative (Seq2seq, Transformer, GPT, BERT-seq2seq) and extractive (BERT-extractive-summarizer) methods for text summarization.
  • Includes implementations for pre-trained models like ELECTRA and SimCSE, and prompt learning techniques like P-tuning V1.
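As an illustration of one technique named above, the following is a minimal pure-Python sketch of the contrastive objective used by unsupervised SimCSE: each sentence is encoded twice, the two encodings of the same sentence form a positive pair, and the other sentences in the batch act as negatives. This is a sketch under stated assumptions, not the repository's PyTorch implementation. The function names are hypothetical, the embeddings are plain lists of floats, and the default temperature of 0.05 is an assumed, commonly cited value.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def simcse_loss(view_a, view_b, temperature=0.05):
    """Mean in-batch contrastive loss.

    view_a[i] and view_b[i] are two encodings of sentence i (the positive
    pair); every view_b[j] with j != i serves as a negative for view_a[i].
    """
    losses = []
    for i, anchor in enumerate(view_a):
        logits = [cosine(anchor, other) / temperature for other in view_b]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        top = max(logits)
        log_denominator = top + math.log(sum(math.exp(x - top) for x in logits))
        # Cross-entropy with the matching index i as the target class.
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)
```

The loss is small when each embedding is closest to its own second view and large when it is closer to another sentence's view, which is the signal that pulls paraphrase-like encodings together.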

Maintenance & Community

No specific information on contributors, community channels, or roadmap is available in the README.

Licensing & Compatibility

The license is not specified in the README.

Limitations & Caveats

The project pins older versions of PyTorch (1.8.0) and Transformers (3.0.2), which may not work with newer libraries or hardware. Beyond listing dependencies, the README gives no explicit instructions for setting up the environment or running the code.
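Because the version pins matter here, a quick environment check can save debugging time. The sketch below compares installed package versions against the versions listed under Quick Start & Requirements. It is an assumption-laden helper and not part of the repository: PINNED, installed_version, and find_mismatches are hypothetical names, and the PyPI distribution names (torch, torchtext, optuna, transformers) are assumed to match what the project installs.

```python
try:
    from importlib import metadata  # standard library on Python 3.8+
except ImportError:
    # Assumption: on Python 3.7 the importlib_metadata backport is installed.
    import importlib_metadata as metadata

# Versions taken from the prerequisites listed in this summary.
PINNED = {
    "torch": "1.8.0",
    "torchtext": "0.9.1",
    "optuna": "2.6.0",
    "transformers": "3.0.2",
}


def installed_version(name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pinned, lookup=installed_version):
    """Map each mismatched package to a (wanted, found) tuple."""
    mismatches = {}
    for name, wanted in pinned.items():
        found = lookup(name)
        # Ignore local build tags such as "+cu111" when comparing.
        if found is None or found.split("+")[0] != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    problems = find_mismatches(PINNED)
    for name, (wanted, found) in problems.items():
        print(f"{name}: expected {wanted}, found {found or 'not installed'}")
    if not problems:
        print("All pinned packages match.")
```

The lookup parameter exists so the comparison logic can be exercised without touching the real environment.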

Health Check

  • Last commit: 2 years ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 9 stars in the last 90 days
