bert_seq2seq  by 920232796

PyTorch toolkit for sequence-to-sequence and other NLP tasks

Created 5 years ago
1,301 stars

Top 30.7% on SourcePulse

Project Summary

This repository provides a lightweight PyTorch framework for various Natural Language Processing (NLP) tasks, leveraging BERT and similar models. It targets researchers and developers needing a flexible tool for sequence-to-sequence generation (e.g., poetry, summarization), text classification, and sequence labeling (e.g., NER, POS tagging), with support for multiple pre-trained models like BERT, RoBERTa, GPT2, T5, and BART.

How It Works

The framework handles different NLP tasks through one pattern: a pre-trained encoder model with a task-specific head on top. It supports several pre-trained models by loading their parameters, and users switch between them by setting model_name. The model_class parameter selects the task, with values such as seq2seq, cls_classifier, sequence_labeling, and sequence_labeling_crf. This modular design makes it easy to experiment with different combinations of model and task.
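
The pairing of model_name and model_class described above can be pictured as a lookup that validates both settings and attaches a head to an encoder. The following is a minimal sketch under stated assumptions, not the library's actual code: the lowercase model_name strings, the head descriptions, and the names SUPPORTED_MODELS, TASK_HEADS, ModelSpec, and build_spec are all hypothetical. Only the four model_class values come from the summary.

```python
from dataclasses import dataclass

# Assumed model_name strings; the summary names the models but not the exact keys.
SUPPORTED_MODELS = ("bert", "roberta", "nezha")

# Task names come from the summary; the head descriptions are illustrative only.
TASK_HEADS = {
    "seq2seq": "language-model head that predicts the next token",
    "cls_classifier": "linear layer over a pooled sentence vector",
    "sequence_labeling": "linear layer applied to every token",
    "sequence_labeling_crf": "per-token linear layer followed by a CRF",
}


@dataclass(frozen=True)
class ModelSpec:
    """Hypothetical record of which encoder and head were selected."""
    model_name: str
    model_class: str
    head: str


def build_spec(model_name: str, model_class: str) -> ModelSpec:
    """Validate the two settings and pair an encoder name with a task head."""
    if model_name not in SUPPORTED_MODELS:
        raise ValueError(f"unknown model_name: {model_name!r}")
    if model_class not in TASK_HEADS:
        raise ValueError(f"unknown model_class: {model_class!r}")
    return ModelSpec(model_name, model_class, TASK_HEADS[model_class])


spec = build_spec("roberta", "sequence_labeling_crf")
print(spec.head)
```

Keeping the encoder choice and the task choice as two independent settings is what lets the same pre-trained weights be reused across tasks; only the head changes.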

Quick Start & Requirements

  • Install via pip: pip install bert-seq2seq tqdm
  • Requires PyTorch.
  • Pre-trained model weights need to be downloaded separately from provided links (e.g., Hugging Face, Baidu Pan).
  • Official examples demonstrate usage for specific tasks.
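
Because the weights are downloaded by hand, it can help to confirm that the packages and files are in place before running an example. This is a minimal sketch using only the Python standard library. The import name bert_seq2seq and the file paths are assumptions for illustration, and missing_modules and missing_files are hypothetical helpers, not part of the package.

```python
import importlib.util
from pathlib import Path


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def missing_files(paths):
    """Return the given paths that do not point at an existing file."""
    return [str(path) for path in map(Path, paths) if not path.is_file()]


# Assumed import names for the packages listed above.
print(missing_modules(["torch", "bert_seq2seq", "tqdm"]))

# Hypothetical locations for the manually downloaded vocabulary and weights.
print(missing_files(["./state_dict/vocab.txt", "./state_dict/pytorch_model.bin"]))
```

Both calls print an empty list when everything is present.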

Highlighted Details

  • Supports a wide range of NLP tasks including poetry generation, couplet generation, automatic summarization, text classification, sentiment analysis, NER, POS tagging, and relation extraction.
  • Integrates with popular pre-trained models like BERT, RoBERTa, GPT2, T5, BART, and Huawei's Nezha.
  • Includes a sequence labeling variant trained with a CRF loss (sequence_labeling_crf), intended to improve tagging performance.
  • Includes examples for SimBERT for sentence similarity tasks.
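
To make the CRF loss mentioned in the list above concrete, here is a minimal pure-Python sketch of a linear-chain CRF negative log-likelihood. It is a generic textbook formulation, not the repository's implementation, and it omits start and end transitions for brevity. The emission and transition scores are made-up numbers, and all function names are hypothetical.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def path_score(emissions, transitions, tags):
    """Score of one tag path: per-token emission scores plus tag-to-tag transitions."""
    score = emissions[0][tags[0]]
    for t in range(1, len(tags)):
        score += transitions[tags[t - 1]][tags[t]] + emissions[t][tags[t]]
    return score


def log_partition(emissions, transitions):
    """Forward algorithm: log of the summed exp-scores over every possible tag path."""
    num_tags = len(emissions[0])
    alpha = list(emissions[0])
    for t in range(1, len(emissions)):
        alpha = [
            log_sum_exp([alpha[i] + transitions[i][j] for i in range(num_tags)])
            + emissions[t][j]
            for j in range(num_tags)
        ]
    return log_sum_exp(alpha)


def crf_nll(emissions, transitions, gold_tags):
    """Negative log-likelihood of the gold path under the CRF."""
    return log_partition(emissions, transitions) - path_score(
        emissions, transitions, gold_tags
    )


# Three tokens, two tags; illustrative scores only.
emissions = [[1.0, 0.2], [0.3, 1.5], [0.8, 0.1]]
transitions = [[0.5, -0.2], [-0.4, 0.6]]  # transitions[i][j]: score of tag i -> tag j
gold_tags = [0, 1, 0]
print(round(crf_nll(emissions, transitions, gold_tags), 4))
```

Unlike a per-token softmax, this loss scores the whole tag sequence at once, so the learned transition scores can penalize unlikely tag pairs such as an inside tag that follows an outside tag.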

Maintenance & Community

  • The changelog records frequent updates through Nov 12, 2021, the last update it mentions.
  • QQ group available for community discussion and support (975907202).
  • Links to the author's personal blog for detailed explanations of tasks and code.

Licensing & Compatibility

  • The README does not explicitly state a license. Code snippets reference Hugging Face Transformers and bert4keras, which have permissive licenses. However, the absence of a clear license file requires caution for commercial use.

Limitations & Caveats

  • The project's last update was in late 2021, so newer models and techniques may not be supported.
  • Pre-trained model weights must be manually downloaded and configured, adding an extra setup step.
  • Some specific features like rhyme enforcement in poetry generation were noted as temporarily unsupported in past updates.

Health Check

  • Last commit: 3 years ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 2 stars in the last 30 days
