Pytorch-NLU by yongzhuo

Pytorch toolkit for text classification, sequence labeling, and text summarization

created 3 years ago
348 stars

Project Summary

This toolkit provides a minimalist, PyTorch-based solution for Chinese Natural Language Understanding tasks, chiefly text classification and sequence labeling, with additional support for extractive text summarization. It supports a wide range of pre-trained models and loss functions, and its well-annotated codebase suits researchers and developers working with Chinese NLP data who need something easy to adapt.

How It Works

The library is built on PyTorch and uses Hugging Face's transformers library to load models such as BERT, ERNIE, and RoBERTa. It offers several loss functions, including BCE, Focal Loss, Circle Loss, and Label Smoothing, so users can pick the training objective that suits a task (for example, Focal Loss for class-imbalanced data). The architecture is designed for simplicity, clarity, and ease of extension.
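Several of the losses named above are standard objectives whose formulas are easy to show directly. The snippet below is a framework-free sketch, not Pytorch-NLU's own implementation: it uses plain Python on a single example instead of PyTorch tensors and computes BCE, binary Focal Loss, and label-smoothed cross-entropy. The function names are illustrative only. For multi-label classification, the binary losses would typically be applied to each label independently and then averaged.

```python
import math

EPS = 1e-12  # guards against log(0)


def bce(p, y):
    """Binary cross-entropy for predicted probability p and label y in {0, 1}."""
    p = min(max(p, EPS), 1 - EPS)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def binary_focal_loss(p, y, gamma=2.0, alpha=None):
    """Focal loss: -alpha_t * (1 - p_t)^gamma * log(p_t).

    With gamma=0 and alpha=None it reduces to BCE; larger gamma down-weights
    examples the model already classifies confidently.
    """
    p = min(max(p, EPS), 1 - EPS)
    p_t = p if y == 1 else 1 - p  # probability assigned to the true label
    alpha_t = 1.0 if alpha is None else (alpha if y == 1 else 1 - alpha)
    return -alpha_t * (1 - p_t) ** gamma * math.log(p_t)


def label_smoothing_ce(probs, target, eps=0.1):
    """Cross-entropy against a smoothed target distribution over K classes.

    The smoothed target puts (1 - eps) on the true class plus eps / K on
    every class, so the model is discouraged from becoming over-confident.
    """
    k = len(probs)
    loss = 0.0
    for i, p in enumerate(probs):
        q = (1 - eps) * (1.0 if i == target else 0.0) + eps / k
        loss -= q * math.log(max(p, EPS))
    return loss


if __name__ == "__main__":
    print(bce(0.9, 1), binary_focal_loss(0.9, 1))
    print(label_smoothing_ce([0.7, 0.2, 0.1], target=0))
```

Because the focal term multiplies BCE by (1 - p_t)^gamma, a confident correct prediction such as p = 0.9 contributes far less loss than under plain BCE, which is why Focal Loss is often chosen for imbalanced label sets.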

Quick Start & Requirements

  • Install via pip: pip install Pytorch-NLU (or, through the Tsinghua PyPI mirror: pip install -i https://pypi.tuna.tsinghua.edu.cn/simple Pytorch-NLU).
  • Requires PyTorch, transformers, numpy, and tensorboardX.
  • Pre-trained model weights are not bundled; they must be downloaded separately and referenced via a local path.

Highlighted Details

  • Supports 10+ pre-trained models, including BERT, ERNIE, RoBERTa, ALBERT, XLNet, ELECTRA, GPT-2, TinyBERT, XLM, and T5.
  • Implements 6+ loss functions such as BCE, Focal Loss, Circle Loss, Prior Loss, Dice Loss, and Label Smoothing.
  • Offers functionalities for multi-class, multi-label classification, Named Entity Recognition (NER), Part-of-Speech (POS) tagging, word segmentation, and extractive text summarization.
  • Provides extensive datasets for text classification and sequence labeling tasks.
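For the sequence-labeling tasks above (NER, POS tagging, word segmentation), the model predicts one tag per character, and those tags must be converted back into spans. The sketch below assumes the common BIO scheme; it is a generic illustration, and Pytorch-NLU may use other tag schemes or its own decoding code. The function name bio_to_spans is hypothetical.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Convert BIO tags into (label, start, end) spans; end is exclusive.

    Assumes a pure BIO scheme (B-X, I-X, O). A stray I-X that does not
    continue an open span of type X is treated as the start of a new span.
    """
    spans: List[Tuple[str, int, int]] = []
    label: Optional[str] = None
    start = 0
    for i, tag in enumerate(tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label):
            # Close the open span (if any) and start a new one here.
            if label is not None:
                spans.append((label, start, i))
            label, start = tag[2:], i
        elif tag == "O":
            # Outside any entity: close the open span (if any).
            if label is not None:
                spans.append((label, start, i))
            label = None
        # Otherwise the tag is I-<label> continuing the current span.
    if label is not None:
        spans.append((label, start, len(tags)))
    return spans


if __name__ == "__main__":
    chars = list("张三在北京")
    tags = ["B-PER", "I-PER", "O", "B-LOC", "I-LOC"]
    for label, s, e in bio_to_spans(tags):
        print(label, "".join(chars[s:e]))
```

Since Chinese text is typically labeled per character, slicing the original character list with each span recovers the entity strings (here a person name and a location).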

Maintenance & Community

The project is maintained by Yongzhuo Mo. The README gives no further details on community channels or contribution guidelines.

Licensing & Compatibility

The repository does not explicitly state a license. Users should verify licensing terms before commercial use or integration into closed-source projects.

Limitations & Caveats

The README does not specify a license, which could be a barrier to commercial adoption. Some example configurations hard-code local Windows paths (D:/pretrain_models/pytorch), so users on other platforms, or with a different directory layout, will need to adjust these paths before running the examples.
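One way to avoid editing hard-coded Windows paths is to resolve the model directory at runtime. The sketch below is an assumption, not part of Pytorch-NLU: the environment variable PRETRAIN_MODELS_DIR, the default ~/pretrain_models/pytorch, and the helper resolve_model_dir are all hypothetical names chosen for illustration.

```python
import os
from pathlib import Path

# Hypothetical environment variable; Pytorch-NLU itself does not define it.
ENV_VAR = "PRETRAIN_MODELS_DIR"
# Hypothetical fallback that mirrors the README's directory layout under $HOME.
DEFAULT_DIR = Path.home() / "pretrain_models" / "pytorch"


def resolve_model_dir(model_name: str) -> Path:
    """Return the directory of a pre-trained model without a hard-coded drive letter."""
    base = Path(os.environ.get(ENV_VAR, DEFAULT_DIR))
    return base / model_name


if __name__ == "__main__":
    print(resolve_model_dir("bert-base-chinese"))
```

On Windows, setting PRETRAIN_MODELS_DIR=D:/pretrain_models/pytorch reproduces the layout used in the examples, while Linux and macOS users can point the same variable at any directory. The resulting path could then be passed wherever a configuration expects a local model path.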

Health Check
Last commit: 1 year ago
Responsiveness: 1 week
Pull Requests (30d): 0
Issues (30d): 0
Star History: 4 stars in the last 90 days
