Pytorch-NLU by yongzhuo

Pytorch toolkit for text classification, sequence labeling, and text summarization

Created 4 years ago

354 stars

Top 78.9% on SourcePulse

Project Summary

This toolkit provides a minimalist, PyTorch-based solution for Chinese Natural Language Understanding tasks, chiefly text classification and sequence labeling, with additional support for extractive text summarization. It supports a wide range of pre-trained models and loss functions, and its flexible, well-annotated codebase suits researchers and developers working with Chinese NLP data.

How It Works

The library is built on PyTorch and loads pre-trained encoders such as BERT, ERNIE, and RoBERTa through Hugging Face's transformers library. It offers a choice of loss functions, including BCE, Focal Loss, Circle Loss, and Label Smoothing, so the training objective can be matched to the task (for example, Focal Loss for class-imbalanced data). The architecture favors simplicity and clarity, which makes it straightforward to extend.
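To make the loss options above concrete, here is a framework-free sketch of two of them, Focal Loss and Label Smoothing, for a single multi-class example. It is not Pytorch-NLU's own implementation: it uses plain Python floats instead of PyTorch tensors so the arithmetic stays visible, and the function names (`softmax`, `focal_loss`, `label_smoothing_ce`) are hypothetical. It assumes the standard formulations: focal loss scales cross-entropy by (1 - p_t)^gamma, and label smoothing puts 1 - eps on the gold class plus eps/K on each of the K classes.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits: List[float], target: int,
               gamma: float = 2.0, alpha: float = 1.0) -> float:
    """Multi-class focal loss: -alpha * (1 - p_t)^gamma * log(p_t).

    With gamma=0 and alpha=1 this reduces to ordinary cross-entropy.
    """
    p_t = softmax(logits)[target]
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


def label_smoothing_ce(logits: List[float], target: int, eps: float = 0.1) -> float:
    """Cross-entropy against a smoothed target distribution:
    (1 - eps) on the gold class plus eps spread uniformly over all K classes."""
    probs = softmax(logits)
    k = len(logits)
    loss = 0.0
    for i, p in enumerate(probs):
        q = eps / k + (1.0 - eps if i == target else 0.0)  # smoothed target prob
        loss -= q * math.log(p)
    return loss
```

Setting `gamma=0` recovers plain cross-entropy, while larger values shrink the loss of already well-classified examples; label smoothing with `eps=0` likewise reduces to cross-entropy.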

Quick Start & Requirements

Install via pip: pip install Pytorch-NLU (or, through the Tsinghua PyPI mirror: pip install -i https://pypi.tuna.tsinghua.edu.cn/simple Pytorch-NLU)
Requires PyTorch, transformers, numpy, and tensorboardX.
Pre-trained model weights are not bundled; download them and configure their local path before training.
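Before configuring a model, it can help to confirm that the four dependencies listed above are importable in the active environment. This is a small standard-library sketch, not something shipped with Pytorch-NLU; `check_requirements` is a hypothetical helper, and it only checks that each module can be found, not which version is installed.

```python
import importlib.util
from typing import Dict, Iterable

# Import names for the listed requirements (tensorboardX keeps its capital X).
REQUIRED_MODULES = ("torch", "transformers", "numpy", "tensorboardX")


def check_requirements(modules: Iterable[str] = REQUIRED_MODULES) -> Dict[str, bool]:
    """Return {module_name: True if it can be found for import, else False}."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name:15s} {'ok' if ok else 'MISSING'}")
```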

Highlighted Details

Supports 10+ pre-trained models including BERT, ERNIE, RoBERTa, ALBERT, XLNet, ELECTRA, GPT-2, TinyBERT, XLM, and T5.
Implements 6+ loss functions such as BCE, Focal Loss, Circle Loss, Prior Loss, Dice Loss, and Label Smoothing.
Offers functionalities for multi-class, multi-label classification, Named Entity Recognition (NER), Part-of-Speech (POS) tagging, word segmentation, and extractive text summarization.
Provides datasets for text classification and sequence labeling tasks.
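Sequence-labeling tasks such as NER are typically decoded by turning per-character tags back into entity spans. The sketch below assumes the common BIO scheme (other schemes, such as BIOES, need different rules); `bio_to_spans` is a hypothetical helper, not part of the library's API.

```python
from typing import List, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Convert a BIO tag sequence into (label, start, end) spans, end exclusive.

    An I- tag that does not continue an open span of the same label is
    treated as the start of a new span (a common lenient convention).
    """
    spans = []
    label, start = None, None
    for i, tag in enumerate(tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label):
            if label is not None:  # close the previous span first
                spans.append((label, start, i))
            label, start = tag[2:], i
        elif tag.startswith("I-"):
            continue  # same label: extends the current span
        else:  # "O" (or any other tag) closes the open span
            if label is not None:
                spans.append((label, start, i))
            label, start = None, None
    if label is not None:  # span running to the end of the sequence
        spans.append((label, start, len(tags)))
    return spans


if __name__ == "__main__":
    text = "张三在北京工作"
    tags = ["B-PER", "I-PER", "O", "B-LOC", "I-LOC", "O", "O"]
    for label, s, e in bio_to_spans(tags):
        print(label, text[s:e])  # prints "PER 张三", then "LOC 北京"
```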

Maintenance & Community

The project is maintained by Yongzhuo Mo. The README gives no further details on community channels.

Licensing & Compatibility

The repository does not explicitly state a license. Users should verify licensing for commercial use or integration into closed-source projects.

Limitations & Caveats

The absence of a stated license may hinder commercial adoption. Some example configurations also point to local Windows paths (e.g., D:/pretrain_models/pytorch), so these paths must be changed before running on Linux or macOS.
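One way to avoid hard-coding a Windows path such as D:/pretrain_models/pytorch is to resolve the model directory at run time with pathlib. This is a sketch under stated assumptions: the environment variable `PRETRAIN_MODELS_DIR`, the default `~/pretrain_models/pytorch`, and the helper `resolve_model_dir` are all hypothetical and not part of Pytorch-NLU.

```python
import os
from pathlib import Path

# Hypothetical environment variable; Pytorch-NLU itself does not define it.
ENV_VAR = "PRETRAIN_MODELS_DIR"
# Hypothetical per-user default, used when ENV_VAR is not set.
DEFAULT_DIR = Path.home() / "pretrain_models" / "pytorch"


def resolve_model_dir(model_name: str) -> Path:
    """Return <base>/<model_name>, taking <base> from ENV_VAR if it is set
    and falling back to DEFAULT_DIR otherwise."""
    base = Path(os.environ.get(ENV_VAR, DEFAULT_DIR)).expanduser()
    return base / model_name
```

The resulting path (converted with `str()` if needed) can then be supplied wherever a configuration expects the pre-trained model location.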

Health Check

Last Commit

1 year ago

Responsiveness

Inactive

Star History

4 stars in the last 30 days