deepnlp by rockingdingo

NLP pipeline for Chinese and English, using TensorFlow

created 8 years ago
1,357 stars

Top 30.2% on sourcepulse

Project Summary

This repository provides a Deep Learning NLP Pipeline implemented in TensorFlow, offering a suite of tools for Chinese and English text processing. It targets developers and researchers needing foundational NLP capabilities like segmentation, POS tagging, NER, and dependency parsing, with the ability to train custom models.

How It Works

Most modules in the pipeline are built on TensorFlow. Segmentation is the exception: it uses a linear-chain CRF via CRF++. POS tagging and NER are implemented with LSTM, Bi-LSTM, and LSTM-CRF networks. Dependency parsing uses an arc-standard transition system whose transitions are chosen by a feed-forward neural network. The project also includes Seq2Seq-Attention for summarization and CNNs for document classification. Pre-trained models are available for Chinese, and English is supported for POS tagging.
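
To make the arc-standard system mentioned above concrete, here is a minimal Python sketch of its three transitions (SHIFT, LEFT-ARC, RIGHT-ARC). It is an illustration only, not deepnlp's code: the function name `apply_transitions` and the example sentence are hypothetical, and the transition sequence is supplied by hand, whereas in a real parser a classifier such as the feed-forward network would predict each transition from the current stack and buffer.

```python
from typing import List, Tuple

SHIFT, LEFT_ARC, RIGHT_ARC = "SHIFT", "LEFT-ARC", "RIGHT-ARC"


def apply_transitions(words: List[str], transitions: List[str]) -> List[Tuple[int, int]]:
    """Run arc-standard transitions and return (head, dependent) index pairs.

    Words are numbered from 1; index 0 is the artificial ROOT node.
    Hypothetical helper for illustration, not part of deepnlp.
    """
    stack = [0]                               # starts with ROOT only
    buffer = list(range(1, len(words) + 1))   # all words, left to right
    arcs = []
    for t in transitions:
        if t == SHIFT:
            # Move the next word from the buffer onto the stack.
            if not buffer:
                raise ValueError("SHIFT with empty buffer")
            stack.append(buffer.pop(0))
        elif t == LEFT_ARC:
            # Top of stack becomes head of the second item, which is removed.
            # ROOT can never be a dependent, so at least 3 items are needed.
            if len(stack) < 3:
                raise ValueError("LEFT-ARC needs two non-ROOT items on the stack")
            dependent = stack.pop(-2)
            arcs.append((stack[-1], dependent))
        elif t == RIGHT_ARC:
            # Second item becomes head of the top of stack, which is removed.
            if len(stack) < 2:
                raise ValueError("RIGHT-ARC needs two items on the stack")
            dependent = stack.pop()
            arcs.append((stack[-1], dependent))
        else:
            raise ValueError("unknown transition: %r" % t)
    return arcs


# Hand-written transition sequence for "She eats fish".
example_arcs = apply_transitions(
    ["She", "eats", "fish"],
    [SHIFT, SHIFT, LEFT_ARC, SHIFT, RIGHT_ARC, RIGHT_ARC],
)
print(example_arcs)  # [(2, 1), (2, 3), (0, 2)]
```

In the output, `(2, 1)` means word 2 ("eats") is the head of word 1 ("She"), and `(0, 2)` attaches "eats" to ROOT. Dependency labels are omitted to keep the sketch short.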

Quick Start & Requirements

  • Install via pip: pip install deepnlp
  • Prerequisites: CRF++ (>= 0.54), TensorFlow 1.4 (the README states that only versions up to 1.13 are supported), Python (tested on 2.7 and 3.6).
  • Pre-trained models for English POS and domain-specific NER are not included in the PyPI package and must be downloaded separately.
  • An installation script for CRF++ is provided.
  • See deepnlp.org/api/v1.0/pipeline for API details.
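
Because the library is tied to old TensorFlow 1.x releases, it can help to check the installed version before trying to use it. The sketch below is one way to do that. It assumes the listed prerequisite (1.4) and the README's stated ceiling (1.13) can be read as an inclusive range, which the source does not say explicitly. The helper names `tf_version_supported` and `installed_tf_supported` are hypothetical and not part of deepnlp.

```python
def parse_major_minor(version):
    """Turn a version string such as '1.13.1' into the tuple (1, 13)."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def tf_version_supported(version, lowest=(1, 4), highest=(1, 13)):
    """Return True if `version` falls in the assumed supported range.

    The bounds are an assumption taken from this page: 1.4 is the listed
    prerequisite and 1.13 is the newest version the README says is supported.
    """
    return lowest <= parse_major_minor(version) <= highest


def installed_tf_supported():
    """Check the locally installed TensorFlow, if there is one."""
    try:
        import tensorflow as tf
    except ImportError:
        return False  # TensorFlow is not installed at all
    return tf_version_supported(tf.__version__)


print(tf_version_supported("1.13.1"))  # True
print(tf_version_supported("2.0.0"))   # False
```

Calling `installed_tf_supported()` in the target environment reports whether the local TensorFlow falls inside that assumed range.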

Highlighted Details

  • Offers a comprehensive NLP pipeline including segmentation, POS tagging, NER, and dependency parsing.
  • Provides pre-trained models for Chinese text and supports training custom models for different languages.
  • Includes a free RESTful API for common NLP tasks.
  • Implements advanced models like Seq2Seq-Attention for summarization and CNNs for classification.

Maintenance & Community

The project's README states that the deepnlp library was archived by the end of 2020 and only supports TensorFlow up to version 1.13.

Licensing & Compatibility

The repository does not explicitly state a license. Compatibility with commercial or closed-source projects is not specified.

Limitations & Caveats

The project is archived and only supports older versions of TensorFlow (up to 1.13). Pre-trained models for English POS and domain-specific NER require manual download. The README mentions TextCNN is "WIP" (Work In Progress).

Health Check

  • Last commit: 9 months ago
  • Responsiveness: 1 week
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 3 stars in the last 90 days