ChineseTextClassifier by ami66

Chinese short text classifier for sentiment analysis

created 6 years ago
364 stars

Top 78.4% on sourcepulse

Project Summary

This repository provides a Chinese text classifier for short product reviews, aimed primarily at sentiment analysis. It includes several models that reach over 90% accuracy, and it targets developers and researchers working with Chinese e-commerce data.

How It Works

The classifier implements several deep learning architectures, including Transformer, word2vec combined with TextCNN, FastText, and recurrent networks (LSTM/GRU) with Attention mechanisms. This multi-model approach allows for flexibility and comparison, with word embeddings pre-trained on a large dataset to capture semantic meaning.
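As a rough illustration of what the Attention variants add on top of a recurrent network, the sketch below pools a sequence of per-token hidden states into a single vector using softmax weights. It is a generic dot-product attention written in plain Python, not the repository's implementation; the names `softmax`, `attention_pool`, `states`, and `query` are assumptions made for this example.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states: Sequence[Sequence[float]],
                   query: Sequence[float]) -> List[float]:
    """Collapse per-token hidden states into one vector.

    Each hidden state is scored by its dot product with `query`;
    the scores are normalised with softmax and used as weights
    in a weighted sum of the hidden states.
    """
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query))
              for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    return [sum(w * h[d] for w, h in zip(weights, hidden_states))
            for d in range(dim)]


# Three toy "LSTM outputs" of size 2 (hypothetical values).
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# A zero query scores every token equally, so pooling reduces to the mean.
query = [0.0, 0.0]
pooled = attention_pool(states, query)
```

In a trained model the query would be a learned parameter, so tokens that matter for sentiment receive larger weights than filler tokens; the pooled vector then feeds the final classification layer.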

Quick Start & Requirements

  • Install TensorFlow 2.0 with pip install tensorflow==2.0.
  • Requires Python 3.
  • Dataset: 100,000 product reviews from JD.com (京东), stored in data/goods_zh.txt and labeled 0 (negative) or 1 (positive).
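The summary does not describe the layout of data/goods_zh.txt, so the following is only a sketch of how such a labeled file could be read. It assumes one review per line with the 0/1 label in the last field after a separator; the tab default, the function names `parse_review` and `load_reviews`, and the sample lines are all hypothetical and should be adjusted to the real file.

```python
from typing import Iterable, List, Tuple


def parse_review(line: str, sep: str = "\t") -> Tuple[str, int]:
    """Split one line into (text, label), reading the label from the last field.

    The separator is an assumption; splitting from the right keeps any
    separators that appear inside the review text itself.
    """
    text, raw_label = line.rstrip("\r\n").rsplit(sep, 1)
    label = int(raw_label)
    if label not in (0, 1):
        raise ValueError(f"unexpected label {label!r}")
    return text.strip(), label


def load_reviews(lines: Iterable[str], sep: str = "\t") -> List[Tuple[str, int]]:
    """Parse every non-blank line into a (text, label) pair."""
    return [parse_review(line, sep) for line in lines if line.strip()]


# Hypothetical sample lines standing in for the real file.
sample = [
    "质量很好\t1\n",  # "the quality is great", positive
    "物流太慢\t0\n",  # "shipping is too slow", negative
]
reviews = load_reviews(sample)
```

Because `load_reviews` accepts any iterable of lines, an open file handle (for example one opened with `encoding="utf-8"`) can be passed in place of `sample`.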

Highlighted Details

  • Achieves >90% accuracy across implemented models.
  • Models include Transformer, word2vec+TextCNN, FastText, word2vec+LSTM/GRU, word2vec+LSTM/GRU+Attention, and word2vec+Bi_LSTM+Attention.
  • Future improvements planned for GloVe, GPT, BERT, and ERNIE.

Maintenance & Community

  • Project maintained by ami66.
  • The author's WeChat official account (ID: datanlp) shares more ML/DL project material.

Licensing & Compatibility

  • License not specified in the README.
  • Compatibility for commercial use or closed-source linking is undetermined.

Limitations & Caveats

The project is pinned to TensorFlow 2.0, an older release. The README does not specify a license, so suitability for commercial use is unclear.

Health Check

  • Last commit: 3 years ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 6 stars in the last 90 days
