text_classification by brightmart

Text classification models using deep learning

created 8 years ago · 7,931 stars · Top 6.7% on sourcepulse

Project Summary

This repository provides a comprehensive collection of deep learning models for text classification, targeting NLP researchers and practitioners. It offers implementations of various classic and state-of-the-art architectures, enabling users to explore, benchmark, and apply them to their own datasets for tasks like sentiment analysis and multi-label classification.

How It Works

The project implements a wide array of text classification models, including fastText, TextCNN, RNNs, RCNNs, Hierarchical Attention Networks, Seq2Seq with attention, Transformers, Dynamic Memory Networks, and Entity Networks. It supports multi-label classification and offers ensemble methods like boosting. The models are designed to be independent of the dataset, with a focus on providing baseline implementations and exploring different architectural choices for language understanding.
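
As a concrete illustration of this design, the sketch below shows the TextCNN idea (parallel convolutions of several widths, max-over-time pooling, and a sigmoid head for multi-label output) in TensorFlow 1.x style. It is a minimal sketch, not the repository's code, and every size and hyperparameter in it is an assumed placeholder.

    # Minimal TextCNN sketch in TensorFlow 1.x style (illustrative, not the
    # repository's code; vocab_size, embed_dim, etc. are assumed values).
    import tensorflow as tf

    vocab_size, embed_dim, seq_len, num_classes = 10000, 128, 200, 9

    x = tf.placeholder(tf.int32, [None, seq_len])        # token ids
    y = tf.placeholder(tf.float32, [None, num_classes])  # multi-hot labels

    embedding = tf.get_variable("embedding", [vocab_size, embed_dim])
    embedded = tf.nn.embedding_lookup(embedding, x)      # [batch, seq, embed]

    # One convolution + max-over-time pooling per filter width, concatenated.
    pooled = []
    for width in (3, 4, 5):
        conv = tf.layers.conv1d(embedded, filters=128, kernel_size=width,
                                activation=tf.nn.relu)
        pooled.append(tf.reduce_max(conv, axis=1))
    features = tf.concat(pooled, axis=1)

    logits = tf.layers.dense(features, num_classes)
    # Sigmoid rather than softmax loss, so each label is scored
    # independently; this is what makes the head multi-label.
    loss = tf.reduce_mean(
        tf.nn.sigmoid_cross_entropy_with_logits(labels=y, logits=logits))
    train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)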

Quick Start & Requirements

  • Install: TensorFlow 1.8 (reported to work across 1.1–1.13) and Python 2.7; Python 3.6+ works with minor code adjustments.
  • Data: Pre-processed cached data (a 1.8GB zip) is available for quick setup; training on it requires ~8GB RAM. Alternatively, custom data can be prepared with the provided Jupyter notebooks (a stand-in sketch follows this list).
  • Links: NLP API Demo, CLUE benchmark
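
The notebooks' exact interface is not reproduced here; as a stand-in, the sketch below shows the usual pre-processing steps (build a vocabulary, map tokens to ids, right-pad to a fixed length) in plain Python. All function names and sizes are illustrative assumptions, not the repository's API.

    # Illustrative pre-processing sketch; function names and sizes are
    # assumptions, not the repository's API.
    from collections import Counter

    PAD, UNK = 0, 1  # reserved ids for padding and unknown tokens

    def build_vocab(texts, max_size=10000):
        counts = Counter(tok for t in texts for tok in t.split())
        return {tok: i + 2
                for i, (tok, _) in enumerate(counts.most_common(max_size - 2))}

    def encode(text, vocab, seq_len=200):
        ids = [vocab.get(tok, UNK) for tok in text.split()][:seq_len]
        return ids + [PAD] * (seq_len - len(ids))  # right-pad to fixed length

    vocab = build_vocab(["the movie was great", "terrible plot and acting"])
    print(encode("the movie was terrible", vocab, seq_len=8))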

Highlighted Details

  • Implements 13 distinct text classification models, including BERT and Transformer.
  • Provides performance benchmarks on a multi-label prediction task, showing scores and training times.
  • Includes implementations for sequence-to-sequence tasks and memory networks capable of transitive inference.
  • Offers a boosting ensemble method that improves performance by stacking identical models (a toy schematic follows this list).
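
Boosting in this sense conventionally means re-training copies of the same model with extra weight on examples that earlier copies misclassified, then combining the copies' outputs. The toy sketch below makes that loop visible with a deliberately trivial stand-in model (a one-feature threshold); it is a schematic of the idea, not the repository's implementation.

    # Toy schematic of boosting by stacking identical models (not the
    # repository's implementation): each round re-fits the same model form
    # with extra weight on previously misclassified examples, and the
    # rounds vote on the final prediction.
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)

    def fit_stump(X, y, w):
        # Trivial stand-in "model": pick the feature whose sign best
        # matches the labels under the current example weights.
        errs = [np.sum(w * ((X[:, j] > 0).astype(int) != y))
                for j in range(X.shape[1])]
        return int(np.argmin(errs))

    weights = np.full(len(y), 1.0 / len(y))
    votes = np.zeros(len(y))
    for _ in range(3):                             # three stacked copies
        j = fit_stump(X, y, weights)
        pred = (X[:, j] > 0).astype(int)
        votes += np.where(pred == 1, 1.0, -1.0)
        weights *= np.where(pred != y, 2.0, 1.0)   # up-weight mistakes
        weights /= weights.sum()

    print("ensemble accuracy:", ((votes > 0).astype(int) == y).mean())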

Maintenance & Community

  • Development is inactive: the last commit was 1 year ago, with no pull requests or issues opened in the last 30 days (see Health Check below).

Licensing & Compatibility

  • The repository does not state a license, so reuse terms are unspecified.
  • TensorFlow 1.x (1.8 recommended) is a key dependency.

Limitations & Caveats

  • Primarily developed for Python 2.7, with Python 3 compatibility requiring minor code adjustments.
  • Some models are described as "simple" and may not achieve top-tier performance without further tuning.
  • BERT implementation notes suggest potential memory constraints on standard GPUs for longer sequences.
Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

  • 32 stars in the last 90 days

Starred by Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake), Abhishek Thakur (World's First 4x Kaggle GrandMaster), and 5 more.

Explore Similar Projects

xlnet by zihangdai
6k stars · created 6 years ago · updated 2 years ago
Language model research paper using generalized autoregressive pretraining