text_classification  by brightmart

Text classification models using deep learning

Created 8 years ago
7,933 stars

Top 6.5% on SourcePulse

Project Summary

This repository collects deep learning models for text classification, aimed at NLP researchers and practitioners. It implements both classic and more recent architectures so that users can explore them, benchmark them, and apply them to their own datasets for tasks such as sentiment analysis and multi-label classification.

How It Works

The project implements a wide array of text classification models, including fastText, TextCNN, RNNs, RCNNs, Hierarchical Attention Networks, Seq2Seq with attention, Transformers, Dynamic Memory Networks, and Entity Networks. It supports multi-label classification and offers ensemble methods like boosting. The models are designed to be independent of the dataset, with a focus on providing baseline implementations and exploring different architectural choices for language understanding.
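
To make the multi-label setting concrete, here is a minimal sketch in plain Python of how multi-label prediction usually differs from single-label prediction: each label gets its own sigmoid probability and threshold instead of competing in one softmax. The label names, logits, threshold, and function names are hypothetical and are not taken from the repository.

```python
import math

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def predict_labels(logits, label_names, threshold=0.5):
    """Return every label whose independent sigmoid probability meets the threshold."""
    return [name for name, logit in zip(label_names, logits)
            if sigmoid(logit) >= threshold]

def multilabel_loss(logits, targets):
    """Mean binary cross-entropy over labels; targets hold 0 or 1 per label."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for logit, target in zip(logits, targets):
        p = sigmoid(logit)
        total -= target * math.log(p + eps) + (1 - target) * math.log(1 - p + eps)
    return total / len(logits)

labels = ["sports", "politics", "tech"]  # hypothetical label set
logits = [2.0, -1.5, 0.3]                # hypothetical model outputs for one document
print(predict_labels(logits, labels))    # a document may receive several labels at once
```

Because the labels are scored independently, a document can receive zero, one, or many labels, which is what separates this setup from ordinary multi-class classification.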

Quick Start & Requirements

  • Requirements: TensorFlow 1.x (developed against 1.8; reported compatible with 1.1 through 1.13) and Python 2.7 (Python 3.6+ works with minor adjustments).
  • Data: A pre-processed cached dataset (1.8 GB zip) is available for quick setup; training with it needs about 8 GB of RAM. Alternatively, custom data can be pre-processed with the provided Jupyter notebooks.
  • Links: NLP API Demo, CLUE benchmark
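
Since the stated compatibility window is narrow, a quick environment check can save a failed run. The sketch below is an assumption-laden convenience, not part of the repository: `parse_version` and `tf_version_supported` are hypothetical helper names, and the bounds simply restate the 1.1 to 1.13 range quoted above.

```python
import sys

def parse_version(text):
    """'1.13.1' -> (1, 13); the patch level and any suffix are ignored."""
    major, minor = text.split(".")[:2]
    return int(major), int(minor)

def tf_version_supported(version_text, low=(1, 1), high=(1, 13)):
    """True if the TensorFlow version falls inside the quoted compatibility range."""
    return low <= parse_version(version_text) <= high

# Report the running interpreter; the code base targets 2.7, with 3.6+ needing small edits.
print("Python %d.%d" % sys.version_info[:2])

# With TensorFlow installed, the check would be used like this:
#   import tensorflow as tf
#   print(tf_version_supported(tf.__version__))
print(tf_version_supported("1.8.0"))
print(tf_version_supported("2.4.1"))
```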

Highlighted Details

  • Implements 13 distinct text classification models, including BERT and Transformer.
  • Provides performance benchmarks on a multi-label prediction task, showing scores and training times.
  • Includes implementations for sequence-to-sequence tasks and memory networks capable of transitive inference.
  • Offers a boosting ensemble method to improve model performance by stacking identical models.
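
The boosting idea in the last bullet can be illustrated with a small plain-Python sketch. The summary only says that identical models are stacked, so the details here are assumptions: each stacked model's logits are summed for the final prediction, and the per-label error rate of one model becomes the loss weight for the next. All function names and the `floor` parameter are hypothetical.

```python
def label_error_rates(predictions, targets):
    """Per-label fraction of examples where the 0/1 prediction disagrees with the target."""
    n_examples = len(targets)
    n_labels = len(targets[0])
    return [
        sum(1 for i in range(n_examples) if predictions[i][j] != targets[i][j]) / n_examples
        for j in range(n_labels)
    ]

def next_label_weights(error_rates, floor=0.05):
    """Turn error rates into loss weights for the next model.

    Labels the previous model got wrong more often receive larger weights.
    The floor keeps well-learned labels from dropping to zero weight
    (assumed value), and the weights are normalised to average 1.
    """
    raw = [max(rate, floor) for rate in error_rates]
    mean = sum(raw) / len(raw)
    return [value / mean for value in raw]

def ensemble_logits(per_model_logits):
    """Sum the logits of all stacked models, label by label."""
    return [sum(values) for values in zip(*per_model_logits)]

# Toy data: two examples, two labels.
predictions = [[1, 0], [1, 1]]
targets = [[1, 0], [0, 1]]
rates = label_error_rates(predictions, targets)
print(rates, next_label_weights(rates))
print(ensemble_logits([[1.0, -2.0], [0.5, 0.5]]))
```

The design choice sketched here mirrors classic boosting: later models concentrate on what earlier ones got wrong, but at the level of labels rather than individual examples.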

Licensing & Compatibility

  • The repository does not explicitly state a license. TensorFlow 1.8 is a key dependency.

Limitations & Caveats

  • Primarily developed for Python 2.7, with Python 3 compatibility requiring minor code adjustments.
  • Some models are described as "simple" and may not achieve top-tier performance without further tuning.
  • BERT implementation notes suggest potential memory constraints on standard GPUs for longer sequences.
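
The memory caveat for longer sequences follows from self-attention, whose score matrices grow with the square of the sequence length. The rough estimate below is a sketch under stated assumptions: 12 layers and 12 heads match the published BERT-base configuration, while the batch size and 4-byte floats are illustrative choices. It counts only attention scores and ignores weights, other activations, and optimizer state, so real usage is higher.

```python
def attention_score_bytes(seq_len, batch_size=32, num_heads=12,
                          num_layers=12, bytes_per_value=4):
    """Bytes needed to hold the attention score matrices of every layer.

    Each layer stores one seq_len x seq_len matrix per head and per example.
    """
    per_layer = batch_size * num_heads * seq_len * seq_len * bytes_per_value
    return per_layer * num_layers

for n in (128, 256, 512):
    gib = attention_score_bytes(n) / 2 ** 30
    print("seq_len=%d: %.2f GiB for attention scores alone" % (n, gib))
```

Doubling the sequence length quadruples this term, which is why halving the batch size is often not enough to fit long inputs on a standard GPU.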

Health Check

  • Last commit: 2 years ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0

Star History

  • 5 stars in the last 30 days

Explore Similar Projects

Starred by Stas Bekman (author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake), Lysandre Debut (Chief Open-Source Officer at Hugging Face), and 17 more.

pytext by facebookresearch

0%
6k
NLP framework (deprecated, migrate to torchtext)
Created 7 years ago
Updated 2 years ago
Starred by Luis Capelo (Cofounder of Lightning AI), Eugene Yan (AI Scientist at AWS), and 14 more.

text by pytorch

0.0%
4k
PyTorch library for NLP tasks
Created 8 years ago
Updated 1 week ago