text-classification-surveys by liqian-bio

Text classification resource survey, covering shallow/deep learning models

Created 5 years ago

610 stars

Top 53.7% on SourcePulse

Project Summary

This repository is a survey and resource hub for text classification, aimed at NLP researchers and practitioners. It consolidates papers, models, datasets, and evaluation metrics, drawing primarily on the survey paper "A Survey on Text Classification: From Shallow to Deep Learning," and gives a structured overview of how the field has evolved and of its key components.

How It Works

The repository categorizes text classification approaches from traditional shallow learning models (e.g., SVM, Random Forest) to state-of-the-art deep learning architectures (e.g., BERT, RoBERTa, TextGCN). It details various model architectures, their core mechanisms (like attention, graph convolutions, or span masking), and their performance on benchmark datasets, providing a historical and technical progression of the field.
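To make the "shallow learning" end of that progression concrete, here is a minimal sketch of a bag-of-words classifier in plain Python. It uses multinomial Naive Bayes with add-one smoothing as a stand-in: Naive Bayes is not named in this summary, and the class name `NaiveBayesTextClassifier`, the whitespace tokenizer, and the toy reviews are all assumptions made for illustration, not code from the repository.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase whitespace splitting is enough for this toy example.
    return text.lower().split()


class NaiveBayesTextClassifier:
    """Hypothetical multinomial Naive Bayes over bag-of-words counts."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for tok in tokenize(text):
                if tok not in self.vocab:
                    continue  # ignore tokens never seen in training
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy sentiment data, invented for this sketch.
texts = [
    "a gripping and moving film",
    "great acting and a moving story",
    "dull plot and wooden acting",
    "a boring and dull film",
]
labels = ["pos", "pos", "neg", "neg"]

clf = NaiveBayesTextClassifier().fit(texts, labels)
print(clf.predict("a moving and great story"))
print(clf.predict("dull and boring plot"))
```

The deep learning models listed above replace the hand-built count features with learned representations, but the fit-then-predict shape of the task stays the same.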

Quick Start & Requirements

This repository is primarily a curated collection of information and links to external resources (papers, GitHub repositories). There is no direct installation or execution command for the repository itself. Users are directed to individual model repositories for setup and usage.

Highlighted Details

Extensive coverage of both shallow models (LightGBM, SVM) and deep learning models (BERT, RoBERTa, XLNet, TextGCN, etc.).
Detailed listing and description of numerous text classification datasets (e.g., MR, SST, IMDB, 20NG, SQuAD, SNLI).
Comprehensive overview of evaluation metrics, including single-label metrics (Accuracy, Precision, Recall, F1) and multi-label metrics (Micro-F1, Macro-F1, P@K).
Discussion of future research challenges, including zero-shot/few-shot learning, external knowledge integration, multi-label classification, domain-specific vocabulary, model interpretability, and robustness.
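The multi-label metrics named above differ mainly in how they aggregate. The following is a minimal sketch in plain Python using the standard definitions: Micro-F1 pools true positives, false positives, and false negatives across all labels before computing F1, Macro-F1 averages the per-label F1 scores, and P@K is the fraction of the top K ranked labels that are correct. The function names and the toy label sets are assumptions for illustration and do not come from the repository.

```python
from typing import Sequence, Set, Tuple


def _f1(tp: int, fp: int, fn: int) -> float:
    # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when there is nothing to count.
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(
    y_true: Sequence[Set[str]],
    y_pred: Sequence[Set[str]],
    labels: Sequence[str],
) -> Tuple[float, float]:
    """Return (micro_f1, macro_f1) for multi-label predictions."""
    tp = {label: 0 for label in labels}
    fp = {label: 0 for label in labels}
    fn = {label: 0 for label in labels}
    for true_set, pred_set in zip(y_true, y_pred):
        for label in labels:
            if label in pred_set and label in true_set:
                tp[label] += 1
            elif label in pred_set:
                fp[label] += 1
            elif label in true_set:
                fn[label] += 1
    # Micro: pool the counts over labels, then compute a single F1.
    micro = _f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per label, then take the unweighted mean.
    macro = sum(_f1(tp[l], fp[l], fn[l]) for l in labels) / len(labels)
    return micro, macro


def precision_at_k(true_labels: Set[str], ranked_labels: Sequence[str], k: int) -> float:
    """Fraction of the top-k ranked labels that are in the true label set."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for label in ranked_labels[:k] if label in true_labels) / k


# Toy data, invented for this sketch.
labels = ["sports", "politics", "economy"]
y_true = [{"sports"}, {"politics", "economy"}, {"sports", "economy"}]
y_pred = [{"sports", "politics"}, {"politics"}, {"economy", "politics"}]

micro, macro = micro_macro_f1(y_true, y_pred, labels)
print(f"micro-F1={micro:.3f} macro-F1={macro:.3f}")
print(precision_at_k({"sports", "economy"}, ["economy", "politics", "sports"], k=2))
```

Because Micro-F1 pools counts, frequent labels dominate it, while Macro-F1 weights every label equally, which makes it more sensitive to performance on rare labels.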

Maintenance & Community

The README marks the repository as "updating," but the last commit was 4 years ago, so active curation appears to have stopped. Specific contributors or community links (like Discord/Slack) are not explicitly mentioned in the README.

Licensing & Compatibility

The repository itself does not specify a license. However, it links to numerous external projects, each with its own license. Users must consult the licenses of the individual linked repositories for usage and compatibility.

Limitations & Caveats

This repository is a survey and does not provide executable code or pre-trained models directly. Users must navigate to linked external repositories for implementation details and usage. The "updating" status suggests potential for changes and additions.

Health Check

Last Commit

4 years ago

Responsiveness

Inactive

Star History

1 star in the last 30 days