BDCI_Car_2018 by yilifzf

Solution for sentiment analysis and topic recognition

created 6 years ago
430 stars

Top 70.1% on sourcepulse

Project Summary

This repository provides the winning solution for the BDCI 2018 Automotive Industry User Opinion Topic and Sentiment Recognition competition. It offers a pipeline for aspect-based sentiment analysis, targeting researchers and practitioners in NLP and sentiment analysis. The solution achieves high accuracy by ensembling multiple deep learning models and fine-tuned BERT for both topic classification and sentiment polarity prediction.

How It Works

The approach employs a two-stage pipeline: first, topic classification (multi-label) using Binary Cross-Entropy, followed by sentiment polarity prediction (multi-class) conditioned on the predicted topic. For topic classification, nine models (8 diverse deep learning models with different embeddings and 1 fine-tuned BERT) are ensembled via stacking with Logistic Regression. Sentiment analysis utilizes 13 models (3 novel network designs with 4 embeddings each, plus fine-tuned BERT), also stacked with LR. This ensemble strategy leverages the strengths of various architectures and embeddings to maximize performance.
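The stacking step described above can be sketched as follows. This is a minimal illustration of stacking base-model probability outputs with a Logistic Regression meta-learner, not the repository's actual code; the data here is simulated and all names are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical sketch: each base model emits a probability for one binary
# topic label; the meta-learner (Logistic Regression, as in the repo's
# stacking stage) learns how to weight those outputs.
rng = np.random.default_rng(0)
n_samples, n_models = 200, 3  # the repo uses 9 models for topics, 13 for sentiment

# Simulated out-of-fold predictions: ground truth plus per-model noise.
y = rng.integers(0, 2, size=n_samples)
base_probs = np.column_stack([
    np.clip(y + rng.normal(0, 0.4, n_samples), 0, 1)
    for _ in range(n_models)
])

meta = LogisticRegression()
meta.fit(base_probs, y)            # learn per-model weights
stacked = meta.predict(base_probs)  # ensembled predictions
```

In the actual pipeline this is done once per topic label (multi-label) and once for the three-way sentiment polarity, with out-of-fold predictions from each base model serving as the meta-features.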

Quick Start & Requirements

  • Install: pip install -r requirements.txt (no requirements file is provided; dependencies such as scikit-multilearn, tqdm, and hanlp are implied by the imports).
  • Prerequisites: Python 3.5+, PyTorch 0.4.*. GPU with at least 8GB VRAM is recommended for fine-tuning BERT. Pre-trained models and embeddings are available via BaiduYun (link and extraction code provided).
  • Setup: Pre-processed data is provided, allowing users to skip data cleaning and word vector preparation. The "One Step" instructions detail how to run stacking directly on provided pre-computed predictions, estimated to take minutes. Full training from scratch is time-intensive.
  • Links: BaiduYun Pre-trained Models (Extraction code: 47e7)

Highlighted Details

  • Utilizes a pipeline of topic classification and sentiment polarity prediction.
  • Employs stacking ensembling with Logistic Regression for both tasks.
  • Integrates fine-tuned Chinese BERT alongside CNN, AttA3, AT_LSTM, HEAT, and GCAE models.
  • Supports multiple word embeddings (merge, fastText, Tencent AI Lab, ELMo).

Maintenance & Community

Contact: sqfzf69(At)163.com. The README mentions potential future updates for code optimization and BERT compatibility.

Licensing & Compatibility

The repository does not explicitly state a license. The code is provided for competition purposes; commercial use or integration into closed-source projects would require clarification.

Limitations & Caveats

The code is noted as not optimized and may contain imperfections (e.g., lack of batching in some networks). Compatibility issues with newer Hugging Face BERT conversion scripts are highlighted, recommending the use of provided or older conversion scripts. The solution does not handle "UNK" tokens, requiring modifications for real-world applications. Pre-trained model loading is tested on GPU only.
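The missing "UNK" handling mentioned above is the kind of gap a real-world deployment would need to close. A minimal sketch of the usual fix, assuming a plain dict-based vocabulary (this is illustrative, not the repository's code):

```python
# Hypothetical sketch: reserve a dedicated <UNK> index so that tokens
# unseen at training time still map to a valid embedding row instead of
# raising a KeyError at inference time.
UNK = "<UNK>"

def build_vocab(tokens):
    """Assign indices to tokens, reserving index 0 for <UNK>."""
    vocab = {UNK: 0}
    for tok in tokens:
        vocab.setdefault(tok, len(vocab))
    return vocab

def encode(tokens, vocab):
    """Look up each token, falling back to the <UNK> index."""
    return [vocab.get(tok, vocab[UNK]) for tok in tokens]

vocab = build_vocab(["动力", "油耗", "内饰"])
ids = encode(["动力", "外观"], vocab)  # unseen "外观" falls back to index 0
```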

Health Check

  • Last commit: 6 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

1 star in the last 90 days
