BDCI_Car_2018 by yilifzf

Solution for sentiment analysis and topic recognition

created 6 years ago
430 stars

Top 70.1% on sourcepulse

Project Summary

This repository provides the winning solution for the BDCI 2018 Automotive Industry User Opinion Topic and Sentiment Recognition competition. It offers a pipeline for aspect-based sentiment analysis, targeting researchers and practitioners in NLP and sentiment analysis. The solution achieves high accuracy by ensembling multiple deep learning models and fine-tuned BERT for both topic classification and sentiment polarity prediction.

How It Works

The approach employs a two-stage pipeline: first, topic classification (multi-label) using Binary Cross-Entropy, followed by sentiment polarity prediction (multi-class) conditioned on the predicted topic. For topic classification, nine models (8 diverse deep learning models with different embeddings and 1 fine-tuned BERT) are ensembled via stacking with Logistic Regression. Sentiment analysis utilizes 13 models (3 novel network designs with 4 embeddings each, plus fine-tuned BERT), also stacked with LR. This ensemble strategy leverages the strengths of various architectures and embeddings to maximize performance.
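The stacking step described above can be sketched as follows. This is a minimal illustration of stacking base-model probability outputs with a Logistic Regression meta-learner, not the repository's actual code; the data here is simulated and all names are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical sketch: each base model emits a probability for one binary
# topic label; the meta-learner (Logistic Regression, as in the repo's
# stacking stage) learns how to weight those outputs.
rng = np.random.default_rng(0)
n_samples, n_models = 200, 3  # the repo uses 9 models for topics, 13 for sentiment

# Simulated out-of-fold predictions: ground truth plus per-model noise.
y = rng.integers(0, 2, size=n_samples)
base_probs = np.column_stack([
    np.clip(y + rng.normal(0, 0.4, n_samples), 0, 1)
    for _ in range(n_models)
])

meta = LogisticRegression()
meta.fit(base_probs, y)            # learn per-model weights
stacked = meta.predict(base_probs)  # ensembled predictions
```

In the actual pipeline this is done once per topic label (multi-label) and once for the three-way sentiment polarity, with out-of-fold predictions from each base model serving as the meta-features.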

Quick Start & Requirements

  • Install: pip install -r requirements.txt (no requirements file is provided; dependencies such as scikit-multilearn, tqdm, and hanlp are implied by the imports).
  • Prerequisites: Python 3.5+, PyTorch 0.4.*. GPU with at least 8GB VRAM is recommended for fine-tuning BERT. Pre-trained models and embeddings are available via BaiduYun (link and extraction code provided).
  • Setup: Pre-processed data is provided, allowing users to skip data cleaning and word vector preparation. The "One Step" instructions detail how to run stacking directly on provided pre-computed predictions, estimated to take minutes. Full training from scratch is time-intensive.
  • Links: BaiduYun Pre-trained Models (Extraction code: 47e7)

Highlighted Details

  • Utilizes a pipeline of topic classification and sentiment polarity prediction.
  • Employs stacking ensembling with Logistic Regression for both tasks.
  • Integrates fine-tuned Chinese BERT alongside CNN, AttA3, AT_LSTM, HEAT, and GCAE models.
  • Supports multiple word embeddings (merge, fastText, Tencent AI Lab, ELMo).

Maintenance & Community

Contact: sqfzf69(At)163.com. The README mentions potential future updates for code optimization and BERT compatibility.

Licensing & Compatibility

The repository does not explicitly state a license. The code is provided for competition purposes; commercial use or integration into closed-source projects would require clarification.

Limitations & Caveats

The code is noted as not optimized and may contain imperfections (e.g., lack of batching in some networks). Compatibility issues with newer Hugging Face BERT conversion scripts are highlighted, recommending the use of provided or older conversion scripts. The solution does not handle "UNK" tokens, requiring modifications for real-world applications. Pre-trained model loading is tested on GPU only.
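The missing "UNK" handling mentioned above is the kind of gap a real-world deployment would need to close. A minimal sketch of the usual fix, assuming a plain dict-based vocabulary (this is illustrative, not the repository's code):

```python
# Hypothetical sketch: reserve a dedicated <UNK> index so that tokens
# unseen at training time still map to a valid embedding row instead of
# raising a KeyError at inference time.
UNK = "<UNK>"

def build_vocab(tokens):
    """Assign indices to tokens, reserving index 0 for <UNK>."""
    vocab = {UNK: 0}
    for tok in tokens:
        vocab.setdefault(tok, len(vocab))
    return vocab

def encode(tokens, vocab):
    """Look up each token, falling back to the <UNK> index."""
    return [vocab.get(tok, vocab[UNK]) for tok in tokens]

vocab = build_vocab(["动力", "油耗", "内饰"])
ids = encode(["动力", "外观"], vocab)  # unseen "外观" falls back to index 0
```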

Health Check

  • Last commit: 6 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

1 star in the last 90 days
