BERT_Chinese_Classification by renxingkai

BERT fine-tuning example for Chinese sentiment classification

created 6 years ago
377 stars

Top 76.5% on sourcepulse

Project Summary

This repository provides a guide and code for fine-tuning BERT for Chinese sentiment classification. It is targeted at researchers and practitioners who want to adapt BERT for custom Chinese text classification tasks, offering a detailed walkthrough of the process.

How It Works

The project follows Google's BERT workflow, which separates pre-training from fine-tuning. For a custom task such as Chinese sentiment classification, the core step is adding a DataProcessor subclass that handles the dataset's input format and label set. Fine-tuning then runs through run_classifier.py with a pre-trained Chinese BERT model, converting the data to TFRecord format for efficient input processing via TPUEstimator.
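The snippet below is a minimal sketch of such a processor, modeled on the DataProcessor and InputExample classes in Google's run_classifier.py. The class name, the tab-separated label/text layout assumed for train_sentiment.txt, and the two-label set are illustrative assumptions, not the repository's exact code.

```python
# Minimal custom processor sketch; assumes run_classifier.py and tokenization.py
# from Google's BERT repo are importable, and that train_sentiment.txt contains
# one "label<TAB>text" pair per line (layout and class name are assumptions).
import os

import tokenization
from run_classifier import DataProcessor, InputExample


class SentimentProcessor(DataProcessor):
  """Processor for a two-class Chinese sentiment corpus."""

  def get_train_examples(self, data_dir):
    examples = []
    path = os.path.join(data_dir, "train_sentiment.txt")
    with open(path, "r", encoding="utf-8") as f:
      for i, line in enumerate(f):
        label, text = line.strip().split("\t", 1)
        examples.append(
            InputExample(
                guid="train-%d" % i,
                text_a=tokenization.convert_to_unicode(text),
                text_b=None,
                label=tokenization.convert_to_unicode(label)))
    return examples

  def get_labels(self):
    # Negative / positive sentiment.
    return ["0", "1"]
```

To make the new processor selectable through --task_name, it also has to be registered in the processors dict inside main() of run_classifier.py.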

Quick Start & Requirements

  • Install/Run: python3 run_classifier.py ... (see the README for the full command; an illustrative invocation follows this list)
  • Prerequisites: Pre-trained Chinese BERT model files (bert_model.ckpt, vocab.txt, bert_config.json), Python 3.x, TensorFlow.
  • Setup: Requires downloading and extracting the pre-trained model.
  • Links: Google's BERT GitHub (for original code and models)
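As a rough guide, a fine-tuning run might look like the command below. The flag names come from Google's run_classifier.py, while the task name, paths, and hyperparameter values are placeholders; consult the README for the exact command.

```bash
# Illustrative invocation only; paths, task name, and hyperparameters are placeholders.
python3 run_classifier.py \
  --task_name=sentiment \
  --do_train=true \
  --do_eval=true \
  --data_dir=./data \
  --vocab_file=./chinese_L-12_H-768_A-12/vocab.txt \
  --bert_config_file=./chinese_L-12_H-768_A-12/bert_config.json \
  --init_checkpoint=./chinese_L-12_H-768_A-12/bert_model.ckpt \
  --max_seq_length=128 \
  --train_batch_size=32 \
  --learning_rate=2e-5 \
  --num_train_epochs=3.0 \
  --output_dir=./output
```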

Highlighted Details

  • Detailed steps for customizing DataProcessor for text classification.
  • Example of a custom get_train_examples function for a train_sentiment.txt file.
  • Guidance on modifying create_model for custom loss calculations or task-specific output handling (e.g., NER); a loss sketch follows this list.
  • Discussion of adapting TPUEstimator to tf.estimator.Estimator for GPU/CPU use and deployment; see the Estimator sketch after this list.
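For the create_model customization, the helper below is a minimal sketch of one common change, a class-weighted cross-entropy head. The function name and the idea of passing class weights are assumptions for illustration; it operates on the logits and label tensors that create_model already builds.

```python
import tensorflow as tf  # TensorFlow 1.x, as used by the original BERT code


def weighted_softmax_loss(logits, labels, num_labels, class_weights):
  """Cross-entropy in which each class contributes according to class_weights."""
  log_probs = tf.nn.log_softmax(logits, axis=-1)
  one_hot_labels = tf.one_hot(labels, depth=num_labels, dtype=tf.float32)
  per_example_loss = -tf.reduce_sum(one_hot_labels * log_probs, axis=-1)
  # Scale each example by the weight assigned to its gold label.
  example_weights = tf.reduce_sum(
      one_hot_labels * tf.constant(class_weights, dtype=tf.float32), axis=-1)
  return tf.reduce_mean(per_example_loss * example_weights)
```

For token-level tasks such as NER, the same spot is where one would switch from model.get_pooled_output() to model.get_sequence_output() so the classifier sees per-token representations.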
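For the Estimator swap, the sketch below assumes model_fn has already been rewritten to return tf.estimator.EstimatorSpec instead of TPUEstimatorSpec; the helper name, checkpoint interval, and batch size are placeholders.

```python
import tensorflow as tf  # TensorFlow 1.x


def build_cpu_gpu_estimator(model_fn, output_dir, batch_size=32):
  """Wrap a BERT model_fn in the plain Estimator API for CPU/GPU training."""
  run_config = tf.estimator.RunConfig(
      model_dir=output_dir,
      save_checkpoints_steps=1000)
  return tf.estimator.Estimator(
      model_fn=model_fn,  # must return tf.estimator.EstimatorSpec
      config=run_config,
      # BERT's input_fn builders read the batch size from params.
      params={"batch_size": batch_size})
```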

Maintenance & Community

This repository appears to be a personal project documenting a specific experiment. No information on active maintenance, community channels, or notable contributors is present in the README.

Licensing & Compatibility

The repository itself does not specify a license. It builds on Google's BERT code, which is released under the Apache 2.0 license, but the terms covering this repository's own additions should be confirmed with the author.

Limitations & Caveats

The project is presented as an experimental guide rather than a production-ready library. It relies on tf.contrib.tpu.TPUEstimator, which ties it to TensorFlow 1.x (tf.contrib was removed in TensorFlow 2.x) and may require significant refactoring for efficient GPU/CPU use or for deployment outside TPU environments. The README does not provide benchmarks or performance metrics.

Health Check

  • Last commit: 6 years ago
  • Responsiveness: inactive
  • Pull requests (30 days): 0
  • Issues (30 days): 0
  • Star history: 1 star in the last 90 days
