BERT fine-tuning example for Chinese sentiment classification
This repository provides a guide and code for fine-tuning BERT for Chinese sentiment classification. It is targeted at researchers and practitioners who want to adapt BERT for custom Chinese text classification tasks, offering a detailed walkthrough of the process.
How It Works
The project leverages Google's BERT architecture, separating the process into pre-training and fine-tuning. For custom tasks like Chinese sentiment classification, the core approach involves modifying the DataProcessor class to handle dataset-specific input formats and labels. The fine-tuning process then uses run_classifier.py with a pre-trained Chinese BERT model, converting data into TFRecord format for efficient input processing via TPUEstimator.
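The DataProcessor customization described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: InputExample and DataProcessor here are small stand-ins that mirror the classes of the same names in BERT's run_classifier.py, SentimentProcessor is a hypothetical name, and both the file layout (one example per line, label and text separated by a tab) and the label set ("0", "1") are assumptions.

```python
import os
import tempfile


class InputExample:
    """Stand-in mirroring the InputExample container in run_classifier.py."""

    def __init__(self, guid, text_a, text_b=None, label=None):
        self.guid = guid
        self.text_a = text_a
        self.text_b = text_b
        self.label = label


class DataProcessor:
    """Stand-in for the base class in run_classifier.py (interface only)."""

    def get_train_examples(self, data_dir):
        raise NotImplementedError()

    def get_labels(self):
        raise NotImplementedError()


class SentimentProcessor(DataProcessor):
    """Hypothetical processor for a tab-separated sentiment file."""

    def get_train_examples(self, data_dir):
        # Assumed layout: "<label>\t<text>" on each line of train_sentiment.txt.
        path = os.path.join(data_dir, "train_sentiment.txt")
        examples = []
        with open(path, encoding="utf-8") as f:
            for i, line in enumerate(f):
                label, sep, text = line.rstrip("\n").partition("\t")
                if not sep:
                    continue  # skip blank lines and lines without a tab
                examples.append(
                    InputExample(guid="train-%d" % i, text_a=text, label=label)
                )
        return examples

    def get_labels(self):
        # Assumed label set; replace with the labels used in your dataset.
        return ["0", "1"]


# Small demonstration on a temporary file.
with tempfile.TemporaryDirectory() as demo_dir:
    demo_path = os.path.join(demo_dir, "train_sentiment.txt")
    with open(demo_path, "w", encoding="utf-8") as f:
        f.write("1\t这家餐厅很好吃\n0\t服务太差了\n")
    demo_examples = SentimentProcessor().get_train_examples(demo_dir)

for ex in demo_examples:
    print(ex.guid, ex.label, ex.text_a)
```

In the real script, a processor like this would subclass the DataProcessor defined in run_classifier.py, also implement get_dev_examples and get_test_examples, and be registered under a task name in the script's processors dictionary so that it can be selected from the command line.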
Quick Start & Requirements
Run: python3 run_classifier.py ... (see README for full command)
Prerequisites: a pre-trained Chinese BERT model (bert_model.ckpt, vocab.txt, bert_config.json), Python 3.x, TensorFlow.
Highlighted Details
A customized DataProcessor for text classification.
An example get_train_examples function for a train_sentiment.txt file.
Changes to create_model for custom loss calculations or task-specific output handling (e.g., NER).
A switch from TPUEstimator to tf.estimator.Estimator for GPU/CPU optimization and deployment.
Maintenance & Community
This repository appears to be a personal project documenting a specific experiment. No information on active maintenance, community channels, or notable contributors is present in the README.
Licensing & Compatibility
The repository itself does not specify a license. It is based on Google's BERT code, which is released under the Apache 2.0 license, but how that license carries over to this derived code should be verified against the original BERT repository.
Limitations & Caveats
The project is presented as an experimental guide rather than a production-ready library. It relies on tf.contrib.tpu.TPUEstimator, which may require significant refactoring for optimal performance on GPUs or for deployment outside of TPU environments. The README does not provide benchmarks or performance metrics.
Status: inactive; last activity 6 years ago.