BERT-for-Sequence-Labeling-and-Text-Classification by yuanxiaosc

Template code for BERT-based sequence labeling and text classification

Created 6 years ago

470 stars

Top 64.7% on SourcePulse

Project Summary

This repository provides a template for applying BERT models to sequence labeling and text classification tasks, specifically targeting named entity recognition (NER) and joint intent/slot filling. It's designed for NLP researchers and practitioners looking to leverage BERT for custom datasets and tasks.

How It Works

The project adapts Google's BERT implementation for sequence labeling and text classification, with one script per task configuration: run_sequence_labeling.py, run_text_classification.py, and run_sequence_labeling_and_text_classification.py for the joint task. Each script fine-tunes a pre-trained BERT model on a task-specific dataset, so BERT's contextual embeddings feed directly into the downstream labeling or classification layer.
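To make the joint configuration concrete, here is a minimal sketch in plain Python. It assumes the common design in which the intent is predicted from a single pooled sentence vector and each slot label is predicted from that token's own hidden vector. The function names, the toy weights, and the two-dimensional "hidden states" are illustrative assumptions, not code from the repository.

```python
def linear(vec, weights, bias):
    """Dense layer: one output score per row of `weights`."""
    return [sum(w * x for w, x in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def argmax(values):
    """Index of the largest score (first one wins on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def joint_predict(sequence_output, pooled_output,
                  slot_head, intent_head, slot_labels, intent_labels):
    """Predict one intent per sentence and one slot label per token.

    sequence_output: list of per-token hidden vectors (stand-in for BERT output).
    pooled_output:   single sentence-level vector (stand-in for the pooled output).
    slot_head / intent_head: (weights, bias) pairs for the two classifiers.
    """
    slot_w, slot_b = slot_head
    intent_w, intent_b = intent_head
    slots = [slot_labels[argmax(linear(h, slot_w, slot_b))]
             for h in sequence_output]
    intent = intent_labels[argmax(linear(pooled_output, intent_w, intent_b))]
    return intent, slots


# Toy example with hand-picked numbers (hypothetical, hidden size 2).
identity_head = ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
toy_tokens = [[0.9, 0.1], [0.2, 0.8]]   # two tokens
toy_pooled = [0.3, 0.7]                 # whole sentence
print(joint_predict(toy_tokens, toy_pooled,
                    identity_head, identity_head,
                    ["O", "B-city"], ["flight", "weather"]))
```

During fine-tuning, both heads would be trained together with the BERT encoder. The sketch covers only the prediction step.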

Quick Start & Requirements

Install: pip install -r requirements.txt
Prerequisites: Python 3.6+, TensorFlow 1.12.0+, scikit-learn.
Setup: Download Google's pre-trained BERT models and place them in the pretrained_model directory. Task-specific datasets may also need to be downloaded.
Docs: predefined_task_usage.md
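Before launching a fine-tuning run, it can help to confirm that the downloaded model landed in the right place. The sketch below checks a directory for the file names used in Google's released BERT checkpoints (bert_config.json, vocab.txt, and bert_model.ckpt.* files). The helper name and the example sub-directory under pretrained_model are assumptions for illustration; the repository may expect a different layout.

```python
from pathlib import Path

# File names used in Google's released BERT checkpoints.
EXPECTED_FILES = ("bert_config.json", "vocab.txt")
CHECKPOINT_PREFIX = "bert_model.ckpt"


def missing_bert_files(model_dir):
    """Return the expected items that are not present in `model_dir`."""
    model_dir = Path(model_dir)
    checkpoint_pattern = CHECKPOINT_PREFIX + ".*"
    if not model_dir.is_dir():
        # Nothing can be present if the directory itself is missing.
        return list(EXPECTED_FILES) + [checkpoint_pattern]
    missing = [name for name in EXPECTED_FILES
               if not (model_dir / name).is_file()]
    if not any(model_dir.glob(checkpoint_pattern)):
        missing.append(checkpoint_pattern)
    return missing


# Hypothetical location: adjust to wherever the model was unpacked.
print(missing_bert_files("pretrained_model/uncased_L-12_H-768_A-12"))
```

An empty list means all expected files were found. Any names in the list point to what still needs to be downloaded or moved.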

Highlighted Details

Ships predefined tasks for CoNLL-2003 NER (eval_f=0.926), ATIS joint slot filling and intent prediction (Intent Acc=0.976, Slot Acc=0.955), and the Snips dataset.
Provides clear instructions and code examples for adding new tasks by implementing a DataProcessor.
Includes scripts for both training/fine-tuning and prediction.
Offers fine-tuned model checkpoints for specific tasks.
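As a rough illustration of what implementing a DataProcessor involves, the sketch below follows the interface of Google's BERT run_classifier.py: a base class with get_train_examples, get_dev_examples, get_test_examples, and get_labels, plus an InputExample container. The two base classes are re-declared here as stand-ins so the block runs on its own. ToyIntentProcessor, its label set, and the train.tsv/dev.tsv/test.tsv file layout are hypothetical and not taken from the repository.

```python
import csv
import os


class InputExample:
    """Stand-in mirroring InputExample in Google's BERT run_classifier.py."""

    def __init__(self, guid, text_a, text_b=None, label=None):
        self.guid = guid
        self.text_a = text_a
        self.text_b = text_b
        self.label = label


class DataProcessor:
    """Stand-in for the base class that task processors extend."""

    def get_train_examples(self, data_dir):
        raise NotImplementedError()

    def get_dev_examples(self, data_dir):
        raise NotImplementedError()

    def get_test_examples(self, data_dir):
        raise NotImplementedError()

    def get_labels(self):
        raise NotImplementedError()


class ToyIntentProcessor(DataProcessor):
    """Hypothetical task: each TSV row is `label<TAB>sentence`."""

    def get_labels(self):
        return ["greeting", "farewell"]

    def _read_examples(self, path, set_type):
        examples = []
        with open(path, encoding="utf-8", newline="") as handle:
            for i, row in enumerate(csv.reader(handle, delimiter="\t")):
                if not row:
                    continue  # skip blank lines
                label, text = row[0], row[1]
                examples.append(InputExample(
                    guid="%s-%d" % (set_type, i), text_a=text, label=label))
        return examples

    def get_train_examples(self, data_dir):
        return self._read_examples(os.path.join(data_dir, "train.tsv"), "train")

    def get_dev_examples(self, data_dir):
        return self._read_examples(os.path.join(data_dir, "dev.tsv"), "dev")

    def get_test_examples(self, data_dir):
        return self._read_examples(os.path.join(data_dir, "test.tsv"), "test")
```

In Google's script, a new processor is made selectable by registering it under a task name. How this repository wires processors in is described in its own documentation, so treat the registration step as something to verify there.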

Maintenance & Community

No explicit information on maintainers, community channels, or roadmap is provided in the README.

Licensing & Compatibility

The README does not state a license. The project builds on Google's BERT code, which is released under the Apache 2.0 license, but the license of this adaptation is unclear, and suitability for commercial use is not specified.

Limitations & Caveats

The project relies on TensorFlow 1.x, which is deprecated. According to the README, the reported scores were obtained without careful hyperparameter tuning, so there is likely room for improvement. The fine-tuned models are distributed through a Baidu Pan link, which may be difficult to access from some regions.

Health Check

Last Commit

3 years ago

Responsiveness

Inactive

Star History

0 stars in the last 30 days