BERT utility for sentence embeddings, text classification, and similarity
This repository provides simplified utilities for leveraging Google's BERT model for generating sentence embeddings and performing text classification. It is designed for developers and researchers looking for a streamlined way to integrate BERT's capabilities into their NLP pipelines, offering faster sentence vector generation and a straightforward fine-tuning process for classification tasks.
How It Works
The library builds upon Google's open-source BERT implementation, focusing on ease of use. For sentence embeddings, it caches the generated graph file, which speeds up startup on subsequent runs. For text classification, it fine-tunes BERT using TensorFlow's Estimator API and expects the data to be split into train.csv, dev.csv, and test.csv files.
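The train/dev/test split mentioned above can be sketched as follows. This is a hypothetical helper, not part of the library: the exact column layout the repository expects is not documented here, so the (label, text) row format is an assumption.

```python
import csv
import random

def write_splits(examples, out_dir=".", dev_frac=0.1, test_frac=0.1, seed=42):
    """Shuffle (label, text) pairs and write train.csv / dev.csv / test.csv.

    The (label, text) column order is an assumption; adjust to match the
    format the classifier's data loader actually expects.
    """
    rng = random.Random(seed)
    examples = list(examples)
    rng.shuffle(examples)
    n = len(examples)
    n_test = int(n * test_frac)
    n_dev = int(n * dev_frac)
    splits = {
        "test.csv": examples[:n_test],
        "dev.csv": examples[n_test:n_test + n_dev],
        "train.csv": examples[n_test + n_dev:],
    }
    for name, rows in splits.items():
        with open(f"{out_dir}/{name}", "w", newline="", encoding="utf-8") as f:
            csv.writer(f).writerows(rows)
    return {name: len(rows) for name, rows in splits.items()}
```

A fixed seed keeps the split reproducible across runs, so fine-tuning results remain comparable.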
Quick Start & Requirements
Download the pre-trained Chinese model chinese_L-12_H-768_A-12.zip from Google Storage.
Sentence embeddings: from bert.extrac_feature import BertVector; bv = BertVector(); bv.encode(['text'])
Similarity and classification: from similarity import BertSim; bs = BertSim(); bs.set_mode(...); bs.train()/eval()/test()
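A common follow-up to encoding sentences is comparing their vectors with cosine similarity. The sketch below uses placeholder lists in place of real embeddings; the BertVector usage in the comment mirrors the one-liner above but its exact return structure is an assumption.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# In practice the vectors would come from the library, e.g.
#   bv = BertVector(); u, v = bv.encode(['sentence one', 'sentence two'])
# (assumed usage); placeholder vectors stand in here.
u = [0.1, 0.3, 0.5]
v = [0.1, 0.3, 0.5]
print(cosine_similarity(u, v))  # identical vectors -> 1.0 (up to rounding)
```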
Maintenance & Community
Last updated July 1st, 2019. No community links or active maintenance signals are present in the README.
Licensing & Compatibility
The README does not specify a license. It is based on Google's BERT code, which is typically Apache 2.0 licensed, but this is not confirmed for this specific utility wrapper.
Limitations & Caveats
The project was last updated in 2019, so it likely does not incorporate recent BERT advancements or account for newer TensorFlow/Keras API changes. The absence of explicit licensing information poses a risk for commercial use.