Deprecated tool for generating token-level embeddings from BERT models
This project provides token-level embeddings from BERT models using MXNet and GluonNLP, targeting NLP researchers and developers who want to leverage pre-trained language representations without full end-to-end model fine-tuning. It offers a simpler way to integrate BERT's powerful contextual embeddings into existing NLP pipelines.
How It Works
The library extracts token embeddings from pre-trained BERT models. It leverages the MXNet deep learning framework and the GluonNLP toolkit for model loading and inference. Users can specify different pre-trained BERT models (e.g., bert_12_768_12, bert_24_1024_16) and handle out-of-vocabulary (OOV) tokens by averaging, summing, or taking the last subword embedding.
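To illustrate the model selection and OOV handling described above, here is a minimal sketch. The constructor arguments (model, dataset_name) and the oov_way parameter follow the project's published examples, but exact names may differ across versions and should be treated as assumptions.

```python
# Minimal sketch (assumed API): choose a pre-trained checkpoint and an OOV
# strategy for recombining subword vectors into per-token embeddings.
from bert_embedding import BertEmbedding

# `model` and `dataset_name` select the pre-trained checkpoint (assumed arguments).
bert = BertEmbedding(model='bert_24_1024_16',
                     dataset_name='book_corpus_wiki_en_cased')

sentences = ['BERT produces contextual token embeddings.',
             'Out-of-vocabulary words are split into subword pieces.']

# `oov_way` chooses how subword vectors are merged back into one token vector:
# 'avg', 'sum', or 'last' (assumed parameter name).
results = bert(sentences, oov_way='avg')

for tokens, vectors in results:
    print(tokens[:3], vectors[0].shape)  # first tokens and embedding dimension
```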
Quick Start & Requirements
Install with pip install bert-embedding. GPU inference additionally requires a GPU build of MXNet such as mxnet-cu92 (or a compatible MXNet GPU version). To use the library, instantiate BertEmbedding and call it with a list of sentences; GPU usage requires setting the MXNet context, as sketched below.
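A minimal quick-start sketch, assuming a GPU build of MXNet (e.g., mxnet-cu92) is installed and that BertEmbedding accepts an MXNet context through a ctx argument:

```python
# Quick-start sketch: CPU by default, GPU via an explicit MXNet context.
import mxnet as mx
from bert_embedding import BertEmbedding

sentences = ['A quick start example sentence.']

# Default: runs on CPU.
bert_cpu = BertEmbedding()
cpu_result = bert_cpu(sentences)

# GPU: pass an MXNet GPU context (requires a GPU MXNet build, e.g. mxnet-cu92).
bert_gpu = BertEmbedding(ctx=mx.gpu(0))
gpu_result = bert_gpu(sentences)

tokens, vectors = gpu_result[0]
print(len(tokens), vectors[0].shape)  # token count and embedding dimension
```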
Maintenance & Community
The project is marked as deprecated by the author due to lack of maintenance time. The author is open to contributions from interested maintainers.
Licensing & Compatibility
The README does not explicitly state a license. Compatibility with commercial or closed-source projects is not specified.
Limitations & Caveats
The project is deprecated and no longer actively maintained. The specific MXNet GPU version (mxnet-cu92) might be outdated, potentially requiring manual dependency management for newer CUDA versions.
Last updated 5 years ago; the repository is inactive.