TensorFlow code for sentence embeddings research paper
This repository provides a TensorFlow implementation of the EMNLP 2020 paper "On the Sentence Embeddings from Pre-trained Language Models." It offers a method to improve sentence embeddings derived from pre-trained language models like BERT, targeting researchers and practitioners in Natural Language Processing seeking enhanced semantic representation for sentences. The key benefit is achieving state-of-the-art performance on sentence similarity tasks.
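Sentence-similarity benchmarks typically score the cosine similarity between embedding pairs against human judgments. A minimal sketch with made-up vectors (the embeddings here are hypothetical stand-ins, not outputs of the actual model):

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical sentence embeddings (e.g., mean-pooled BERT outputs).
a = np.array([0.2, 0.9, 0.1])   # "A man is playing guitar."
b = np.array([0.25, 0.8, 0.05]) # "Someone plays a guitar."
c = np.array([-0.7, 0.1, 0.6])  # "The stock market fell today."

print(cosine(a, b))  # high: semantically similar pair
print(cosine(a, c))  # low: unrelated pair
```

On STS tasks these per-pair scores are then correlated (Spearman) with gold similarity ratings.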
How It Works
The project implements a "flow" mechanism, a generative model approach, to refine sentence embeddings. This involves fine-tuning pre-trained BERT models using Natural Language Inference (NLI) supervision. The core idea is to learn a transformation (the "flow") that maps BERT's raw sentence representations to a more semantically meaningful space, improving performance on tasks like semantic textual similarity (STS).
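The general idea can be illustrated with a toy invertible affine flow in plain NumPy. This is a deliberate simplification for intuition only: the paper trains a multi-layer Glow-style normalizing flow by maximum likelihood, not the single orthogonal layer shown here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for raw BERT sentence embeddings: batch of 4, dimension 8.
x = rng.normal(size=(4, 8))

# One invertible "flow" step z = x @ W + b. An orthogonal W guarantees
# invertibility; the real model stacks many learnable invertible layers.
W = np.linalg.qr(rng.normal(size=(8, 8)))[0]
b = rng.normal(size=8)

def forward(x):
    """Map raw embeddings into the latent (Gaussian-like) space."""
    return x @ W + b

def inverse(z):
    """Invert the flow; for orthogonal W the inverse is the transpose."""
    return (z - b) @ W.T

z = forward(x)
x_rec = inverse(z)
print(np.allclose(x, x_rec))  # prints True: the mapping is invertible

# Training maximizes log p(x) = log N(z; 0, I) + log|det dz/dx|,
# pulling the transformed embeddings toward a standard Gaussian.
log_det = np.linalg.slogdet(W)[1]  # 0 for an orthogonal W
log_pz = -0.5 * (z ** 2).sum(axis=1) - 0.5 * 8 * np.log(2 * np.pi)
log_px = log_pz + log_det
```

Because the learned transformation is invertible, no information is lost; the embedding space is only reshaped so that distances better reflect semantic similarity.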
Quick Start & Requirements
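A rough setup sketch, assuming a TensorFlow 1.x environment and a conventional repository layout (the URL, script names, and exact dependency pins are assumptions; consult the repository's README for the authoritative commands):

```shell
# Hypothetical quick start; verify paths and flags against the repo's README.
git clone https://github.com/bohanli/BERT-flow.git
cd BERT-flow
pip install "tensorflow-gpu>=1.11,<2.0"  # implementation targets TF 1.x only
# Pre-trained BERT checkpoints and the NLI/STS datasets must be
# downloaded manually before running the training scripts.
```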
Highlighted Details
Maintenance & Community
The project is associated with authors from CMU. Contact information is provided for questions. No explicit community channels (like Discord/Slack) or roadmap are mentioned.
Licensing & Compatibility
The repository's license is not explicitly stated in the README. However, it acknowledges borrowing heavily from google-research/bert, zihangdai/xlnet, and tensorflow/tensor2tensor, which carry their own licenses. Commercial use or closed-source linking would require clarifying which license applies to this codebase.
Limitations & Caveats
The implementation is specific to TensorFlow 1.x. The README does not detail support for newer TensorFlow versions or other frameworks like PyTorch. The setup involves manual downloading of large pre-trained models and datasets.
Last updated roughly four years ago; the repository appears inactive.