Research paper for improving BERT sentence embeddings
Top 81.3% on SourcePulse
PromptBERT is a prompt-based contrastive learning method that improves the sentence embeddings produced by BERT and similar models. It targets two weaknesses the paper identifies in original BERT sentence embeddings: bias from static token embeddings and ineffective use of BERT's layers. It is useful for researchers and practitioners seeking better sentence representations, particularly in unsupervised settings.
How It Works
PromptBERT employs a prompt-based approach to guide BERT's representation learning: a sentence is wrapped in a template and its embedding is derived from the prompt, and the paper explores several prompt representation and prompt search methods to mitigate bias from static token embeddings. A key innovation is an unsupervised training objective based on template denoising, which substantially narrows the gap between unsupervised and supervised performance. The result is sentence embeddings that are more sensitive to contextual nuance.
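Below is a minimal sketch of this prompt-based representation with a simplified form of template denoising. The template string, the bert-base-uncased checkpoint, and the helper names are illustrative assumptions, not the repository's exact implementation:

```python
# Minimal sketch of a PromptBERT-style prompt representation with simplified
# template denoising. Template and model choice are assumptions for illustration.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased").eval()

TEMPLATE = 'This sentence : "{}" means [MASK].'

def mask_hidden(text: str) -> torch.Tensor:
    """Return the last-layer hidden state at the [MASK] position."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq_len, hidden_dim)
    pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
    return hidden[0, pos].squeeze(0)

def prompt_embedding(sentence: str, denoise: bool = True) -> torch.Tensor:
    emb = mask_hidden(TEMPLATE.format(sentence))
    if denoise:
        # Simplified denoising: subtract the [MASK] state of the empty template,
        # approximating the paper's removal of the template's own bias.
        emb = emb - mask_hidden(TEMPLATE.format(""))
    return emb

print(prompt_embedding("A man is playing guitar.").shape)  # torch.Size([768])
```

In the paper, embeddings derived this way are trained with a contrastive objective, using different templates for the same sentence as positive pairs.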
Quick Start & Requirements
pip install -r requirements.txt
sh SentEval/data/downstream/download_dataset.sh
sh ./data/download_wiki.sh
sh ./data/download_nli.sh
sh eval_only.sh [unsup-bert|unsup-roberta|sup-bert|sup-roberta]
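As a quick sanity check after setup (this is a hypothetical snippet, not one of the repo's scripts), you can score two sentences with the prompt-based embedding from the sketch above, here against vanilla bert-base-uncased:

```python
# Hypothetical sanity check: cosine similarity between two prompt-based
# sentence embeddings. Template and model choice are illustrative assumptions.
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
mdl = AutoModel.from_pretrained("bert-base-uncased").eval()

def embed(sentence: str) -> torch.Tensor:
    # The [MASK] hidden state of the templated sentence serves as its embedding.
    x = tok(f'This sentence : "{sentence}" means [MASK].', return_tensors="pt")
    with torch.no_grad():
        h = mdl(**x).last_hidden_state
    pos = (x["input_ids"][0] == tok.mask_token_id).nonzero(as_tuple=True)[0]
    return h[0, pos].squeeze(0)

a = embed("A man is playing guitar.")
b = embed("Someone performs music on a guitar.")
print(F.cosine_similarity(a, b, dim=0).item())  # higher for related sentences
```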
Highlighted Details
Maintenance & Community
The project was last updated in August 2023, with an extension to LLMs published in a separate repository (scaling_sentemb). No community channels or active-maintenance signals are mentioned in the README.
Licensing & Compatibility
The README does not explicitly state a license. Given its basis on SimCSE, which is Apache 2.0 licensed, it is likely compatible with commercial use, but this should be verified.
Limitations & Caveats
The README does not detail specific limitations, known bugs, or deprecation status. The project's last update was in August 2023, and community engagement is not evident, which may indicate limited ongoing development.