Research paper code for contrastive self-supervised sentence representation transfer
ConSERT is a contrastive learning framework designed to improve sentence representations derived from pre-trained language models like BERT. It addresses the issue of representation collapse, which hinders performance on Semantic Textual Similarity (STS) tasks, by fine-tuning models using unlabeled text. This framework is beneficial for researchers and practitioners in NLP seeking robust and transferable sentence embeddings.
How It Works
ConSERT employs a contrastive learning objective to fine-tune BERT models. It leverages unlabeled text data to push semantically similar sentences closer in the embedding space and dissimilar sentences further apart. The sentence representations are generated by averaging token embeddings from the last two layers of the BERT model. This approach effectively mitigates the collapse problem, leading to improved performance on downstream tasks, particularly STS.
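The two ingredients described above — pooling sentence vectors by averaging token embeddings from the last two layers, and a contrastive (NT-Xent-style) objective over two views of each sentence — can be sketched as follows. This is a minimal NumPy illustration, not the repository's actual implementation; the function names, the masked-mean details, and the temperature value are assumptions:

```python
import numpy as np

def pool_last_two_layers(layer_outputs, attention_mask):
    # Masked mean over tokens, averaged across the last two encoder layers.
    # layer_outputs: list of (N, T, d) arrays, one per layer; attention_mask: (N, T).
    h = (layer_outputs[-1] + layer_outputs[-2]) / 2.0      # (N, T, d)
    mask = attention_mask[:, :, None].astype(h.dtype)      # (N, T, 1)
    return (h * mask).sum(axis=1) / mask.sum(axis=1)       # (N, d)

def ntxent_loss(z1, z2, temperature=0.1):
    # NT-Xent contrastive loss between two augmented views of N sentences:
    # each sentence's other view is its positive; all other 2N-2 are negatives.
    z = np.concatenate([z1, z2], axis=0)                   # (2N, d)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)       # cosine-similarity space
    sim = z @ z.T / temperature                            # (2N, 2N) scaled similarities
    n = z1.shape[0]
    np.fill_diagonal(sim, -np.inf)                         # exclude self-similarity
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])  # index of the paired view
    log_denom = np.log(np.exp(sim).sum(axis=1))
    return float(np.mean(log_denom - sim[np.arange(2 * n), pos]))
```

Minimizing this loss pulls the two views of each sentence together while pushing all other sentences in the batch apart, which counteracts the collapse of embeddings into a narrow cone.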
Quick Start & Requirements
Dependencies:
- torch==1.6.0
- cudatoolkit==10.0.103
- cudnn==7.6.5
- sentence-transformers==0.3.9
- transformers==3.4.0
- apex==0.1.0 (Apex needs to be cloned and installed separately)

Download the pre-trained BERT model (bert-base-uncased) and the English and Chinese STS datasets using the provided scripts. Training scripts live in the ./scripts directory (e.g., bash scripts/unsup-consert-base.sh). The max_seq_length setting can be adjusted in the scripts.
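A possible environment setup following the pinned versions above. This is a hedged sketch, not the repository's documented procedure: the conda environment name, the exact conda/pip split, and the CUDA channel packaging are assumptions.

```shell
# Assumed setup; package versions are taken from the requirements list above.
conda create -n consert python=3.7 -y
conda activate consert
conda install -y cudatoolkit=10.0 cudnn=7.6.5      # CUDA packaging via conda is an assumption
pip install torch==1.6.0 sentence-transformers==0.3.9 transformers==3.4.0

# Apex needs to be cloned and installed separately (no test-ready PyPI wheel).
git clone https://github.com/NVIDIA/apex
cd apex && pip install -v --no-cache-dir ./ && cd ..

# Example: launch unsupervised training with the base model.
bash scripts/unsup-consert-base.sh
```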
Maintenance & Community
The project is associated with the ACL 2021 paper "ConSERT: A Contrastive Framework for Self-Supervised Sentence Representation Transfer." No specific community channels or active maintenance indicators are mentioned in the README.
Licensing & Compatibility
The repository does not explicitly state a license. The code is provided for research purposes related to the ACL 2021 paper. Commercial use or linking with closed-source projects would require clarification on licensing.
Limitations & Caveats
The provided results for large models may differ slightly from the paper due to updated PyTorch/CUDA versions and an adjusted max_seq_length. The dependencies are pinned to older versions (e.g., PyTorch 1.6.0 and CUDA 10.0), which may pose compatibility challenges in newer environments.
Last recorded activity: about 3 years ago; the repository appears inactive.