Question-answer (QA) matching competition code based on BERT
This repository contains the code and techniques used to achieve top performance in the CCF 2020 QA Matching competition. It targets NLP practitioners and researchers looking to improve question-answering systems, offering methods that go beyond standard fine-tuning and documenting which of them actually helped on this task.
How It Works
The project explores various advanced fine-tuning and training strategies for BERT-based models. Key techniques include post-training with masked language modeling (MLM) enhancements (whole word masking, dynamic masking, new word mining), incorporating external knowledge via embeddings, contrastive learning (self-supervised and supervised), self-distillation, and adversarial training. These methods aim to improve model robustness, generalization, and feature extraction for the specific task of matching questions to relevant answers.
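The section above names adversarial training without spelling out a recipe. A common variant in BERT competition code is FGM (Fast Gradient Method), which perturbs the word-embedding weights along the gradient direction before a second backward pass. The sketch below is a minimal illustration under assumptions not taken from this repository: the model is assumed to return logits, `loss_fn`, `optimizer`, and the batch tensors are hypothetical placeholders, and embedding parameters are assumed to contain `word_embeddings` in their names, as in Hugging Face BERT.

```python
import torch

class FGM:
    """Fast Gradient Method: add an L2-normalized gradient perturbation to the
    embedding weights, take a second backward pass, then restore the weights."""

    def __init__(self, model, epsilon=1.0, emb_name="word_embeddings"):
        self.model = model
        self.epsilon = epsilon
        self.emb_name = emb_name
        self.backup = {}

    def attack(self):
        for name, param in self.model.named_parameters():
            if param.requires_grad and self.emb_name in name and param.grad is not None:
                self.backup[name] = param.data.clone()
                norm = torch.norm(param.grad)
                if norm != 0 and not torch.isnan(norm):
                    param.data.add_(self.epsilon * param.grad / norm)

    def restore(self):
        for name, param in self.model.named_parameters():
            if name in self.backup:
                param.data = self.backup[name]
        self.backup = {}


def train_step(model, inputs, labels, loss_fn, optimizer, fgm):
    # Clean pass: gradients on the original input.
    loss = loss_fn(model(**inputs), labels)
    loss.backward()
    # Adversarial pass: perturb embeddings, accumulate gradients, restore weights.
    fgm.attack()
    loss_fn(model(**inputs), labels).backward()
    fgm.restore()
    optimizer.step()
    optimizer.zero_grad()
```

Note that FGM adds one extra forward/backward pass per step, roughly doubling training cost in exchange for the robustness gains described above.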
Quick Start & Requirements
Dependencies are installed with pip. Training is launched by running the scripts directly; for example, pair-post-training-wwm-sop.py (as its name suggests) performs post-training with whole word masking and a sentence-order prediction objective.
Highlighted Details
Maintenance & Community
Last commit roughly 4 years ago; the project appears inactive.
Licensing & Compatibility
Limitations & Caveats
The README indicates that complex classification layers (CNN/RNN/DGCNN) added after post-training did not yield further improvements. Integrating external knowledge via word2vec embeddings also did not improve performance in experiments. The data augmentation strategy using pseudo-labeling requires careful filtering to avoid introducing errors.
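As an illustration of the kind of filtering mentioned above, pseudo-labels can be kept only when the current model predicts them with high confidence. This is a minimal sketch rather than the repository's actual pipeline; the 0.95 threshold, the Hugging Face-style `.logits` output, and `unlabeled_loader` are assumptions.

```python
import torch

@torch.no_grad()
def select_pseudo_labels(model, unlabeled_loader, threshold=0.95, device="cuda"):
    """Keep only unlabeled examples the current model classifies with high
    confidence; low-confidence predictions are dropped rather than risk
    adding noisy question-answer pairs to the training set."""
    model.eval()
    kept = []
    for batch in unlabeled_loader:
        batch = {k: v.to(device) for k, v in batch.items()}
        probs = torch.softmax(model(**batch).logits, dim=-1)
        confidence, pseudo_label = probs.max(dim=-1)
        mask = confidence >= threshold
        for input_ids, label in zip(batch["input_ids"][mask], pseudo_label[mask]):
            kept.append((input_ids.cpu(), int(label)))
    return kept
```

Raising the threshold trades coverage of the unlabeled pool for cleaner pseudo-labels, which matches the README's caution about introducing errors.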