AzureML-BERT by microsoft

End-to-end recipes for BERT pre-training and fine-tuning

created 6 years ago
398 stars

Top 73.7% on sourcepulse

View on GitHub
Project Summary

This repository provides end-to-end recipes for pre-training and fine-tuning BERT models using Azure Machine Learning Service. It targets NLP researchers and engineers who need to build custom language representation models on domain-specific data or adapt existing BERT models for specialized tasks, offering a stable and predictable workflow for large-scale distributed training.

How It Works

The pre-training recipe is built on PyTorch and Hugging Face's pytorch-pretrained-bert v0.6.2, incorporating optimization techniques such as gradient accumulation and mixed-precision training to handle large models and datasets efficiently. The fine-tuning recipe demonstrates adapting pre-trained checkpoints to downstream tasks, evaluating against the GLUE benchmark on Azure ML. Together, the recipes aim to simplify the otherwise complex process of distributed training and model configuration for large language models.
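
The recipes predate PyTorch's built-in AMP (mixed precision was typically handled via NVIDIA Apex at the time); the sketch below illustrates the same two ideas, gradient accumulation and mixed-precision training, using the stock torch.cuda.amp API. The model, optimizer, and loader are toy stand-ins, not code from this repository, and a CUDA device is assumed.

    import torch
    from torch import nn

    # Toy stand-ins for the real BERT model and corpus loader; assumes a CUDA device.
    model = nn.Linear(128, 2).cuda()
    optimizer = torch.optim.Adam(model.parameters(), lr=3e-5)
    loader = [(torch.randn(8, 128).cuda(), torch.randint(0, 2, (8,)).cuda())
              for _ in range(16)]

    scaler = torch.cuda.amp.GradScaler()  # dynamic loss scaling for fp16
    accumulation_steps = 4                # effective batch = 4 x per-step batch

    model.train()
    optimizer.zero_grad()
    for step, (features, labels) in enumerate(loader):
        with torch.cuda.amp.autocast():   # forward pass in mixed precision
            loss = nn.functional.cross_entropy(model(features), labels)
        # Divide so the accumulated gradients match one large-batch update.
        scaler.scale(loss / accumulation_steps).backward()
        if (step + 1) % accumulation_steps == 0:
            scaler.step(optimizer)        # unscales gradients, then optimizer.step()
            scaler.update()
            optimizer.zero_grad()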

Quick Start & Requirements
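
The recipes run on Azure Machine Learning and assume a workspace plus a GPU compute target. Below is a minimal job-submission sketch, assuming the Azure ML SDK v1 PyTorch estimator (the API generation these notebooks targeted); the workspace config, cluster name, and script paths are placeholders, not paths from this repository.

    from azureml.core import Workspace, Experiment
    from azureml.train.dnn import PyTorch

    # Assumes a config.json for the workspace, downloaded from the Azure portal.
    ws = Workspace.from_config()
    experiment = Experiment(workspace=ws, name="bert-finetune-glue")

    # Hypothetical source directory, entry script, and cluster name.
    estimator = PyTorch(
        source_directory="./finetune",
        entry_script="train.py",
        compute_target=ws.compute_targets["gpu-cluster"],
        node_count=1,
        use_gpu=True,
    )

    run = experiment.submit(estimator)
    run.wait_for_completion(show_output=True)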

Highlighted Details

  • End-to-end recipes for both pre-training from scratch and fine-tuning BERT.
  • Includes data preprocessing scripts for repeatability and custom corpus usage.
  • Supports distributed training with gradient accumulation and mixed-precision.
  • Provides notebooks for evaluating against the GLUE benchmark (a minimal loading sketch follows this list).
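
At their core, the fine-tuning notebooks load a pre-trained encoder into a task-specific head and train on a GLUE task. A minimal loading sketch against the pytorch-pretrained-bert 0.6.2 API (the package version the recipes use); the checkpoint path is a hypothetical placeholder:

    import torch
    from pytorch_pretrained_bert import BertTokenizer, BertForSequenceClassification

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    # Binary classification head (e.g. MRPC) on top of the BERT encoder.
    model = BertForSequenceClassification.from_pretrained("bert-base-uncased",
                                                          num_labels=2)

    # Optionally swap in encoder weights produced by the pre-training recipe.
    # "pretrained_checkpoint.pt" is a hypothetical placeholder path.
    state = torch.load("pretrained_checkpoint.pt", map_location="cpu")
    model.bert.load_state_dict(state, strict=False)

    tokens = ["[CLS]"] + tokenizer.tokenize("distributed pre-training") + ["[SEP]"]
    input_ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])
    logits = model(input_ids)  # v0.6.2 returns logits when no labels are passed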

Maintenance & Community

This repository is from Microsoft. A note dated 7/7/2020 points to a more recent, significantly faster BERT pre-training implementation built on ONNX Runtime.

Licensing & Compatibility

The repository does not explicitly state a license in the provided README text. Compatibility for commercial use or closed-source linking would depend on the specific license chosen for the repository.

Limitations & Caveats

The README explicitly points to a more recent, significantly faster implementation using ONNX Runtime for BERT pretraining, suggesting this repository's pretraining recipe may be outdated or less performant. The setup requires an Azure ML service account and potentially substantial GPU resources.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 3 stars in the last 90 days

Starred by Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake), Abhishek Thakur (World's First 4x Kaggle GrandMaster), and 5 more.

Explore Similar Projects

xlnet by zihangdai

Language model research paper using generalized autoregressive pretraining. 6k stars; created 6 years ago, updated 2 years ago.