mt-dnn by namisan

PyTorch package for multi-task deep neural networks research

Created 6 years ago

2,258 stars

Top 19.9% on SourcePulse

4 Experts Love This Project

Forrest Iandola

Author of SqueezeNet; Research Scientist at Meta

Chris Van Pelt

Cofounder of Weights & Biases

Amanpreet Singh

Cofounder of Contextual AI

Thomas Wolf

Cofounder of Hugging Face

Project Summary

This PyTorch package implements Multi-Task Deep Neural Networks (MT-DNN) for Natural Language Understanding (NLU), targeting researchers and practitioners in NLP. It aims to improve model performance and generalization by training a single model on multiple related tasks at once, building on pre-trained language models such as BERT.

How It Works

MT-DNN uses a shared encoder (typically BERT) with task-specific output layers. The core idea is to learn a unified representation that benefits from the diverse training signals of multiple NLU tasks. Compared with training a separate model per task, this approach aims to improve generalization and robustness, as reported in several ACL and arXiv publications by Microsoft researchers.
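The shared-encoder idea can be illustrated with a dependency-free toy. This is a minimal sketch, not the mt-dnn codebase: `shared_encoder` (a character-count stand-in for BERT), `LinearHead`, `MultiTaskModel`, and `merged_schedule` are all hypothetical names, and the merge-and-shuffle batch schedule (pool every task's mini-batches, shuffle, and route each batch through its own task head) is an assumed, commonly used way to interleave tasks during training.

```python
import random
from typing import Dict, List, Sequence, Tuple


def shared_encoder(tokens: Sequence[str], dim: int = 4) -> List[float]:
    """Hypothetical stand-in for a shared encoder such as BERT.

    Maps a token sequence to a fixed-size feature vector (character counts
    bucketed by code point). Every task calls this same function.
    """
    features = [0.0] * dim
    for token in tokens:
        for ch in token:
            features[ord(ch) % dim] += 1.0
    return features


class LinearHead:
    """Hypothetical task-specific output layer: one weight row per output."""

    def __init__(self, num_outputs: int, dim: int = 4) -> None:
        # Fixed toy weights; a real head would be trained.
        self.weights = [[0.1 * (i + 1)] * dim for i in range(num_outputs)]

    def __call__(self, features: Sequence[float]) -> List[float]:
        return [sum(w * f for w, f in zip(row, features)) for row in self.weights]


class MultiTaskModel:
    """One shared encoder plus a dictionary of per-task heads."""

    def __init__(self, heads: Dict[str, LinearHead]) -> None:
        self.heads = heads

    def forward(self, task: str, tokens: Sequence[str]) -> List[float]:
        # Shared representation first, then only the head for this task.
        return self.heads[task](shared_encoder(tokens))


def merged_schedule(
    task_batches: Dict[str, List[List[str]]], seed: int = 0
) -> List[Tuple[str, List[str]]]:
    """Pool the mini-batches of all tasks and shuffle them together.

    Each training step then sees one batch from one task; in a real model the
    loss for that task would update both its head and the shared encoder.
    """
    schedule = [(task, b) for task, batches in task_batches.items() for b in batches]
    random.Random(seed).shuffle(schedule)
    return schedule


if __name__ == "__main__":
    model = MultiTaskModel({
        "sst2": LinearHead(2),  # binary sentiment classification
        "mnli": LinearHead(3),  # 3-way entailment
        "stsb": LinearHead(1),  # similarity regression (single score)
    })
    # For brevity, each toy "batch" holds a single tokenized example.
    batches = {
        "sst2": [["great", "movie"], ["dull", "plot"]],
        "mnli": [["a", "man", "sleeps"]],
        "stsb": [["two", "dogs", "run"]],
    }
    for task, batch in merged_schedule(batches):
        print(task, model.forward(task, batch))
```

The design point the toy makes is that the output size depends only on the task head, while every task passes through the same encoder, so in a trained model all tasks shape the shared representation.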

Quick Start & Requirements

Install: pip install -r requirements.txt
Prerequisites: Python 3.6, PyTorch. Docker image allenlao/pytorch-mt-dnn:v1.3 is available.
Data: Requires downloading GLUE benchmark datasets.
Resources: The cited experiments used 4 to 8 V100 GPUs.
Links: https://www.python.org/downloads/release/python-360/ (Python 3.6 download), https://gluebenchmark.com/ (GLUE data).

Highlighted Details

Supports fine-tuning pre-trained BERT models for multi-task learning.
Includes scripts for reproducing GLUE benchmark results and domain adaptation tasks (SciTail, SNLI).
Offers SMART regularization for robustness, plus gradient accumulation and FP16 training for efficiency.
Provides utilities for extracting text embeddings and converting TensorFlow BERT models to PyTorch.
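Of the features above, gradient accumulation is the easiest to show in isolation: gradients from several small micro-batches are combined before a single optimizer update, so the update matches a larger batch that might not fit in GPU memory. The sketch below is a toy under stated assumptions (a one-parameter linear model with mean squared error; `mse_grad`, `accumulated_grad`, and `sgd_step` are hypothetical names) and is not how the mt-dnn repository implements the feature.

```python
from typing import List, Sequence, Tuple

Example = Tuple[float, float]  # one (x, y) training pair


def mse_loss(w: float, batch: Sequence[Example]) -> float:
    """Mean squared error of the toy model y_hat = w * x."""
    return sum((w * x - y) ** 2 for x, y in batch) / len(batch)


def mse_grad(w: float, batch: Sequence[Example]) -> float:
    """Derivative of mse_loss with respect to w over one batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w: float, batch: Sequence[Example], micro_batch_size: int) -> float:
    """Sum micro-batch gradients, each weighted by its share of the batch.

    The weighting makes the result equal the full-batch mean gradient even
    when the last micro-batch is smaller than the others.
    """
    total = len(batch)
    grad = 0.0
    for start in range(0, total, micro_batch_size):
        micro = batch[start:start + micro_batch_size]
        grad += mse_grad(w, micro) * len(micro) / total
    return grad


def sgd_step(w: float, batch: Sequence[Example], lr: float, micro_batch_size: int) -> float:
    """One optimizer update, applied only after all micro-batches are processed."""
    return w - lr * accumulated_grad(w, batch, micro_batch_size)


if __name__ == "__main__":
    data: List[Example] = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2), (5.0, 9.8)]
    w = 0.5
    # The accumulated and full-batch gradients agree (up to float rounding).
    print(accumulated_grad(w, data, micro_batch_size=2), mse_grad(w, data))
    # One accumulated step lowers the loss on this data.
    print(mse_loss(w, data), mse_loss(sgd_step(w, data, lr=0.01, micro_batch_size=2), data))
```

In PyTorch-style training the same effect comes from calling `backward()` once per micro-batch, since gradients add up in each parameter's `.grad` until they are zeroed, with each micro-batch loss scaled by its share of the full batch and the optimizer stepped once per full batch.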

Maintenance & Community

The project is associated with Microsoft researchers. Contact information for several key contributors is provided. No explicit community channels like Discord/Slack are mentioned.

Licensing & Compatibility

The README does not explicitly state a license. It references other projects with MIT and Apache 2.0 licenses, but this does not guarantee compatibility. Commercial use would require clarification.

Limitations & Caveats

The project relies on Python 3.6, which is end-of-life. Trained models are currently not shared publicly because of policy changes. Some reported results may be based on older versions of the GLUE datasets, and reaching top leaderboard performance may require task-specific fine-tuning beyond the multi-task refinement stage.

Health Check

Last Commit

1 year ago

Responsiveness

Inactive

Star History

3 stars in the last 30 days