mdlm by kuleshov-group

Research code for the Masked Diffusion Language Model (MDLM)

created 1 year ago
461 stars

Top 66.7% on sourcepulse

Project Summary

This repository introduces MDLM, a Masked Diffusion Language Model that achieves state-of-the-art perplexity among diffusion models on large text datasets by simplifying the diffusion loss to a weighted mixture of masked language modeling losses. It is designed for researchers and practitioners in natural language processing and generative modeling.

How It Works

MDLM uses a novel substitution-based (SUBS) parameterization for discrete diffusion models. This parameterization reformulates the reverse process so that the absorbing-state diffusion loss can be expressed as a weighted combination of classical masked language modeling losses. The simplification yields more efficient training and inference than prior diffusion language models.
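The reduction above can be sketched in a few lines: draw a noise level, mask tokens accordingly, compute cross-entropy only on masked positions, and reweight. This is a minimal toy sketch, not the repository's API; the function names, the linear noise schedule, and the `predict_logprobs` interface are all illustrative assumptions.

```python
import math
import random

def mdlm_loss_sketch(tokens, predict_logprobs,
                     alpha=lambda t: 1.0 - t,      # toy linear schedule (assumption)
                     d_alpha=lambda t: -1.0):       # its derivative
    """One Monte Carlo sample of an MDLM-style objective (illustrative sketch).

    The absorbing-state diffusion loss reduces to a weighted masked LM loss:
    mask each token independently with probability 1 - alpha(t), score only
    the masked positions, and weight by -alpha'(t) / (1 - alpha(t)).
    """
    t = random.uniform(1e-3, 1.0)                   # sampled noise level
    mask_prob = 1.0 - alpha(t)
    masked = [i for i in range(len(tokens)) if random.random() < mask_prob]
    if not masked:
        return 0.0
    # predict_logprobs is a stand-in for the denoiser: it returns the
    # log-probability of each true token at the masked positions.
    logp = predict_logprobs(tokens, masked)
    nll = -sum(logp[i] for i in masked)             # masked cross-entropy
    weight = -d_alpha(t) / (1.0 - alpha(t))
    return weight * nll
```

Averaging this quantity over many samples of `t` and mask patterns estimates the (tight) diffusion bound, which is why training looks like ordinary masked language modeling with a per-step weight.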

Quick Start & Requirements

  • Install dependencies: conda env create -f requirements.yaml and conda activate mdlm.
  • Create directories: mkdir outputs and mkdir watch_folder.
  • Training command example: sbatch scripts/train_owt_mdlm.sh.
  • Pre-trained model available on Hugging Face: kuleshov-group/mdlm-owt.
  • Full documentation and demo notebooks are linked in the README.

Highlighted Details

  • Achieves SOTA perplexity on LM1B and OpenWebText among diffusion models.
  • Offers competitive zero-shot perplexity with SOTA autoregressive models.
  • Includes an efficient sampler (ddpm_cache) that is ~3-4x faster than existing diffusion model samplers.
  • Supports semi-autoregressive (SAR) generation, enabling 25-30x faster decoding than SSD-LM.
  • Provides baseline implementations for autoregressive models and SEDD.
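The speedup from the ddpm_cache sampler comes from a simple observation: under the SUBS parameterization, unmasked tokens never change, so if a reverse step reveals nothing, the denoiser's input is unchanged and its output can be reused. The sketch below illustrates that caching idea with a toy model; it is an assumption-laden illustration, not the repository's sampler, and `model`, `p_unmask`, and the token handling are all hypothetical.

```python
import random

MASK = "<mask>"

def sample_with_cache(model, length, steps):
    """Toy ancestral sampler with ddpm_cache-style reuse (illustrative sketch).

    Starts from an all-masked sequence and walks the reverse process.
    A forward pass is only issued when the sequence actually changed
    since the last call; otherwise the cached predictions are reused.
    """
    seq = [MASK] * length
    cached = None      # last model output, valid while seq is unchanged
    calls = 0          # number of (expensive) model evaluations
    for step in range(steps, 0, -1):
        if cached is None:
            cached = model(seq)        # hypothetical: per-position predictions
            calls += 1
        changed = False
        p_unmask = 1.0 / step          # toy reveal probability per step
        for i, tok in enumerate(seq):
            if tok == MASK and random.random() < p_unmask:
                seq[i] = cached[i]     # reveal a token; it stays fixed forever
                changed = True
        if changed:
            cached = None              # sequence changed -> cache is stale
    return seq, calls
```

Because many reverse steps reveal no token (especially with many steps and short sequences), `calls` can be far smaller than `steps`, which is the source of the reported 3-4x sampling speedup.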

Maintenance & Community

The project is associated with the Kuleshov Group at Cornell University. The README notes that an improved implementation is available in the DUO GitHub repo.

Licensing & Compatibility

The repository does not include an explicit license file. The project is presented as a research artifact accompanying a NeurIPS 2024 paper, implying a focus on academic use; commercial use would require clarification from the authors.

Limitations & Caveats

As a NeurIPS 2024 research artifact, the codebase is research-oriented and may still be subject to ongoing refinement. The authors themselves point to an improved implementation in the DUO repository.

Health Check

  • Last commit: 2 months ago
  • Responsiveness: 1+ week
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star history: 92 stars in the last 90 days
