mdlm by kuleshov-group

Research code for the masked diffusion language model (MDLM) paper

Created 1 year ago
502 stars

Top 62.0% on SourcePulse

View on GitHub
Project Summary

This repository introduces MDLM, a Masked Diffusion Language Model that achieves state-of-the-art perplexity among diffusion models on large text datasets by simplifying the diffusion loss to a weighted mixture of classical masked language modeling losses. It is designed for researchers and practitioners in natural language processing and generative modeling.

How It Works

MDLM uses a novel substitution-based (SUBS) parameterization of discrete diffusion models. This parameterization reformulates the reverse process so that the absorbing-state diffusion loss can be expressed as a weighted mixture of classical masked language modeling losses. The simplification yields more efficient training and inference than prior diffusion language models.
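The weighted-MLM form of the objective can be illustrated in a few lines. The following is a minimal sketch, not the repository's actual training code: it assumes a linear noise schedule alpha_t = 1 - t (so the diffusion weight alpha_t' / (1 - alpha_t) reduces to -1/t, which flips sign against the negative log-likelihood), and the function name and tensor shapes are hypothetical.

```python
import torch
import torch.nn.functional as F

def masked_diffusion_loss(logits, targets, mask_positions, t):
    """Sketch of the MDLM objective as a weighted masked-LM loss.

    logits:         (batch, seq_len, vocab) model predictions
    targets:        (batch, seq_len) clean token ids
    mask_positions: (batch, seq_len) 1.0 where the token was masked
    t:              (batch,) diffusion time in (0, 1]
    """
    # Token-level cross-entropy, shape (batch, seq_len).
    ce = F.cross_entropy(logits.transpose(1, 2), targets, reduction="none")
    # Only masked positions contribute; each example is weighted by 1/t
    # (the linear-schedule diffusion weight).
    weighted = (ce * mask_positions) / t[:, None]
    return weighted.sum() / mask_positions.sum().clamp(min=1)
```

Because unmasked positions carry zero loss, the objective is exactly a masked language modeling loss with a time-dependent reweighting, which is what makes training as simple as BERT-style MLM.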

Quick Start & Requirements

  • Install dependencies: conda env create -f requirements.yaml and conda activate mdlm.
  • Create directories: mkdir outputs and mkdir watch_folder.
  • Training command example: sbatch scripts/train_owt_mdlm.sh.
  • Pre-trained model available on Hugging Face: kuleshov-group/mdlm-owt.
  • Full documentation and demo notebooks are linked in the README.
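The pretrained checkpoint can be pulled straight from the Hugging Face hub. A minimal sketch, assuming the checkpoint follows the standard transformers interface and ships its custom diffusion backbone as remote code (an assumption; check the model card):

```python
from transformers import AutoModelForMaskedLM

def load_mdlm(checkpoint: str = "kuleshov-group/mdlm-owt"):
    # trust_remote_code=True lets transformers load the custom diffusion
    # architecture bundled with the hub repo (assumed; stock classes do
    # not cover diffusion language models).
    return AutoModelForMaskedLM.from_pretrained(checkpoint, trust_remote_code=True)

# Usage (downloads weights on first call):
# model = load_mdlm()
```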

Highlighted Details

  • Achieves SOTA perplexity on LM1B and OpenWebText among diffusion models.
  • Achieves zero-shot perplexity competitive with state-of-the-art autoregressive models.
  • Includes an efficient sampler (ddpm_cache) that is ~3-4x faster than existing diffusion model samplers.
  • Supports semi-autoregressive (SAR) generation, enabling 25-30x faster decoding than SSD-LM.
  • Provides baseline implementations for autoregressive models and SEDD.

Maintenance & Community

The project is associated with the Kuleshov Group at Cornell University. The README notes that an improved implementation is available in the DUO GitHub repo.

Licensing & Compatibility

The repository does not explicitly state a license. However, the project is presented as a research artifact from NeurIPS 2024, implying a focus on academic use. Commercial use would require clarification.

Limitations & Caveats

The project is presented as a NeurIPS 2024 paper, so the codebase is research-oriented and may be subject to ongoing development or refinement. The maintainers point to the DUO repository for an improved implementation.

Health Check

  • Last commit: 3 months ago
  • Responsiveness: 1+ week
  • Pull requests (30d): 0
  • Issues (30d): 1
  • Star history: 22 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems") and Wing Lian (founder of Axolotl AI).

LLaDA by ML-GSAI

  • Top 1.7% on SourcePulse
  • 3k stars
  • LLM research paper exploring masked diffusion language models
  • Created 7 months ago; updated 1 day ago