mdlm by kuleshov-group

Research code for the Masked Diffusion Language Model (MDLM)

created 1 year ago
461 stars

Top 66.7% on sourcepulse

Project Summary

This repository introduces MDLM, a Masked Diffusion Language Model that achieves state-of-the-art perplexity among diffusion models on large text datasets by simplifying the diffusion loss to a weighted mixture of masked language modeling losses. It is designed for researchers and practitioners in natural language processing and generative modeling.

How It Works

MDLM uses a novel substitution-based (SUBS) parameterization for discrete diffusion models. This parameterization reformulates the reverse process so that the absorbing-state diffusion loss can be expressed as a weighted combination of classical masked language modeling losses. The simplification yields more efficient training and inference than prior diffusion language models.
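The reduction above can be sketched in a few lines: draw a noise level, mask tokens accordingly, compute cross-entropy only on masked positions, and reweight. This is a minimal toy sketch, not the repository's API; the function names, the linear noise schedule, and the `predict_logprobs` interface are all illustrative assumptions.

```python
import math
import random

def mdlm_loss_sketch(tokens, predict_logprobs,
                     alpha=lambda t: 1.0 - t,      # toy linear schedule (assumption)
                     d_alpha=lambda t: -1.0):       # its derivative
    """One Monte Carlo sample of an MDLM-style objective (illustrative sketch).

    The absorbing-state diffusion loss reduces to a weighted masked LM loss:
    mask each token independently with probability 1 - alpha(t), score only
    the masked positions, and weight by -alpha'(t) / (1 - alpha(t)).
    """
    t = random.uniform(1e-3, 1.0)                   # sampled noise level
    mask_prob = 1.0 - alpha(t)
    masked = [i for i in range(len(tokens)) if random.random() < mask_prob]
    if not masked:
        return 0.0
    # predict_logprobs is a stand-in for the denoiser: it returns the
    # log-probability of each true token at the masked positions.
    logp = predict_logprobs(tokens, masked)
    nll = -sum(logp[i] for i in masked)             # masked cross-entropy
    weight = -d_alpha(t) / (1.0 - alpha(t))
    return weight * nll
```

Averaging this quantity over many samples of `t` and mask patterns estimates the (tight) diffusion bound, which is why training looks like ordinary masked language modeling with a per-step weight.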

Quick Start & Requirements

  • Install dependencies: conda env create -f requirements.yaml and conda activate mdlm.
  • Create directories: mkdir outputs and mkdir watch_folder.
  • Training command example: sbatch scripts/train_owt_mdlm.sh.
  • Pre-trained model available on Hugging Face: kuleshov-group/mdlm-owt.
  • Full documentation and demo notebooks are linked in the README.

Highlighted Details

  • Achieves SOTA perplexity on LM1B and OpenWebText among diffusion models.
  • Offers competitive zero-shot perplexity with SOTA autoregressive models.
  • Includes an efficient sampler (ddpm_cache) that is ~3-4x faster than existing diffusion model samplers.
  • Supports semi-autoregressive (SAR) generation, enabling 25-30x faster decoding than SSD-LM.
  • Provides baseline implementations for autoregressive models and SEDD.
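The speedup from the ddpm_cache sampler comes from a simple observation: under the SUBS parameterization, unmasked tokens never change, so if a reverse step reveals nothing, the denoiser's input is unchanged and its output can be reused. The sketch below illustrates that caching idea with a toy model; it is an assumption-laden illustration, not the repository's sampler, and `model`, `p_unmask`, and the token handling are all hypothetical.

```python
import random

MASK = "<mask>"

def sample_with_cache(model, length, steps):
    """Toy ancestral sampler with ddpm_cache-style reuse (illustrative sketch).

    Starts from an all-masked sequence and walks the reverse process.
    A forward pass is only issued when the sequence actually changed
    since the last call; otherwise the cached predictions are reused.
    """
    seq = [MASK] * length
    cached = None      # last model output, valid while seq is unchanged
    calls = 0          # number of (expensive) model evaluations
    for step in range(steps, 0, -1):
        if cached is None:
            cached = model(seq)        # hypothetical: per-position predictions
            calls += 1
        changed = False
        p_unmask = 1.0 / step          # toy reveal probability per step
        for i, tok in enumerate(seq):
            if tok == MASK and random.random() < p_unmask:
                seq[i] = cached[i]     # reveal a token; it stays fixed forever
                changed = True
        if changed:
            cached = None              # sequence changed -> cache is stale
    return seq, calls
```

Because many reverse steps reveal no token (especially with many steps and short sequences), `calls` can be far smaller than `steps`, which is the source of the reported 3-4x sampling speedup.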

Maintenance & Community

The project is associated with the Kuleshov Group at Cornell University. The README notes that an improved implementation is available in the DUO GitHub repo.

Licensing & Compatibility

The repository does not include an explicit license file. The project is presented as a research artifact accompanying a NeurIPS 2024 paper, implying a focus on academic use; commercial use would require clarification from the authors.

Limitations & Caveats

As a NeurIPS 2024 research artifact, the codebase is research-oriented and may still be subject to ongoing refinement. The authors themselves point to an improved implementation in the DUO repository.

Health Check

  • Last commit: 2 months ago
  • Responsiveness: 1+ week
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star history: 92 stars in the last 90 days
