Research paper and code for a Masked Diffusion Language Model (MDLM)
This repository introduces MDLM, a Masked Diffusion Language Model that achieves state-of-the-art perplexity among diffusion models on large text datasets by simplifying the diffusion loss to a weighted mixture of masked language modeling losses. It is aimed at researchers and practitioners in natural language processing and generative modeling.
How It Works
MDLM uses a novel substitution-based (SUBS) parameterization for discrete diffusion models. This approach reformulates the reverse process so that the absorbing-state diffusion loss can be expressed as a weighted combination of classical masked language modeling losses, which makes training and inference simpler and more efficient than in prior diffusion language models.
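To make the reduction concrete, the continuous-time objective is just a masked-LM cross-entropy on the masked positions, reweighted by the noise schedule. Below is a minimal PyTorch sketch under a log-linear schedule (alpha_t = 1 - t); the function name and denoiser interface are hypothetical, not the repository's actual API:

```python
import torch
import torch.nn.functional as F

def mdlm_loss(model, x, mask_id, eps=1e-6):
    """Sketch of the MDLM objective as a weighted masked-LM loss.

    Hypothetical helper: `model` maps token ids (batch, seq_len) to
    logits (batch, seq_len, vocab); `mask_id` is the [MASK] token id.
    """
    b = x.shape[0]
    # Sample a diffusion time t ~ U(0, 1) per example.
    t = torch.rand(b, device=x.device)
    # Log-linear schedule: alpha_t = 1 - t, so the masking
    # probability at time t is 1 - alpha_t = t.
    mask_prob = t[:, None].expand_as(x)
    # Absorbing-state forward process: each token is independently
    # replaced by [MASK] with probability 1 - alpha_t.
    is_masked = torch.rand_like(x, dtype=torch.float) < mask_prob
    z_t = torch.where(is_masked, torch.full_like(x, mask_id), x)
    # Denoiser predicts the clean tokens at every position.
    logits = model(z_t)  # (batch, seq_len, vocab)
    ce = F.cross_entropy(
        logits.transpose(1, 2), x, reduction="none")  # (batch, seq_len)
    # SUBS zeroes out unmasked positions, so only masked tokens
    # contribute; the schedule weight -alpha_t' / (1 - alpha_t)
    # equals 1 / t for alpha_t = 1 - t.
    weight = (1.0 / (t + eps))[:, None]
    return (weight * ce * is_masked.float()).sum() / b
```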
Quick Start & Requirements
Set up the environment, create the output directories, and launch training (the training script targets a SLURM cluster):

```bash
conda env create -f requirements.yaml
conda activate mdlm
mkdir outputs
mkdir watch_folder
sbatch scripts/train_owt_mdlm.sh
```

A pretrained model trained on OpenWebText is available on the Hugging Face hub as kuleshov-group/mdlm-owt.
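If the hub checkpoint follows the standard transformers interface, loading it might look like the sketch below; this assumes the checkpoint ships custom modeling code (hence trust_remote_code) and that the OpenWebText models use the GPT-2 tokenizer, so verify both against the model card:

```python
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Assumption: the checkpoint bundles custom modeling code on the hub,
# which requires trust_remote_code=True to load.
model = AutoModelForMaskedLM.from_pretrained(
    "kuleshov-group/mdlm-owt", trust_remote_code=True)

# Assumption: MDLM's OpenWebText models use the GPT-2 tokenizer.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
```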
Highlighted Details
Ships a custom cached sampler (ddpm_cache) that is ~3-4x faster than existing diffusion model samplers.
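The speedup comes from a simple observation: in ancestral sampling for absorbing-state diffusion, a step that unmasks no tokens leaves the input unchanged, so the denoiser output can be reused instead of recomputed. A minimal sketch of that caching idea (hypothetical helper names, not the repository's actual sampler):

```python
import torch

@torch.no_grad()
def cached_ancestral_sample(model, z, mask_id, num_steps):
    """Sketch of the ddpm_cache idea: cache the denoiser's logits and
    only invalidate the cache when a step actually changes the sequence.
    `z` starts fully masked, shape (batch, seq_len).
    """
    logits = None
    for i in range(num_steps, 0, -1):
        t, s = i / num_steps, (i - 1) / num_steps
        if logits is None:           # cache miss: run the network
            logits = model(z)        # (batch, seq_len, vocab)
        probs = logits.softmax(-1)
        # With alpha_t = 1 - t, each still-masked token is revealed
        # between t and s with probability
        # (alpha_s - alpha_t) / (1 - alpha_t) = (t - s) / t.
        reveal = torch.rand_like(z, dtype=torch.float) < (t - s) / t
        reveal &= (z == mask_id)
        if reveal.any():
            sampled = torch.distributions.Categorical(probs=probs).sample()
            z = torch.where(reveal, sampled, z)
            logits = None            # sequence changed: invalidate cache
    return z
```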
Maintenance & Community
The project is developed by the Kuleshov Group at Cornell University. The README notes that an improved implementation is available in the DUO GitHub repo.
Licensing & Compatibility
The repository does not explicitly state a license. However, the project is presented as a research artifact from NeurIPS 2024, implying a focus on academic use. Commercial use would require clarification.
Limitations & Caveats
The project is a NeurIPS 2024 research artifact, so the code is research-oriented and may still be under active refinement. As noted above, an improved implementation is available in the DUO repository.