SMDM by ML-GSAI

PyTorch code for masked diffusion model research paper

Created 11 months ago
296 stars

Top 89.5% on SourcePulse

Project Summary

This repository provides the official PyTorch implementation for "Scaling up Masked Diffusion Models on Text," a research paper exploring the scalability and effectiveness of Masked Diffusion Models (MDMs) in language tasks. It targets researchers and practitioners interested in advancing text generation and understanding beyond traditional autoregressive models, offering competitive performance and unique advantages in bidirectional reasoning and temporal adaptation.

How It Works

The project implements Masked Diffusion Models (MDMs) for text, a probabilistic approach whose scaling rate is comparable to that of autoregressive models (ARMs), with a relatively small compute gap between the two. It introduces unsupervised classifier-free guidance, which uses unpaired data to improve conditional inference. The models handle bidirectional reasoning and adapt to temporal shifts in data, two areas where ARMs struggle.
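As a rough illustration of the ideas above, the sketch below shows a masking step and loss weighting commonly used for masked diffusion, plus a classifier-free guidance combination. It is a toy in plain Python, not the repository's PyTorch code. The names forward_mask, mdm_loss, cfg_logits, uniform_log_prob, and the MASK symbol are hypothetical. The linear schedule (each token masked independently with probability t) and the idea that the unconditional branch comes from running the model with the prompt masked out are assumptions, not details confirmed by this summary.

```python
import math
import random

MASK = "[MASK]"  # hypothetical mask symbol, not taken from the repository


def forward_mask(tokens, t, rng):
    """Forward process (assumed linear schedule): mask each token
    independently with probability t, where 0 < t <= 1."""
    return [MASK if rng.random() < t else tok for tok in tokens]


def mdm_loss(tokens, noisy, t, log_prob_fn):
    """Single-sample estimate of a masked-diffusion objective:
    -(1/t) * sum over masked positions of log p(original token | noisy)."""
    total = 0.0
    for i, (orig, cur) in enumerate(zip(tokens, noisy)):
        if cur == MASK:
            total += log_prob_fn(noisy, i, orig)
    return -total / t


def uniform_log_prob(noisy, position, token, vocab_size=4):
    """Stand-in for a trained network: uniform over a toy vocabulary."""
    return -math.log(vocab_size)


def cfg_logits(cond, uncond, w):
    """Classifier-free guidance in logit space: (1 + w) * cond - w * uncond.
    With w = 0 this reduces to the conditional logits."""
    return [(1.0 + w) * c - w * u for c, u in zip(cond, uncond)]


rng = random.Random(0)
tokens = ["the", "cat", "sat", "down"]
noisy = forward_mask(tokens, t=0.5, rng=rng)
loss = mdm_loss(tokens, noisy, 0.5, uniform_log_prob)
guided = cfg_logits([1.0, 2.0], [0.5, 1.0], w=1.0)
```

In a real model, log_prob_fn would be a bidirectional Transformer evaluated once per noisy sequence, and the two inputs to cfg_logits would come from two forward passes (with and without the prompt visible).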

Quick Start & Requirements

  • Installation: Requires an Anaconda environment, which may be based on the TinyLlama setup. Then install the pinned dependencies with: pip install lm-eval==0.4.4 numpy==1.25.0 bitsandbytes==0.43.1 openai==0.28 fschat==0.2.34 anthropic. Conda installation commands are available in CONDA.md.
  • Prerequisites: PyTorch, CUDA, Python. Specific dataset preprocessing (SlimPajama, ShareGPT, GSM8K, FineWeb) is required.
  • Resources: Training commands indicate multi-GPU (8+) and multi-node setups are supported for large models (up to 1.1B parameters).
  • Links: Pretrained models are available on Hugging Face.
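Because the install command pins exact versions, it can help to confirm what is actually installed before running anything. The following is a minimal sketch using only the standard library. The PINS table copies the versions from the pip command above, while the helper names (normalize, installed_version, find_mismatches) are hypothetical and not part of the repository.

```python
from importlib import metadata

# Versions copied from the pip command in the Installation bullet.
PINS = {
    "lm-eval": "0.4.4",
    "numpy": "1.25.0",
    "bitsandbytes": "0.43.1",
    "openai": "0.28",
    "fschat": "0.2.34",
}


def normalize(version):
    """Drop trailing ".0" parts so that "0.28" and "0.28.0" compare equal."""
    parts = version.split(".")
    while len(parts) > 1 and parts[-1] == "0":
        parts.pop()
    return tuple(parts)


def installed_version(name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pins, lookup=installed_version):
    """Map each package whose installed version differs from its pin
    to a (wanted, found) pair; found is None when not installed."""
    mismatches = {}
    for name, wanted in pins.items():
        found = lookup(name)
        if found is None or normalize(found) != normalize(wanted):
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches(PINS).items():
        print(f"{name}: wanted {wanted}, found {found}")
```

The lookup parameter exists only so the check can be exercised without the packages installed. In normal use, call find_mismatches(PINS) inside the activated environment.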

Highlighted Details

  • A 1.1B MDM outperforms the 1.1B TinyLlama model on four of eight zero-shot benchmarks and, after fine-tuning, is competitive with Llama-2 7B on GSM8K.
  • Compared with ARMs, MDMs can either match performance with a 1.4x sampling speedup or achieve higher quality at increased compute cost.
  • MDMs break the "reverse curse" (often called the reversal curse) that affects much larger ARMs.
  • The project provides implementations for training ARMs and MDMs, fine-tuning for specific tasks (math reasoning, conditional generation), and evaluation across various benchmarks.

Maintenance & Community

The project is associated with the ICLR 2025 paper "Scaling up Masked Diffusion Models on Text." Links to specific model checkpoints and evaluation scripts are provided.

Licensing & Compatibility

The README does not explicitly state the license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The setup requires significant data preprocessing and potentially complex environment management (e.g., a separate Anaconda environment for FineWeb dataset preprocessing). Several dependencies are pinned to exact versions (e.g., lm-eval==0.4.4, numpy==1.25.0, openai==0.28), which may conflict with newer environments.

Health Check

  • Last commit: 9 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 2
  • Star history: 16 stars in the last 30 days
