LLM research paper exploring masked diffusion language models
LLaDA is an 8B parameter PyTorch-based diffusion model for natural language processing, designed to rival LLaMA3 8B performance. It targets researchers and developers exploring alternative LLM architectures beyond autoregressive models, offering a theoretically grounded approach to generative language modeling with capabilities like in-context learning and instruction following.
How It Works
LLaDA employs a masked diffusion approach, differing from autoregressive models such as GPT. It uses a Transformer architecture but models language probabilistically via diffusion, with the masking ratio sampled at random during training. The training objective is shown to be an upper bound on the negative log-likelihood (sketched below), which grounds the model's generative capabilities, and the estimator is Fisher-consistent, supporting scaling.
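A minimal sketch of this training objective, assuming a Hugging Face-style model that returns logits and a dedicated mask token id; the function name and normalization details are illustrative, not the repository's code:

```python
import torch
import torch.nn.functional as F

def masked_diffusion_loss(model, input_ids, mask_token_id):
    """Illustrative masked-diffusion training step (not the repository's exact code).

    A masking ratio t is drawn uniformly at random, each token is masked
    independently with probability t, and the cross-entropy on masked
    positions, reweighted by 1/t, upper-bounds the negative log-likelihood.
    """
    b, l = input_ids.shape
    # Sample a masking ratio t ~ U(0, 1) per sequence (clamped to avoid division by ~0).
    t = torch.rand(b, 1, device=input_ids.device).clamp(min=1e-3)
    # Mask each token independently with probability t.
    masked = torch.rand(b, l, device=input_ids.device) < t
    noisy_ids = torch.where(masked, torch.full_like(input_ids, mask_token_id), input_ids)
    # The Transformer predicts the original tokens at every position (no causal mask).
    logits = model(noisy_ids).logits
    token_loss = F.cross_entropy(
        logits.view(-1, logits.size(-1)), input_ids.view(-1), reduction="none"
    ).view(b, l)
    # Only masked positions contribute, each weighted by 1/t.
    return (token_loss * masked.float() / t).sum() / (b * l)
```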
Quick Start & Requirements
- Install dependencies: pip install transformers==4.38.2 gradio
- Load GSAI-ML/LLaDA-8B-Base or GSAI-ML/LLaDA-8B-Instruct with the transformers library, passing torch_dtype=torch.bfloat16 (a loading sketch follows this list).
- Run python app.py after installing Gradio to launch the demo.
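A loading sketch based on the model ids above; trust_remote_code=True is assumed here because the checkpoints ship a custom architecture:

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Load the base (or instruct) checkpoint in bfloat16.
model_id = "GSAI-ML/LLaDA-8B-Base"  # or "GSAI-ML/LLaDA-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_id, trust_remote_code=True, torch_dtype=torch.bfloat16
).eval()
```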
Highlighted Details
A chat.py script is provided for multi-round conversation with the Instruct model.
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
Sampling is currently slower than in autoregressive models because of the fixed context length, the absence of a KV cache, and the fact that best quality requires a number of sampling steps equal to the response length. The project aims to optimize sampling efficiency in future work.
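For intuition on why the step count ties to response length, below is a hypothetical low-confidence remasking sampler over a fixed-length, fully masked response; the names and the linear commit schedule are illustrative assumptions, not the project's exact sampler:

```python
import torch

@torch.no_grad()
def diffusion_sample(model, prompt_ids, resp_len, steps, mask_token_id):
    """Illustrative reverse-diffusion sampler (not the repository's exact code).

    The response starts fully masked; at each step the model predicts every
    position at once, the most confident predictions are kept, and the rest
    are re-masked. With steps == resp_len roughly one token is finalized per
    step, so using fewer steps trades quality for speed.
    """
    device = prompt_ids.device
    x = torch.cat(
        [prompt_ids, torch.full((1, resp_len), mask_token_id, device=device)], dim=1
    )
    resp = slice(prompt_ids.size(1), x.size(1))
    for step in range(steps):
        logits = model(x).logits[:, resp]            # predict the whole response at once
        conf, pred = logits.softmax(-1).max(-1)      # per-token confidence and prediction
        still_masked = x[:, resp] == mask_token_id
        conf = conf.masked_fill(~still_masked, -1.0)  # only fill currently masked slots
        # Number of tokens to commit this step (linear schedule over all steps).
        k = int(resp_len * (step + 1) / steps) - int(resp_len * step / steps)
        if k > 0:
            idx = conf.topk(k, dim=-1).indices
            x[:, resp].scatter_(1, idx, pred.gather(1, idx))
    return x
```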