Adapt autoregressive models to diffusion language models
This repository provides code and methods for adapting existing autoregressive (AR) language models, such as GPT-2 and LLaMA, into diffusion language models (DLMs). By reusing pretrained AR weights, it avoids the cost of training large DLMs from scratch, enables fair comparisons on standard benchmarks, and offers efficient adaptation techniques. The target audience includes researchers and practitioners interested in exploring and utilizing diffusion-based language generation.
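To make the end result concrete, here is a minimal, self-contained sketch of the iterative-unmasking decoding loop that absorbing-state diffusion language models use at inference time. The model here is a random-logits stand-in, and the function name, mask_id, and sizes are illustrative assumptions, not the repository's actual API.

import torch

# Hypothetical decoding sketch: start from an all-mask sequence and, at each
# step, commit the most confident predictions until nothing is masked.
def diffusion_generate(model, mask_id, seq_len=32, steps=8):
    ids = torch.full((1, seq_len), mask_id)           # fully masked start
    for step in range(steps):
        still_masked = ids.eq(mask_id)
        if not still_masked.any():
            break
        logits = model(ids)                           # (1, seq_len, vocab)
        conf, pred = logits.softmax(-1).max(-1)       # per-position confidence
        conf = conf.masked_fill(~still_masked, -1.0)  # only unmask masked slots
        k = max(1, int(still_masked.sum()) // (steps - step))
        top = conf.topk(k, dim=-1).indices[0]         # most confident positions
        ids[0, top] = pred[0, top]
    return ids

# Toy stand-in denoiser (random logits) just to make the sketch executable.
vocab_size, mask_id = 100, 99
model = lambda ids: torch.randn(ids.shape[0], ids.shape[1], vocab_size)
print(diffusion_generate(model, mask_id))

In the released models, the stand-in would be replaced by the adapted transformer running with bidirectional attention.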
How It Works
The project adapts AR models to DLMs through a continual pre-training approach, demonstrating a connection between AR and diffusion modeling objectives. This method allows for efficient conversion of models ranging from 127M to 7B parameters using a limited amount of training data (under 200B tokens). The adaptation process is integrated into a custom LLaMA-Factory framework, with specific modifications for handling attention masks.
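As one concrete reading of the attention-mask modification mentioned above, the sketch below anneals the AR model's causal mask toward the full bidirectional mask a DLM needs, revealing more future positions as training progresses. The linear schedule and helper name are assumptions for illustration, not necessarily the repository's exact implementation.

import torch

def annealed_attention_mask(seq_len, progress):
    # progress in [0, 1]: 0.0 keeps the causal (AR) mask,
    # 1.0 yields the fully bidirectional mask a DLM requires.
    causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    band = int(progress * (seq_len - 1))  # future positions to reveal
    upper = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
    window = upper & ~torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool),
                                 diagonal=band + 1)
    return causal | window                # True = position may be attended to

for progress in (0.0, 0.5, 1.0):
    print(f"progress={progress}")
    print(annealed_attention_mask(4, progress).int())

During continual pre-training, progress would be driven by the training step, so early updates stay close to the AR objective and later updates match the diffusion objective.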
Quick Start & Requirements
Install dependencies:
pip install -r LLaMA-Factory/requirements.txt
pip install flash-attn==2.6.3 --no-build-isolation
Set environment variables (e.g., HF_HOME) and, optionally, a WandB API key.
Highlighted Details
Maintenance & Community
The accompanying paper was accepted at ICLR 2025. Updates include bug fixes and new model releases (e.g., Dream-7B). The core adaptation code is based on LLaMA-Factory.
Licensing & Compatibility
Licensed under Apache License 2.0. Requires preservation of copyright and license notices. Compatible with commercial use, provided license terms are met.
Limitations & Caveats
The repository contains separate codebases for DiffuLLaMA (under DiffuLLaMA-training/) and DiffuGPT (under LLaMA-Factory/), which may require careful navigation. While flash-attention is recommended for efficiency, it is not strictly required for basic functionality.