Adapt autoregressive models to diffusion language models
This repository provides code and methods for adapting existing autoregressive (AR) language models, such as GPT-2 and LLaMA, into diffusion language models (DLMs). By reusing pretrained AR weights, it avoids the cost of training large DLMs from scratch, enables fair comparisons on standard benchmarks, and offers efficient adaptation techniques. The target audience includes researchers and practitioners interested in exploring and utilizing diffusion-based language generation.
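To make the end result concrete, here is a minimal, self-contained sketch of the iterative-unmasking decoding loop that absorbing-state diffusion language models use at inference time. The model here is a random-logits stand-in, and the function name, mask_id, and sizes are illustrative assumptions, not the repository's actual API.

import torch

# Hypothetical decoding sketch: start from an all-mask sequence and, at each
# step, commit the most confident predictions until nothing is masked.
def diffusion_generate(model, mask_id, seq_len=32, steps=8):
    ids = torch.full((1, seq_len), mask_id)           # fully masked start
    for step in range(steps):
        still_masked = ids.eq(mask_id)
        if not still_masked.any():
            break
        logits = model(ids)                           # (1, seq_len, vocab)
        conf, pred = logits.softmax(-1).max(-1)       # per-position confidence
        conf = conf.masked_fill(~still_masked, -1.0)  # only unmask masked slots
        k = max(1, int(still_masked.sum()) // (steps - step))
        top = conf.topk(k, dim=-1).indices[0]         # most confident positions
        ids[0, top] = pred[0, top]
    return ids

# Toy stand-in denoiser (random logits) just to make the sketch executable.
vocab_size, mask_id = 100, 99
model = lambda ids: torch.randn(ids.shape[0], ids.shape[1], vocab_size)
print(diffusion_generate(model, mask_id))

In the released models, the stand-in would be replaced by the adapted transformer running with bidirectional attention.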
How It Works
The project adapts AR models to DLMs through a continual pre-training approach, demonstrating a connection between AR and diffusion modeling objectives. This method allows for efficient conversion of models ranging from 127M to 7B parameters using a limited amount of training data (under 200B tokens). The adaptation process is integrated into a custom LLaMA-Factory framework, with specific modifications for handling attention masks.
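As one concrete reading of the attention-mask modification mentioned above, the sketch below anneals the AR model's causal mask toward the full bidirectional mask a DLM needs, revealing more future positions as training progresses. The linear schedule and helper name are assumptions for illustration, not necessarily the repository's exact implementation.

import torch

def annealed_attention_mask(seq_len, progress):
    # progress in [0, 1]: 0.0 keeps the causal (AR) mask,
    # 1.0 yields the fully bidirectional mask a DLM requires.
    causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    band = int(progress * (seq_len - 1))  # future positions to reveal
    upper = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
    window = upper & ~torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool),
                                 diagonal=band + 1)
    return causal | window                # True = position may be attended to

for progress in (0.0, 0.5, 1.0):
    print(f"progress={progress}")
    print(annealed_attention_mask(4, progress).int())

During continual pre-training, progress would be driven by the training step, so early updates stay close to the AR objective and later updates match the diffusion objective.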
Quick Start & Requirements
Install dependencies:
pip install -r LLaMA-Factory/requirements.txt
pip install flash-attn==2.6.3 --no-build-isolation
Set environment variables (e.g., HF_HOME) and, optionally, a WandB API key.
Highlighted Details
Maintenance & Community
The accompanying paper was accepted at ICLR 2025. Updates include bug fixes and new model releases (e.g., Dream-7B). The core adaptation code is based on LLaMA-Factory.
Licensing & Compatibility
Licensed under Apache License 2.0. Requires preservation of copyright and license notices. Compatible with commercial use, provided license terms are met.
Limitations & Caveats
The repository contains separate codebases for DiffuLLaMA (under DiffuLLaMA-training/) and DiffuGPT (under LLaMA-Factory/), which may require careful navigation. While flash-attention is recommended for efficiency, it is not strictly required for basic functionality.