DiffuLLaMA by HKUNLP

Adapt autoregressive models to diffusion language models

Created 11 months ago
305 stars

Top 87.8% on SourcePulse

View on GitHub
Project Summary

This repository provides code and methods for adapting existing autoregressive (AR) language models, such as GPT-2 and LLaMA, into diffusion language models (DLMs). It sidesteps the cost of training large DLMs from scratch by starting from pretrained AR weights, enabling fair comparisons with AR baselines on standard benchmarks and offering efficient adaptation recipes. The target audience is researchers and practitioners interested in exploring and using diffusion-based language generation.

How It Works

The project adapts AR models to DLMs through continual pre-training, exploiting a close connection between the AR and diffusion modeling objectives. This allows efficient conversion of models ranging from 127M to 7B parameters with a limited training budget (under 200B tokens). The adaptation code is integrated into a customized LLaMA-Factory framework, with specific modifications to the attention masks so the model can attend bidirectionally rather than only causally; a minimal sketch of the idea follows.
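The sketch below illustrates the general masked-diffusion recipe that such an adaptation relies on: corrupt a random fraction of tokens and train the (now bidirectional) model to denoise them. It is a simplified illustration under assumed HuggingFace-style model outputs, not the repository's actual training loop; the function name and details are ours.

```python
import torch
import torch.nn.functional as F

def masked_diffusion_step(model, input_ids, mask_token_id):
    """One illustrative adaptation step: corrupt a random fraction of
    tokens with an absorbing [MASK] state, then train the model to
    recover the originals. A sketch of the general masked-diffusion
    objective, not the repository's exact training code."""
    b, n = input_ids.shape
    # Per-sequence corruption rate t ~ U(0, 1), as in discrete diffusion
    t = torch.rand(b, 1, device=input_ids.device)
    corrupt = torch.rand(b, n, device=input_ids.device) < t
    noisy_ids = input_ids.masked_fill(corrupt, mask_token_id)
    # The adapted model is assumed to expose HF-style outputs and to have
    # had its causal attention mask relaxed to full bidirectional attention.
    logits = model(input_ids=noisy_ids).logits
    # Cross-entropy only on the corrupted positions (denoising loss)
    return F.cross_entropy(logits[corrupt], input_ids[corrupt])
```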

Quick Start & Requirements

  • Installation: pip install -r LLaMA-Factory/requirements.txt and pip install flash-attn==2.6.3 --no-build-isolation.
  • Prerequisites: Python 3.11, PyTorch 2.1.1+cu121, Transformers 4.44.2. Flash-attention is recommended for performance.
  • Setup: Requires setting the HuggingFace cache directory (HF_HOME) and, optionally, a WandB API key for experiment logging (see the sketch after this list).
  • Links: LLaMA-Factory (base for the adaptation code), OpenReview (camera-ready paper).
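As a quick orientation, here is a minimal Python sketch of the environment setup and model loading. The cache path, API key, and model ID are placeholders; substitute the actual checkpoint name the authors publish on HuggingFace.

```python
import os

# Placeholders: point HF_HOME at your cache and supply your own key.
os.environ["HF_HOME"] = "/path/to/hf_cache"
os.environ["WANDB_API_KEY"] = "<your-wandb-key>"  # optional, for logging

from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical model ID; replace with the published DiffuGPT/DiffuLLaMA
# checkpoint. trust_remote_code is assumed because adapted models often
# ship custom modeling code.
model_id = "diffusionfamily/diffugpt-s"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
```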

Highlighted Details

  • Adapts models from 127M to 7B parameters (GPT-2, LLaMA).
  • Uses a continual pre-training approach with <200B tokens.
  • Offers efficient fine-tuning scripts, including LoRA (a generic configuration sketch follows this list).
  • Provides evaluation toolkits and baseline implementations (Plaid, SEDD).
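For readers unfamiliar with LoRA, the following is a generic PEFT configuration sketch, not the repository's own script; the hyperparameters are common defaults, and the target module names assume a LLaMA-style architecture.

```python
from peft import LoraConfig, get_peft_model

# Generic values for illustration only; consult the repo's fine-tuning
# scripts for the settings the authors actually use.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

# `model` is the checkpoint loaded in the Quick Start sketch above.
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # shows how few parameters LoRA trains
```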

Maintenance & Community

The project was accepted to ICLR 2025. Updates include bug fixes and new model releases (e.g., Dream-7B). The core adaptation code is based on LLaMA-Factory.

Licensing & Compatibility

Licensed under Apache License 2.0. Requires preservation of copyright and license notices. Compatible with commercial use, provided license terms are met.

Limitations & Caveats

The repository contains two separate codebases: DiffuLLaMA training lives in DiffuLLaMA-training/, while DiffuGPT uses the modified LLaMA-Factory/, so moving between the two requires some care. Flash-attention is recommended for efficiency but is not strictly required for basic functionality.

Health Check

  • Last Commit: 3 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 1

Star History

30 stars in the last 30 days

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Wing Lian (Founder of Axolotl AI), and 10 more.

Explore Similar Projects

open_flamingo by mlfoundations

Top 0.1% · 4k stars
Open-source framework for training large multimodal models
Created 2 years ago · Updated 1 year ago
Starred by Tobi Lutke (Cofounder of Shopify), Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), and 26 more.

axolotl by axolotl-ai-cloud

Top 0.5% · 10k stars
CLI tool for streamlined post-training of AI models
Created 2 years ago · Updated 18 hours ago