dllm by ZHZisZZ

Framework for diffusion language modeling

Created 3 months ago

1,564 stars

Top 26.4% on SourcePulse

View on GitHub

2 Experts Love This Project

Jeff Hammerbacher

Cofounder of Cloudera

Lysandre Debut

Chief Open-Source Officer at Hugging Face

Project Summary

dLLM is a library designed to unify the training and evaluation of diffusion language models, aiming to enhance transparency and reproducibility across the development pipeline. It targets researchers and engineers working with advanced language models, offering scalable training and streamlined evaluation to simplify the development and deployment of models like LLaDA and Dream, and enabling novel applications such as instruction-tuned BERT chatbots and edit-aware language generation.

How It Works

The library provides scalable training pipelines, drawing inspiration from transformers.Trainer, with robust support for distributed training frameworks like LoRA, DeepSpeed, and FSDP. Its core innovation lies in unified evaluation pipelines, modeled after lm-evaluation-harness, which abstract complex inference details for easier customization and benchmarking. This integrated approach facilitates minimal pretraining, finetuning, and evaluation recipes for open-weight models and implements advanced training algorithms like Edit Flows, enabling researchers to experiment with generative model extensions.

Quick Start & Requirements

Installation involves creating a Python 3.10 Conda environment, installing PyTorch with CUDA 12.4 (other versions may be compatible), and then installing the dLLM package in editable mode (pip install -e .). Optional evaluation setup requires initializing the lm-evaluation-harness submodule and installing its dependencies. Slurm users need to configure scripts/train.slurm.sh for their specific cluster environment.

Highlighted Details

Provides example recipes for pretraining, finetuning, and evaluating LLaDA and Dream models.
Includes implementations for BERT Chat (instruction-tuned BERT) and EditFlow models, demonstrating edit operations like insertion, deletion, and substitution during generation.
Supports advanced training techniques including LoRA and 4-bit quantization for efficient finetuning.
Features unified generators for abstracting inference details and interactive chat scripts for multi-turn dialogue.

Maintenance & Community

The provided README does not contain specific details regarding maintainers, community channels (e.g., Discord, Slack), or project roadmaps.

Licensing & Compatibility

The README does not specify a software license. This absence requires clarification for any potential adoption, especially concerning commercial use or integration into closed-source projects.

Limitations & Caveats

The README does not explicitly state any limitations, alpha status, known bugs, or unsupported platforms. The EditFlow examples are described as an "educational reference," suggesting a focus on learning and experimentation rather than immediate production deployment.

Health Check

Last Commit

5 days ago

Responsiveness

Inactive

Pull Requests (30d)

Issues (30d)

Star History

211 stars in the last 30 days