LLaDA2.X by inclusionAI

Diffusion language models for advanced text generation

Created 2 months ago
349 stars

Top 80.0% on SourcePulse

Project Summary

LLaDA2.X is a series of large discrete diffusion language models (dLLMs) from InclusionAI, scaling up to 100 billion parameters. It addresses the challenge of achieving state-of-the-art performance and efficient inference in diffusion-based language models, offering a fully open-source alternative to traditional autoregressive models. The series targets researchers and engineers seeking powerful, scalable LLMs for tasks like code generation and instruction following, providing significant inference speedups through novel techniques.

How It Works

LLaDA2.X models are discrete diffusion language models (dLLMs) built on a Mixture-of-Experts (MoE) architecture, which enables scaling to 100 billion parameters. The project uses the dInfer framework for accelerated inference via KV-cache reuse and block-level parallel decoding. LLaDA2.1 further improves speed and quality by combining "Token-to-Token (T2T) editing" with "Mask-to-Token (M2T)" decoding, exposed through configurable "Speedy" and "Quality" modes. The companion dFactory project supports fine-tuning these models.
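The parallel-decoding idea can be illustrated with a minimal toy sketch. This is an assumption-laden illustration, not the dInfer API: the function name, the `threshold` parameter, and the probability inputs are invented for exposition. Each step commits every masked position whose top-token probability clears a confidence threshold, so many tokens can be decoded per model call instead of one:

```python
def parallel_unmask_step(probs, masked, threshold=0.9):
    """One toy step of confidence-based parallel decoding (hypothetical sketch).

    probs: dict mapping position -> list of per-token probabilities
    masked: set of still-masked positions
    Returns (decoded, remaining): tokens committed this step, and positions
    left masked for the next step.
    """
    decoded = {}
    for pos in masked:
        token_probs = probs[pos]
        best = max(range(len(token_probs)), key=token_probs.__getitem__)
        if token_probs[best] >= threshold:
            decoded[pos] = best
    # Fallback: always commit at least the single most confident position,
    # so decoding is guaranteed to make progress each step.
    if not decoded:
        pos = max(masked, key=lambda p: max(probs[p]))
        decoded[pos] = max(range(len(probs[pos])), key=probs[pos].__getitem__)
    return decoded, masked - set(decoded)
```

In the real system, the probabilities would come from the diffusion model's forward pass over a block, and the editing schemes described above (e.g. T2T) can revise earlier commitments rather than treating them as final.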

Quick Start & Requirements

Model weights and training code are available on Hugging Face. Installation of dInfer involves cloning the repo and using pip, with optional vLLM or SGLang backends. dFactory requires environment setup via uv or pip. Running the 100B variants necessitates significant GPU resources. Technical details and benchmarks are in associated arXiv papers.
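As a rough sketch of the install flow described above (the repository URL, flags, and backend package names are assumptions, not verified commands):

```shell
# Hedged sketch: repository URL and package names are assumptions.
git clone https://github.com/inclusionAI/dInfer.git
cd dInfer
pip install -e .

# Optional accelerated backends mentioned in the description:
# pip install vllm
# pip install sglang
```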

Highlighted Details

  • LLaDA2.0-flash scales to 100 billion parameters, a milestone for diffusion language models.
  • LLaDA2.0-flash-CAP achieves up to 535 tokens/s inference speed via Confidence-Aware Parallel decoding [README].
  • LLaDA2.1 models offer advanced inference, with benchmarks showing up to 892 TPS on HumanEval+ for coding tasks.
  • The LLaDA2.X series, including model weights and training code, is fully open-sourced [README].

Maintenance & Community

dInfer and dFactory show active development with recent commits and releases. Noted contributors include Da Zheng and Lun Du (dInfer) and VeOmni and edwardzjl (dFactory). Explicit community channels (Discord/Slack) and a public roadmap are not readily apparent from the browsed content.

Licensing & Compatibility

The LLaDA2.X project and its tools (dInfer, dFactory) are licensed under the Apache License 2.0 [README, 1, 2], which is permissive for commercial use.

Limitations & Caveats

The documentation focuses on capabilities and achievements and details no explicit limitations or known bugs. However, the 100B-parameter models imply substantial hardware requirements for training and inference, which may be a barrier for users with limited computational resources.

Health Check

  • Last Commit: 1 week ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 3
  • Star History: 115 stars in the last 30 days

Explore Similar Projects

Starred by Cody Yu (Coauthor of vLLM; MTS at OpenAI), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 2 more.

Consistency_LLM by hao-ai-lab

Parallel decoder for efficient LLM inference
412 stars · 0% · Created 2 years ago · Updated 1 year ago
Starred by Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI).

dots.llm1 by rednote-hilab

MoE model for research
489 stars · 0% · Created 9 months ago · Updated 6 months ago
Starred by Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake), and 3 more.

prompt-lookup-decoding by apoorvumang

Decoding method for faster LLM generation
596 stars · 0.3% · Created 2 years ago · Updated 1 year ago
Starred by Wing Lian (Founder of Axolotl AI), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 2 more.

recurrent-pretraining by seal-rg

Pretraining code for depth-recurrent language model research
862 stars · 0.2% · Created 1 year ago · Updated 1 month ago
Starred by Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), and 8 more.

EAGLE by SafeAILab

Speculative decoding research paper for faster LLM inference
2k stars · 0.4% · Created 2 years ago · Updated 5 days ago
Starred by Didier Lopes (Founder of OpenBB), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 3 more.

DeepSeek-Coder-V2 by deepseek-ai

Open-source code language model comparable to GPT4-Turbo
6k stars · 0.4% · Created 1 year ago · Updated 3 months ago