LLaDA2.X by inclusionAI

Diffusion language models for advanced text generation

Created 2 months ago
349 stars

Top 80.0% on SourcePulse

Project Summary

LLaDA2.X is a series of large discrete diffusion language models (dLLMs) from InclusionAI, scaling up to 100 billion parameters. It addresses the challenge of achieving state-of-the-art performance and efficient inference in diffusion-based language models, offering a fully open-source alternative to traditional autoregressive models. The series targets researchers and engineers seeking powerful, scalable LLMs for tasks like code generation and instruction following, providing significant inference speedups through novel techniques.

How It Works

LLaDA2.X models are discrete diffusion language models (dLLMs) built on a Mixture-of-Experts (MoE) architecture, which enables scaling to 100 billion parameters. The project uses the dInfer framework for accelerated inference via KV-cache reuse and block-level parallel decoding. LLaDA2.1 further improves speed and quality by combining "Token-to-Token (T2T) editing" with "Mask-to-Token (M2T)" decoding, exposed through configurable "Speedy" and "Quality" modes. The companion dFactory project supports fine-tuning these models.
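The parallel-decoding idea can be illustrated with a minimal toy sketch. This is an assumption-laden illustration, not the dInfer API: the function name, the `threshold` parameter, and the probability inputs are invented for exposition. Each step commits every masked position whose top-token probability clears a confidence threshold, so many tokens can be decoded per model call instead of one:

```python
def parallel_unmask_step(probs, masked, threshold=0.9):
    """One toy step of confidence-based parallel decoding (hypothetical sketch).

    probs: dict mapping position -> list of per-token probabilities
    masked: set of still-masked positions
    Returns (decoded, remaining): tokens committed this step, and positions
    left masked for the next step.
    """
    decoded = {}
    for pos in masked:
        token_probs = probs[pos]
        best = max(range(len(token_probs)), key=token_probs.__getitem__)
        if token_probs[best] >= threshold:
            decoded[pos] = best
    # Fallback: always commit at least the single most confident position,
    # so decoding is guaranteed to make progress each step.
    if not decoded:
        pos = max(masked, key=lambda p: max(probs[p]))
        decoded[pos] = max(range(len(probs[pos])), key=probs[pos].__getitem__)
    return decoded, masked - set(decoded)
```

In the real system, the probabilities would come from the diffusion model's forward pass over a block, and the editing schemes described above (e.g. T2T) can revise earlier commitments rather than treating them as final.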

Quick Start & Requirements

Model weights and training code are available on Hugging Face. Installation of dInfer involves cloning the repo and using pip, with optional vLLM or SGLang backends. dFactory requires environment setup via uv or pip. Running the 100B variants necessitates significant GPU resources. Technical details and benchmarks are in associated arXiv papers.
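As a rough sketch of the install flow described above (the repository URL, flags, and backend package names are assumptions, not verified commands):

```shell
# Hedged sketch: repository URL and package names are assumptions.
git clone https://github.com/inclusionAI/dInfer.git
cd dInfer
pip install -e .

# Optional accelerated backends mentioned in the description:
# pip install vllm
# pip install sglang
```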

Highlighted Details

  • LLaDA2.0-flash scales to 100 billion parameters, a milestone for diffusion language models.
  • LLaDA2.0-flash-CAP achieves up to 535 tokens/s inference speed via Confidence-Aware Parallel decoding [README].
  • LLaDA2.1 models offer advanced inference, with benchmarks showing up to 892 TPS on HumanEval+ for coding tasks.
  • The LLaDA2.X series, including model weights and training code, is fully open-sourced [README].

Maintenance & Community

dInfer and dFactory show active development with recent commits and releases. Noted contributors include Da Zheng and Lun Du (dInfer) and VeOmni and edwardzjl (dFactory). Explicit community channels (Discord/Slack) and a public roadmap are not readily apparent from the browsed content.

Licensing & Compatibility

The LLaDA2.X project and its tools (dInfer, dFactory) are licensed under the Apache License 2.0 [README, 1, 2], which is permissive for commercial use.

Limitations & Caveats

The documentation focuses on capabilities and achievements and details no explicit limitations or known bugs. However, the 100B-parameter models imply substantial hardware requirements for training and inference, which may be a barrier for users with limited computational resources.

Health Check

  • Last Commit: 1 week ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 3
  • Star History: 115 stars in the last 30 days

Explore Similar Projects

Starred by Cody Yu (Coauthor of vLLM; MTS at OpenAI), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 2 more.

Consistency_LLM by hao-ai-lab

Parallel decoder for efficient LLM inference
412 stars · 0% · Created 2 years ago · Updated 1 year ago
Starred by Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI).

dots.llm1 by rednote-hilab

MoE model for research
489 stars · 0% · Created 9 months ago · Updated 6 months ago
Starred by Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake), and 3 more.

prompt-lookup-decoding by apoorvumang

Decoding method for faster LLM generation
596 stars · 0.3% · Created 2 years ago · Updated 1 year ago
Starred by Wing Lian (Founder of Axolotl AI), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 2 more.

recurrent-pretraining by seal-rg

Pretraining code for depth-recurrent language model research
862 stars · 0.2% · Created 1 year ago · Updated 1 month ago
Starred by Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), and 8 more.

EAGLE by SafeAILab

Speculative decoding research paper for faster LLM inference
2k stars · 0.4% · Created 2 years ago · Updated 5 days ago
Starred by Didier Lopes (Founder of OpenBB), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 3 more.

DeepSeek-Coder-V2 by deepseek-ai

Open-source code language model comparable to GPT4-Turbo
6k stars · 0.4% · Created 1 year ago · Updated 3 months ago