SJTU-DENG-Lab
Enabling dLLMs for faster-than-AR inference
Top 99.3% on SourcePulse
Summary
This project introduces Discrete Diffusion Forcing (D2F), a training and inference paradigm designed to overcome the speed limitations of Discrete Diffusion Language Models (dLLMs). With D2F, dLLMs achieve faster-than-autoregressive (AR) inference for the first time, a throughput advantage aimed at researchers and engineers who need high-performance generation from large language models.
How It Works
D2F employs a hybrid architecture featuring block-wise causal attention: attention is bidirectional within a block and causal between blocks. This design keeps the model compatible with standard KV caching, drastically reducing redundant computation. The model is trained via asymmetric distillation, in which a student dLLM that sees only a limited, causal context learns to mimic a powerful teacher dLLM. Inference is accelerated through high-throughput pipelined decoding, which refines multiple blocks in parallel and maximizes GPU utilization.
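The block-wise causal attention pattern described above can be illustrated with a small mask builder. This is a minimal sketch, not code from the D2F repository: the function name block_causal_mask and the parameters seq_len and block_size are hypothetical, and the sketch assumes fixed-size, contiguous blocks.

```python
def block_causal_mask(seq_len: int, block_size: int) -> list[list[bool]]:
    """Build a block-wise causal attention mask (illustrative only).

    mask[q][k] is True when query position q may attend to key position k.
    Positions in the same block see each other (bidirectional); a position
    also sees every earlier block, but never a later one (causal).
    """
    if seq_len < 0 or block_size <= 0:
        raise ValueError("seq_len must be >= 0 and block_size must be > 0")
    # Assumption: blocks are contiguous and all have the same size.
    block_of = [pos // block_size for pos in range(seq_len)]
    return [
        [block_of[k] <= block_of[q] for k in range(seq_len)]
        for q in range(seq_len)
    ]


if __name__ == "__main__":
    # Print the mask for 6 positions in blocks of 2 ('x' = may attend).
    for row in block_causal_mask(seq_len=6, block_size=2):
        print("".join("x" if allowed else "." for allowed in row))
```

Because no position attends to a later block, the keys and values of a finished block do not change when later blocks are generated, which is the property that makes a standard KV cache applicable.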
Quick Start & Requirements
Set up the environment with uv sync or with Conda (conda create -n d2f python=3.10; conda activate d2f), then install dependencies (pip install -r requirements.txt).
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The vLLM integration is a preliminary proof-of-concept, exhibiting a score drop that is actively being addressed. Further optimization, including specialized CUDA kernels and distributed inference, is planned. The project's license is not explicitly stated, which may impact adoption decisions.