SDAR by JetAstra

Scalable sequence generation via diffusion and autoregression synergy

Created 2 months ago
261 stars

Top 97.4% on SourcePulse

Project Summary

Summary

SDAR (Synergy of Diffusion and AutoRegression) is a large-scale language model family that merges autoregressive (AR) and discrete diffusion modeling. It targets researchers and practitioners seeking efficient, high-performance LLMs, offering accuracy competitive with AR baselines, 2-4x faster inference, and strong reasoning capabilities.

How It Works

SDAR combines the training efficiency of AR methods with the parallel decoding of diffusion models: training scales like an AR model, while generation is highly parallelized and therefore faster. The core innovation is this hybrid paradigm, which positions SDAR as a diffusion-based language model that rivals state-of-the-art AR models in both generalist and specialist settings.
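The hybrid scheme can be illustrated with a toy Python sketch: blocks are emitted left-to-right (the AR part), while positions inside a block are refined in parallel over a few denoising steps (the diffusion part). Everything here (`MASK`, `denoise_block`, the random sampling rule) is a hypothetical simplification for intuition, not SDAR's actual algorithm.

```python
import random

MASK = None  # placeholder for a masked (not-yet-decoded) token


def denoise_block(block, vocab, rng, keep_prob=0.5):
    """One parallel refinement step: every masked position is either
    filled with a sampled token or left masked for the next step."""
    return [tok if tok is not MASK
            else (rng.choice(vocab) if rng.random() > keep_prob else MASK)
            for tok in block]


def generate(n_blocks, block_size, vocab, max_steps=10, seed=0):
    """Blocks are produced left-to-right (autoregressive over blocks);
    tokens inside a block are refined in parallel (diffusion-style)."""
    rng = random.Random(seed)
    seq = []
    for _ in range(n_blocks):
        block = [MASK] * block_size
        for _ in range(max_steps):
            block = denoise_block(block, vocab, rng)
            if MASK not in block:
                break
        # force-fill any remaining masked slots so the block is complete
        block = [tok if tok is not MASK else rng.choice(vocab) for tok in block]
        seq.extend(block)
    return seq


out = generate(n_blocks=3, block_size=4, vocab=["a", "b", "c"])
print(len(out))  # 12 tokens: 3 blocks of 4, each denoised in parallel
```

The speedup intuition: a pure AR decoder needs one forward pass per token, while this scheme needs only a few refinement passes per block, regardless of block width.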

Quick Start & Requirements

Installation involves cloning the repository, initializing submodules, and installing dependencies such as transformers>=4.52.4 and flash-attn; GPU acceleration is essential. The project offers multiple inference engines: a built-in script, the optimized JetEngine (1800+ tokens/sec on an A800, 3700+ on an H100), and integration with lmdeploy. Fine-tuning is supported via a framework powered by LlamaFactory.
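The setup steps above can be sketched as a shell sequence. Note this is an illustrative assumption based on the summary: the repository URL is inferred from the project name, and the exact package list may differ from the project's own README.

```shell
# Assumed repository URL; check the project page for the canonical one.
git clone https://github.com/JetAstra/SDAR.git
cd SDAR
git submodule update --init --recursive
# Dependencies named in the summary; a CUDA-capable GPU is required.
pip install "transformers>=4.52.4" flash-attn
```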

Highlighted Details

  • Offers models in sizes from 1.7B to 30B parameters (dense and MoE).
  • Achieves 2-4x faster inference compared to traditional AR models, with speedup scaling with model size.
  • Demonstrates advanced performance on science reasoning benchmarks like GPQA and ChemBench.
  • Provides industrial-grade inference solutions (JetEngine, lmdeploy) for production deployment.
  • Supports downstream task fine-tuning using LlamaFactory.

Maintenance & Community

The project is actively developed, with core contributors including Shuang Cheng, Yihan Bian, Dawei Liu, and Biqing Qi. Specific community channels such as Discord or Slack are not listed in the README, but contact information for key researchers is provided for inquiries. The roadmap indicates ongoing feature development.

Licensing & Compatibility

SDAR is released under the MIT license, which is permissive for commercial use and integration into closed-source projects.

Limitations & Caveats

The project is explicitly described as being in an "early experimental state." The developers are actively working on further systematic development and welcome collaborations.

Health Check

  • Last commit: 2 weeks ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 11
  • Star history: 31 stars in the last 30 days

Explore Similar Projects

Starred by Cody Yu (coauthor of vLLM; MTS at OpenAI), Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), and 2 more.

Consistency_LLM by hao-ai-lab (405 stars)
Parallel decoder for efficient LLM inference
Created 1 year ago, updated 11 months ago
Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), Ying Sheng (coauthor of SGLang), and 2 more.

LookaheadDecoding by hao-ai-lab (1k stars)
Parallel decoding algorithm for faster LLM inference
Created 1 year ago, updated 8 months ago
Starred by Shizhe Diao (author of LMFlow; research scientist at NVIDIA), Yineng Zhang (inference lead at SGLang; research scientist at Together AI), and 8 more.

EAGLE by SafeAILab (2k stars)
Speculative decoding research paper for faster LLM inference
Created 1 year ago, updated 3 weeks ago