Consistency_LLM by hao-ai-lab

Parallel decoder for efficient LLM inference

created 1 year ago
397 stars

Top 73.8% on sourcepulse

View on GitHub
Project Summary

Consistency Large Language Models (CLLMs) offer a novel approach to accelerating LLM inference by decoding multiple tokens in parallel. Using Jacobi decoding, a CLLM maps a randomly initialized $n$-token sequence to the same output auto-regressive decoding would produce, in far fewer steps than the $n$ sequential forward passes auto-regressive decoding requires, benefiting researchers and developers seeking lower-latency LLM deployments.
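Conceptually, Jacobi decoding treats the $n$-token block as a fixed-point problem: all $n$ positions are re-predicted in parallel until the block stops changing. A minimal Python sketch of the greedy variant follows; it assumes a Hugging Face-style causal LM, and the function name and signature are illustrative, not the repository's API:

```python
import torch

def jacobi_decode(model, prefix_ids, n, pad_id=0):
    """Greedy Jacobi decoding sketch: refine an n-token guess in parallel
    until it reaches the auto-regressive fixed point. `model` is assumed
    to be a causal LM returning logits of shape [1, seq_len, vocab]."""
    p = prefix_ids.shape[1]
    # Any initialization works; convergence is guaranteed within n steps.
    guess = torch.full((1, n), pad_id, dtype=torch.long,
                       device=prefix_ids.device)
    for _ in range(n):
        logits = model(torch.cat([prefix_ids, guess], dim=1)).logits
        # Re-predict all n positions in parallel from the current guess.
        next_guess = logits[:, p - 1 : -1, :].argmax(dim=-1)
        if torch.equal(next_guess, guess):  # fixed point == greedy AR output
            break
        guess = next_guess
    return guess
```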

How It Works

CLLMs are trained with a specific objective: to map any randomly initialized $n$-token sequence to the same output as auto-regressive decoding in as few steps as possible. Decoding is performed with Jacobi iteration, an efficient parallel strategy that needs no draft models or architectural modifications, which simplifies integration and maintenance.
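To make that objective concrete, here is a hedged sketch of the training loss: each intermediate state on a recorded Jacobi trajectory is pushed to predict the trajectory's fixed point directly, alongside a standard AR loss. This is a simplified hard-target variant of the paper's consistency loss; all names are assumptions, not the repository's API:

```python
import torch
import torch.nn.functional as F

def consistency_loss(model, prefix_ids, trajectory, fixed_point, ar_weight=1.0):
    """Simplified CLLM-style objective (hedged sketch).
    trajectory:  list of [1, n] intermediate Jacobi states
    fixed_point: [1, n] converged block (the AR output)"""
    p = prefix_ids.shape[1]
    # Consistency term: predictions from every intermediate state should
    # match the fixed point token-for-token (one-step convergence).
    cons = 0.0
    for state in trajectory:
        logits = model(torch.cat([prefix_ids, state], dim=1)).logits
        block = logits[:, p - 1 : -1, :]  # logits for the n block positions
        cons = cons + F.cross_entropy(
            block.reshape(-1, block.size(-1)), fixed_point.reshape(-1))
    cons = cons / max(len(trajectory), 1)
    # AR term: standard next-token loss on the converged output.
    full = torch.cat([prefix_ids, fixed_point], dim=1)
    ar_logits = model(full).logits[:, :-1, :]
    ar = F.cross_entropy(
        ar_logits.reshape(-1, ar_logits.size(-1)), full[:, 1:].reshape(-1))
    return cons + ar_weight * ar
```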

Quick Start & Requirements

  • Install: Clone the repository, create and activate a Python 3.10 conda environment, then run pip install -r requirements.txt followed by pip install flash-attn==2.4.1 (a minimal loading sketch follows this list).
  • Prerequisites: Python 3.10, CUDA 11.8+ (for flash-attn); task-specific datasets (e.g., ShareGPT, GSM8K) may need to be downloaded separately.
  • Resources: Training requires significant computational resources; inference speedups of up to $3.4\times$ are reported.
  • Links: Paper, Blog, FastChat Integration.
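Once installed, the released checkpoints should load through the standard transformers API. A minimal sketch, assuming a hypothetical repository id (check the Hugging Face Hub links in the README for the actual checkpoint names):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative repo id; substitute the real CLLM checkpoint name
# listed in the README / Hugging Face Hub.
ckpt = "cllm/consistency-llm-7b-sharegpt"  # hypothetical

tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForCausalLM.from_pretrained(ckpt, torch_dtype="auto").to("cuda")

prompt = "Explain Jacobi decoding in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
# Plain AR generation works out of the box; the repository's own scripts
# provide the accelerated Jacobi (parallel) decoding path.
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```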

Highlighted Details

  • Achieves $2.4\times$ to $3.4\times$ generation speed improvements across various tasks.
  • Seamless integration with existing LLMs without architectural changes.
  • Compatible with other inference techniques like Lookahead Decoding for further speedups.
  • Model checkpoints available on Huggingface Hub for 7B models fine-tuned on ShareGPT, GSM8K, Spider, and Code-Search-Net datasets.

Maintenance & Community

  • CLLMs have been integrated into FastChat.
  • Model checkpoints and paper are publicly available.
  • Community channels are not explicitly mentioned in the README.

Licensing & Compatibility

  • The repository does not explicitly state a license. Model weights are available on Huggingface, typically under their respective base model licenses.
  • Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The README does not specify the license for the code or model weights, which may impact commercial adoption. Training CLLMs requires collecting or generating Jacobi trajectories, adding a data preparation step.
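That data-preparation step amounts to running Jacobi decoding with the base model over a prompt set and saving each intermediate state together with its fixed point. A hedged sketch, reusing the Jacobi iteration above with illustrative names (the repository ships its own collection scripts):

```python
import torch

@torch.no_grad()
def collect_trajectory(model, prefix_ids, n, pad_id=0):
    """Record every intermediate n-token state of a greedy Jacobi run,
    plus the converged fixed point, as one training example."""
    p = prefix_ids.shape[1]
    guess = torch.full((1, n), pad_id, dtype=torch.long,
                       device=prefix_ids.device)
    states = [guess]
    for _ in range(n):  # convergence is guaranteed within n steps
        logits = model(torch.cat([prefix_ids, guess], dim=1)).logits
        next_guess = logits[:, p - 1 : -1, :].argmax(dim=-1)
        if torch.equal(next_guess, guess):
            break
        guess = next_guess
        states.append(guess)
    # states[:-1] are the intermediates; the last state is the fixed point.
    return {"prefix": prefix_ids, "trajectory": states[:-1],
            "fixed_point": guess}
```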

Health Check

  • Last commit: 8 months ago
  • Responsiveness: 1 week
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 7 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems), Ying Sheng (author of SGLang), and 1 more.

LookaheadDecoding by hao-ai-lab

Parallel decoding algorithm for faster LLM inference

1k stars · Top 0.1% on sourcepulse
created 1 year ago, updated 4 months ago