speculative-decoding by lucidrains

Speculative decoding explorations

Created 2 years ago
285 stars

Top 91.9% on SourcePulse

Project Summary

This repository explores speculative decoding techniques to accelerate text-to-semantic decoders, particularly for applications like Spear-TTS. It targets researchers and engineers seeking to improve inference speed for large language models.

How It Works

The project implements and experiments with several speculative decoding strategies, including early exit schemes and a "prophet transformer" approach. These methods speed up generation by having a smaller, faster "draft" model (or a cheap early-exit head of the target model itself) propose a short run of tokens, which the larger, more accurate target model then verifies in a single batched forward pass, cutting the number of expensive sequential passes.
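The draft-and-verify loop described above can be sketched in a few lines of pure Python. This is an illustrative toy, not the repository's implementation: the names (`draft_next`, `target_next`, `speculative_step`) and the integer "models" are invented for the example, and real systems sample from probability distributions rather than comparing greedy tokens.

```python
# Minimal greedy speculative decoding sketch (illustrative only).
# A cheap "draft" model proposes k tokens; the "target" model verifies
# them and keeps the longest agreeing prefix, then adds one token of its own.

def draft_next(seq):
    # Toy draft model: predicts (last token + 1) mod 10.
    return (seq[-1] + 1) % 10

def target_next(seq):
    # Toy target model: agrees with the draft except after token 4,
    # where it predicts 7 instead. Disagreements stop acceptance.
    return 7 if seq[-1] == 4 else (seq[-1] + 1) % 10

def speculative_step(seq, k=4):
    # 1. Draft k tokens autoregressively with the cheap model.
    drafted, cur = [], list(seq)
    for _ in range(k):
        t = draft_next(cur)
        drafted.append(t)
        cur.append(t)
    # 2. Verify: the target checks each drafted position. In a real system
    #    this is ONE batched forward pass, not k sequential ones -- that is
    #    where the speedup comes from.
    accepted, cur = [], list(seq)
    for t in drafted:
        if target_next(cur) != t:
            break
        accepted.append(t)
        cur.append(t)
    # 3. The target's own prediction at the first mismatch (or past the
    #    accepted run) comes for free, so every step gains >= 1 token.
    accepted.append(target_next(cur))
    return seq + accepted

seq = [0]
while len(seq) < 10:
    seq = speculative_step(seq)
```

Each call to `speculative_step` emits between one token (immediate disagreement) and `k + 1` tokens (full agreement), while the target model runs only one verification pass per step.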

Quick Start & Requirements

  • Installation: pip install ... (specific command not provided in README)
  • Dependencies: PyTorch, CUDA (implied for performance)
  • Resources: Requires significant computational resources for training and experimentation.
  • Links: No direct quick-start or demo links provided.

Highlighted Details

  • Explores early exit schemes and a novel "prophet transformer" for speculative decoding.
  • Investigates batched speculative decoding for improved efficiency.
  • Aims to optimize performance and reduce indexing overhead in batched decoding.
  • Benchmarking and comparison charts are planned.
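The early exit scheme mentioned above can be sketched as follows: instead of a separate draft network, a shallow prefix of the target model's own layer stack drafts tokens, and the full-depth forward pass verifies them. This is a hypothetical toy (the layer functions and names are invented), meant only to show why the draft is cheap and why the full pass can disagree with it.

```python
# Hypothetical early-exit drafting sketch (not the repository's code).
# The "model" is a stack of toy layers; drafting exits halfway through.

LAYERS = [
    lambda h: h + 1,
    lambda h: h * 3,
    lambda h: h + 2,
    lambda h: h * 2,
]

def predict(token, depth):
    # Run the first `depth` layers, then map the hidden state to a token id.
    h = token
    for layer in LAYERS[:depth]:
        h = layer(h)
    return h % 10

def draft(token):
    # Cheap: exit after half the layers.
    return predict(token, depth=len(LAYERS) // 2)

def verify(token):
    # Accurate: run the full stack.
    return predict(token, depth=len(LAYERS))
```

Drafting costs half the compute here; when `draft` and `verify` disagree (e.g. for input `2`, the shallow exit yields `9` while the full pass yields `2`), verification rejects the drafted token and the target's prediction is used instead, exactly as in the draft-model variant.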

Maintenance & Community

  • Sponsored by StabilityAI and Hugging Face.
  • The author, lucidrains (Phil Wang), is known for open-source implementations of AI research.
  • No explicit community links (Discord, Slack) are mentioned.

Licensing & Compatibility

  • License: Not explicitly stated in the README.
  • Compatibility: Assumed to be compatible with PyTorch-based ecosystems.

Limitations & Caveats

The project is described as "explorations," and some functionalities like batched speculative decoding are noted as requiring significant work to become usable. Performance optimization is an ongoing effort.

Health Check

  • Last Commit: 9 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 3 stars in the last 30 days

Explore Similar Projects

Starred by Cody Yu (Coauthor of vLLM; MTS at OpenAI), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 2 more.

Consistency_LLM by hao-ai-lab

  • Parallel decoder for efficient LLM inference
  • 0.3% · 404 stars
  • Created 1 year ago · Updated 10 months ago

Starred by Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), and 8 more.

EAGLE by SafeAILab

  • Speculative decoding research paper for faster LLM inference
  • 10.6% · 2k stars
  • Created 1 year ago · Updated 1 week ago