speculative-decoding by lucidrains

Speculative decoding explorations

created 1 year ago
275 stars

Top 94.9% on sourcepulse

Project Summary

This repository explores speculative decoding techniques to accelerate text-to-semantic decoders, particularly for applications like Spear-TTS. It targets researchers and engineers seeking to improve inference speed for large language models.

How It Works

The project implements and experiments with several speculative decoding strategies, including early exit schemes and a "prophet transformer" approach. These methods speed up generation by using a smaller, faster "draft" model to propose token sequences, which the larger, more accurate model then verifies in a single batched pass, reducing the number of sequential forward passes the large model must make.

Quick Start & Requirements

  • Installation: pip install ... (specific command not provided in README)
  • Dependencies: PyTorch, CUDA (implied for performance)
  • Resources: Requires significant computational resources for training and experimentation.
  • Links: No direct quick-start or demo links provided.

Highlighted Details

  • Explores early exit schemes and a novel "prophet transformer" for speculative decoding.
  • Investigates batched speculative decoding for improved efficiency.
  • Aims to optimize performance and reduce indexing overhead in batched decoding.
  • Benchmarking and comparison charts are planned.
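The indexing overhead mentioned for batched decoding comes from ragged acceptance: each sequence in a batch accepts a different number of drafted tokens, so rows fall out of alignment and must be re-padded before the next round. The sketch below illustrates that bookkeeping step only; the names, padding scheme, and list-of-lists layout are invented for illustration and are not the repo's implementation.

```python
PAD = -1  # hypothetical padding token id

def realign(batch_tokens, batch_draft, batch_accept):
    """Append each row's accepted draft prefix, then right-pad rows
    to a common length so the batch stays rectangular."""
    rows = []
    for toks, draft, n in zip(batch_tokens, batch_draft, batch_accept):
        rows.append(toks + draft[:n])  # keep only the accepted prefix
    width = max(len(r) for r in rows)
    return [r + [PAD] * (width - len(r)) for r in rows]
```

In a tensor implementation the same step becomes per-row gathers and scatter writes into a padded buffer, which is the overhead the project aims to reduce.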

Maintenance & Community

  • Sponsored by StabilityAI and Huggingface.
  • Author is lucidrains, known for open-sourcing AI techniques.
  • No explicit community links (Discord, Slack) are mentioned.

Licensing & Compatibility

  • License: Not explicitly stated in the README.
  • Compatibility: Assumed to be compatible with PyTorch-based ecosystems.

Limitations & Caveats

The project is described as "explorations," and some functionalities like batched speculative decoding are noted as requiring significant work to become usable. Performance optimization is an ongoing effort.

Health Check

  • Last commit: 7 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star history: 15 stars in the last 90 days

Explore Similar Projects

Starred by Stas Bekman (author of the Machine Learning Engineering Open Book; Research Engineer at Snowflake), Jeremy Howard (cofounder of fast.ai), and 1 more.

prompt-lookup-decoding by apoorvumang

  • Decoding method for faster LLM generation
  • 1.1%
  • 556 stars
  • created 1 year ago, updated 11 months ago