mtla by D-Keqi

Efficient attention for LLMs and speech processing

created 3 months ago
663 stars

Top 50.6% on SourcePulse

View on GitHub
Project Summary

MTLA introduces Multi-head Temporal Latent Attention, a novel attention mechanism designed to improve efficiency in decoder-only architectures such as LLMs. By temporally compressing the key-value cache, it significantly reduces the memory footprint during inference, making it well suited to researchers and engineers working on large-scale speech and language processing tasks.

How It Works

MTLA builds upon DeepSeek MLA, incorporating temporal compression of the key-value cache. This core innovation allows for more efficient self-attention computations and a reduced memory overhead, particularly beneficial for autoregressive models. The library supports various attention mechanisms (MHA, MQA, GQA, MLA, MTLA) and positional encodings (RoPE, Decoupled RoPE).
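
As a rough, hypothetical sketch of what temporal compression of a KV cache can look like (simple mean pooling over adjacent time steps; not MTLA's actual learned compression or the project's API), consider the following PyTorch snippet:

```python
import torch

def compress_kv_cache(k_cache, v_cache, stride=2):
    """Merge adjacent time steps of cached keys/values by mean pooling.

    Hypothetical illustration of temporal KV-cache compression, not MTLA's
    actual update rule. Tensors are [batch, heads, time, dim]; after
    compression the cached time axis grows roughly `stride` times more
    slowly than the generated sequence.
    """
    b, h, t, d = k_cache.shape
    t_full = (t // stride) * stride  # compress only complete groups of `stride` steps
    k = k_cache[:, :, :t_full].reshape(b, h, t_full // stride, stride, d).mean(dim=3)
    v = v_cache[:, :, :t_full].reshape(b, h, t_full // stride, stride, d).mean(dim=3)
    # Keep the ragged tail uncompressed so the most recent steps stay at full resolution.
    k = torch.cat([k, k_cache[:, :, t_full:]], dim=2)
    v = torch.cat([v, v_cache[:, :, t_full:]], dim=2)
    return k, v
```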

Quick Start & Requirements

Highlighted Details

  • Supports multiple attention mechanisms: MHA, MQA, GQA, MLA, and MTLA.
  • Includes setup recipes for speech translation (MuST-C), speech recognition (AMI), spoken language understanding (SLURP), and text summarisation (XSum).
  • Offers Fairseq-style parallel beam search for evaluation, along with quality metrics such as BLEU, WER, and ROUGE.
  • Provides efficiency evaluation for inference time and GPU memory usage.
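
The repository's recipes handle the efficiency evaluation; as a generic sketch of how inference time and peak GPU memory can be measured in PyTorch (assumed helper name, not the project's actual scripts):

```python
import time
import torch

def measure_efficiency(generate_fn, device="cuda"):
    """Time one generation call and report peak GPU memory (generic sketch)."""
    torch.cuda.reset_peak_memory_stats(device)
    torch.cuda.synchronize(device)
    start = time.perf_counter()
    output = generate_fn()                      # e.g. a beam-search decode closure
    torch.cuda.synchronize(device)
    elapsed_s = time.perf_counter() - start
    peak_mb = torch.cuda.max_memory_allocated(device) / 1024 ** 2
    return output, elapsed_s, peak_mb
```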

Maintenance & Community

The project is maintained by D-Keqi and Philip C. Woodland. Further community or roadmap information is not detailed in the README.

Licensing & Compatibility

The project does not explicitly state a license in the provided README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The README does not specify any limitations or known caveats. The project appears to be research-oriented with a recent arXiv publication date.

Health Check

  • Last commit: 2 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 95 stars in the last 30 days
