Efficient attention for LLMs and speech processing
MTLA implements Multi-head Temporal Latent Attention, a mechanism that improves the efficiency of decoder-only architectures such as LLMs. By temporally compressing the key-value cache, it substantially reduces memory footprint during inference, which makes it well suited to researchers and engineers working on large-scale speech and language processing tasks.
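As a rough back-of-the-envelope illustration of how much memory the KV cache consumes and how a temporal compression factor shrinks it, consider the Python sketch below; the model dimensions and the stride of 4 are illustrative assumptions, not figures taken from MTLA.

# Rough estimate of per-sequence KV-cache size with and without temporal
# compression. All model sizes and the compression stride are illustrative
# assumptions, not values from the MTLA paper.

def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim,
                   bytes_per_elem=2, temporal_stride=1):
    # Keys and values are each cached per layer and per KV head; temporal
    # compression keeps roughly one cached slot per `temporal_stride` steps.
    cached_steps = (seq_len + temporal_stride - 1) // temporal_stride
    return 2 * n_layers * n_kv_heads * head_dim * cached_steps * bytes_per_elem

baseline = kv_cache_bytes(seq_len=8192, n_layers=32, n_kv_heads=32, head_dim=128)
compressed = kv_cache_bytes(seq_len=8192, n_layers=32, n_kv_heads=32, head_dim=128,
                            temporal_stride=4)
print(f"uncompressed KV cache: {baseline / 2**20:.0f} MiB")    # 4096 MiB
print(f"temporal stride 4:     {compressed / 2**20:.0f} MiB")  # 1024 MiB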
How It Works
MTLA builds on DeepSeek's Multi-head Latent Attention (MLA), adding temporal compression of the key-value cache. This allows more efficient self-attention computation and lower memory overhead, which is particularly beneficial for autoregressive models. The library supports several attention mechanisms (MHA, MQA, GQA, MLA, MTLA) and positional encodings (RoPE, decoupled RoPE).
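The sketch below illustrates the general idea of temporally compressing a KV cache during autoregressive decoding, with simple mean pooling standing in for MTLA's learned compression. It is a conceptual illustration only; the class and parameter names are hypothetical and do not reflect the library's API.

import torch

class TemporallyCompressedKVCache:
    """Toy cache that pools every `stride` key/value steps into one slot.

    Mean pooling is a stand-in for MTLA's learned temporal compression;
    this is a conceptual sketch, not the library's implementation.
    """
    def __init__(self, stride: int = 4):
        self.stride = stride
        self.compressed = []  # finished, pooled KV slots
        self.pending = []     # most recent, still-uncompressed steps

    def append(self, k: torch.Tensor, v: torch.Tensor):
        # k, v: (n_heads, head_dim) for the newest decoding step.
        self.pending.append((k, v))
        if len(self.pending) == self.stride:
            ks = torch.stack([p[0] for p in self.pending])  # (stride, H, D)
            vs = torch.stack([p[1] for p in self.pending])
            self.compressed.append((ks.mean(dim=0), vs.mean(dim=0)))
            self.pending.clear()

    def keys_values(self):
        # Attention runs over pooled slots plus any still-pending steps,
        # so the most recent tokens keep full temporal resolution.
        entries = self.compressed + self.pending
        ks = torch.stack([e[0] for e in entries])  # (T', n_heads, head_dim)
        vs = torch.stack([e[1] for e in entries])
        return ks, vs

cache = TemporallyCompressedKVCache(stride=4)
for _ in range(10):  # pretend to decode 10 tokens
    cache.append(torch.randn(8, 64), torch.randn(8, 64))
ks, vs = cache.keys_values()
print(ks.shape)  # 4 cached slots for 10 decoded tokens: torch.Size([4, 8, 64])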
Quick Start & Requirements
pip install mtla
cd experiments/tools/fairseq && pip install --editable ./
Highlighted Details
Maintenance & Community
The project is maintained by D-Keqi and Philip C. Woodland. Further community or roadmap information is not detailed in the README.
Licensing & Compatibility
The README does not state a license, so suitability for commercial use or closed-source linking is unspecified.
Limitations & Caveats
The README does not list any limitations or known caveats. The project appears to be research-oriented, accompanied by a recent arXiv preprint.