Awesome-Token-Compress by daixiangzi

Paper list for token compression methods in Vision Transformers (ViTs) and Vision-Language Models (VLMs)

Created 1 year ago
652 stars

Top 51.2% on SourcePulse

Project Summary

This repository is a curated list of recent research papers on token compression techniques for Vision Transformers (ViTs) and Vision-Language Models (VLMs). It is a useful resource for researchers and engineers looking to improve the efficiency and speed of these models, particularly for tasks involving long sequences or high-resolution inputs.

How It Works

The project compiles a comprehensive collection of papers that propose methods for reducing the number of tokens processed by ViTs and VLMs. These methods typically prune, cluster, merge, or dynamically select tokens, aiming to preserve accuracy while substantially cutting computational cost and memory usage, which in turn accelerates both inference and training.
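For intuition, the sketch below shows one of the simplest members of this family: attention-based token pruning, where patch tokens that receive little attention from the [CLS] token are dropped before the next block. This is a minimal illustrative sketch; the function name, tensor shapes, and the keep_ratio parameter are assumptions for the example, not the interface of any specific paper on the list.

```python
import torch

def prune_tokens(tokens: torch.Tensor, cls_attn: torch.Tensor, keep_ratio: float = 0.5) -> torch.Tensor:
    """Keep the patch tokens that receive the most attention from the [CLS] token.

    tokens:   (B, N, D) patch embeddings (excluding [CLS])
    cls_attn: (B, N) attention weights from [CLS] to each patch token
    """
    num_keep = max(1, int(tokens.shape[1] * keep_ratio))
    # Indices of the most-attended tokens for each sample in the batch.
    keep_idx = cls_attn.topk(num_keep, dim=1).indices              # (B, num_keep)
    keep_idx = keep_idx.unsqueeze(-1).expand(-1, -1, tokens.shape[-1])
    return tokens.gather(1, keep_idx)                              # (B, num_keep, D)

# Example: 4 images, 196 patch tokens of dimension 768, keep half of them.
tokens = torch.randn(4, 196, 768)
cls_attn = torch.rand(4, 196)
compressed = prune_tokens(tokens, cls_attn, keep_ratio=0.5)        # -> (4, 98, 768)
```

Merging- and clustering-based methods replace the hard top-k selection with pooling or matching of similar tokens, but the overall goal is the same: fewer tokens entering the remaining layers.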

Highlighted Details

  • Extensive coverage of techniques for both image and video understanding tasks.
  • Includes papers from major conferences like CVPR, ICLR, ECCV, NeurIPS, and AAAI.
  • Features methods applicable to various model architectures and multimodal tasks.
  • Highlights projects with associated GitHub repositories for practical implementation.

Maintenance & Community

The list is updated as new publications in the field of efficient VLMs appear. There are no dedicated community channels or active development for the list itself, but many of the linked papers have their own active communities and repositories.

Licensing & Compatibility

The repository is a paper list rather than a codebase, so it carries no meaningful licensing restrictions of its own. Users should refer to the individual licenses of the linked papers and their associated code repositories for usage terms.

Limitations & Caveats

This resource is a bibliography and does not provide implementations, benchmarks, or direct comparisons of the listed techniques. Users must consult the individual papers for details on performance, limitations, and implementation requirements.

Health Check

  • Last commit: 2 days ago
  • Responsiveness: 1 day
  • Pull requests (30d): 2
  • Issues (30d): 0
  • Star history: 50 stars in the last 30 days

Explore Similar Projects

Starred by Alex Yu (Research Scientist at OpenAI; former cofounder of Luma AI) and Phil Wang (prolific research paper implementer).

Cosmos-Tokenizer by NVIDIA (2k stars): suite of neural tokenizers for image and video processing. Created 10 months ago; updated 7 months ago.