Paper list for token compression methods in Vision Transformers (ViTs) and Vision-Language Models (VLMs)
This repository is a curated list of recent research papers on token compression techniques for Vision Transformers (ViTs) and Vision-Language Models (VLMs). It serves as a reference for researchers and engineers seeking to improve the efficiency and speed of these models, particularly for tasks involving long token sequences or high-resolution inputs.
How It Works
The project compiles a comprehensive collection of papers that propose methods for reducing the number of tokens processed by ViTs and VLMs. These methods typically prune, cluster, merge, or dynamically select tokens, significantly reducing computational cost and memory usage while preserving accuracy. The payoff is faster inference, and often faster training, with little degradation in task performance.
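To make the general idea concrete, here is a minimal sketch of score-based token pruning combined with a naive merging step, written in PyTorch. It is illustrative only and not taken from any specific paper in the list; the function name compress_tokens, the use of per-token importance scores (e.g., derived from CLS attention), and the single weighted "summary" token are all assumptions for the example.

```python
# Minimal illustrative sketch (not any particular paper's method): keep the
# top-k tokens by an importance score and fuse the pruned tokens into one
# score-weighted summary token so some of their information is retained.
import torch


def compress_tokens(tokens: torch.Tensor, scores: torch.Tensor, keep: int) -> torch.Tensor:
    """tokens: (B, N, D) token embeddings; scores: (B, N) importance scores."""
    B, N, D = tokens.shape
    order = scores.argsort(dim=1, descending=True)          # rank tokens by score
    keep_idx = order[:, :keep]                               # indices of kept tokens
    drop_idx = order[:, keep:]                                # indices of pruned tokens

    kept = tokens.gather(1, keep_idx.unsqueeze(-1).expand(-1, -1, D))     # (B, keep, D)
    dropped = tokens.gather(1, drop_idx.unsqueeze(-1).expand(-1, -1, D))  # (B, N-keep, D)

    # Merge pruned tokens into a single summary token, weighted by their scores.
    w = scores.gather(1, drop_idx).softmax(dim=1).unsqueeze(-1)           # (B, N-keep, 1)
    summary = (w * dropped).sum(dim=1, keepdim=True)                      # (B, 1, D)

    return torch.cat([kept, summary], dim=1)                              # (B, keep+1, D)


# Example: compress 197 ViT tokens down to 65 (64 kept + 1 summary token).
x = torch.randn(2, 197, 768)
s = torch.rand(2, 197)
print(compress_tokens(x, s, keep=64).shape)  # torch.Size([2, 65, 768])
```

Actual methods in the listed papers differ in how scores are computed, where compression is applied in the network, and how merged information is preserved, but most follow this keep-some, discard-or-fuse-the-rest pattern.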
Maintenance & Community
The list is a curated bibliography that is updated as new publications on efficient ViTs and VLMs appear. There are no dedicated community channels or active development for the list itself, but many of the linked papers maintain their own active repositories and communities.
Licensing & Compatibility
The repository is a paper list and contains no code of its own, so licensing restrictions do not apply to it directly. Users should refer to the individual licenses of the linked papers and their associated code repositories for usage terms.
Limitations & Caveats
This resource is a bibliography and does not provide implementations, benchmarks, or direct comparisons of the listed techniques. Users must consult the individual papers for details on performance, limitations, and implementation requirements.