Paper list for token compression methods in Vision Transformers (ViTs) and Vision-Language Models (VLMs)
This repository is a curated list of recent research papers on token compression techniques for Vision Transformers (ViTs) and Vision-Language Models (VLMs). It serves as a reference for researchers and engineers seeking to improve the efficiency and speed of these models, particularly for tasks involving long token sequences or high-resolution inputs.
How It Works
The project compiles a comprehensive collection of papers that propose methods for reducing the number of tokens processed by ViTs and VLMs. These methods typically prune, cluster, merge, or dynamically select tokens, significantly reducing computational cost and memory usage while preserving accuracy. The payoff is faster inference, and often faster training, with little degradation in task performance.
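To make the general idea concrete, here is a minimal sketch of score-based token pruning combined with a naive merging step, written in PyTorch. It is illustrative only and not taken from any specific paper in the list; the function name compress_tokens, the use of per-token importance scores (e.g., derived from CLS attention), and the single weighted "summary" token are all assumptions for the example.

```python
# Minimal illustrative sketch (not any particular paper's method): keep the
# top-k tokens by an importance score and fuse the pruned tokens into one
# score-weighted summary token so some of their information is retained.
import torch


def compress_tokens(tokens: torch.Tensor, scores: torch.Tensor, keep: int) -> torch.Tensor:
    """tokens: (B, N, D) token embeddings; scores: (B, N) importance scores."""
    B, N, D = tokens.shape
    order = scores.argsort(dim=1, descending=True)          # rank tokens by score
    keep_idx = order[:, :keep]                               # indices of kept tokens
    drop_idx = order[:, keep:]                                # indices of pruned tokens

    kept = tokens.gather(1, keep_idx.unsqueeze(-1).expand(-1, -1, D))     # (B, keep, D)
    dropped = tokens.gather(1, drop_idx.unsqueeze(-1).expand(-1, -1, D))  # (B, N-keep, D)

    # Merge pruned tokens into a single summary token, weighted by their scores.
    w = scores.gather(1, drop_idx).softmax(dim=1).unsqueeze(-1)           # (B, N-keep, 1)
    summary = (w * dropped).sum(dim=1, keepdim=True)                      # (B, 1, D)

    return torch.cat([kept, summary], dim=1)                              # (B, keep+1, D)


# Example: compress 197 ViT tokens down to 65 (64 kept + 1 summary token).
x = torch.randn(2, 197, 768)
s = torch.rand(2, 197)
print(compress_tokens(x, s, keep=64).shape)  # torch.Size([2, 65, 768])
```

Actual methods in the listed papers differ in how scores are computed, where compression is applied in the network, and how merged information is preserved, but most follow this keep-some, discard-or-fuse-the-rest pattern.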
Maintenance & Community
The list is a curated bibliography that is updated as new publications on efficient ViTs and VLMs appear. There are no dedicated community channels or active development for the list itself, but many of the linked papers maintain their own active repositories and communities.
Licensing & Compatibility
The repository is a paper list and contains no code of its own, so licensing restrictions do not apply to it directly. Users should refer to the individual licenses of the linked papers and their associated code repositories for usage terms.
Limitations & Caveats
This resource is a bibliography and does not provide implementations, benchmarks, or direct comparisons of the listed techniques. Users must consult the individual papers for details on performance, limitations, and implementation requirements.