cokeshao
A comprehensive survey of multimodal token compression techniques
Top 95.4% on SourcePulse
This repository is a comprehensive survey of multimodal token compression techniques. It addresses the excessive token counts that Multimodal Large Language Models (MLLMs) must process when given large image, video, and audio inputs. It targets researchers and engineers who want to improve the efficiency and scalability of MLLMs in real-world applications, where input sizes often exceed what current models can handle. The primary benefit is a curated, organized overview of state-of-the-art methods that speeds up understanding and adoption of efficient multimodal AI.
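To make the scale of the problem concrete, the sketch below counts the patch tokens a ViT-style encoder would emit before any compression. The resolution (336x336), patch size (14), and frame count (64) are illustrative assumptions, not figures taken from the survey, and the helper names are hypothetical.

```python
def vit_token_count(height: int, width: int, patch_size: int) -> int:
    """Patch tokens for one image from a ViT-style encoder (no class token).

    Assumes non-overlapping square patches; partial patches are dropped.
    """
    return (height // patch_size) * (width // patch_size)


def video_token_count(num_frames: int, height: int, width: int, patch_size: int) -> int:
    """Patch tokens for a video when every sampled frame is encoded independently."""
    return num_frames * vit_token_count(height, width, patch_size)


# Illustrative settings (assumed, not from the survey).
image_tokens = vit_token_count(336, 336, 14)         # 24 * 24 patches
video_tokens = video_token_count(64, 336, 336, 14)   # 64 frames of 24 * 24 patches

print(image_tokens, video_tokens)
```

Under these assumptions a single image yields 576 tokens and a 64-frame clip yields 36,864, which is why token counts for video quickly outgrow typical context budgets.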
How It Works
This project functions as a curated collection and structured survey of academic papers focused on multimodal token compression. It categorizes research by modality (Image LLM, Video LLM, Audio LLM) and underlying architectural components (e.g., Vision Transformer, Audio Transformer). The repository provides direct links to papers, associated GitHub repositories, and Hugging Face models, enabling users to quickly access and evaluate relevant work. A key feature is a Notion database for efficient searching and filtering of the surveyed literature.
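The organization described above can be pictured as a small catalog of records that is filtered by modality and component. The following is a minimal sketch under stated assumptions: the `SurveyEntry` fields, the `filter_entries` helper, and the sample rows (placeholder titles and example.com URLs) are hypothetical and do not reflect the repository's actual data format or the schema of its Notion database.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class SurveyEntry:
    """One surveyed paper (hypothetical schema, for illustration only)."""
    title: str
    modality: str                    # e.g. "Image LLM", "Video LLM", "Audio LLM"
    component: str                   # e.g. "Vision Transformer", "Audio Transformer"
    paper_url: str
    code_url: Optional[str] = None   # associated GitHub repository, if any
    model_url: Optional[str] = None  # associated Hugging Face model, if any


def filter_entries(
    entries: Iterable[SurveyEntry],
    modality: Optional[str] = None,
    component: Optional[str] = None,
    has_code: bool = False,
) -> List[SurveyEntry]:
    """Return entries matching every filter that was supplied."""
    result = []
    for entry in entries:
        if modality is not None and entry.modality != modality:
            continue
        if component is not None and entry.component != component:
            continue
        if has_code and entry.code_url is None:
            continue
        result.append(entry)
    return result


# Placeholder rows; titles and URLs are invented for the example.
catalog = [
    SurveyEntry("Placeholder paper A", "Image LLM", "Vision Transformer",
                "https://example.com/paper-a", code_url="https://example.com/code-a"),
    SurveyEntry("Placeholder paper B", "Video LLM", "Vision Transformer",
                "https://example.com/paper-b"),
    SurveyEntry("Placeholder paper C", "Audio LLM", "Audio Transformer",
                "https://example.com/paper-c", code_url="https://example.com/code-c"),
]

video_papers = filter_entries(catalog, modality="Video LLM")
vit_with_code = filter_entries(catalog, component="Vision Transformer", has_code=True)
print([e.title for e in video_papers], [e.title for e in vit_with_code])
```

Keeping the links as optional fields mirrors the fact that not every surveyed paper ships code or a released model.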
Quick Start & Requirements
This is a survey repository, so there is nothing to install and no software prerequisites. The survey paper is available on arXiv (2507.20198), and the curated database can be explored through the Notion link provided in the repository.
Highlighted Details
Maintenance & Community
The repository shows recent activity, with updates noted in July, August, and October 2025, indicating active maintenance. The authors provide contact information for suggestions, clarifications, and collaboration opportunities.
Licensing & Compatibility
The project is licensed under the MIT License, which permits use, modification, and distribution, including for commercial purposes, provided the copyright and license notice is retained in copies.
Limitations & Caveats
As a survey, this repository is a curated snapshot of existing research and may not include every emerging technique immediately. It serves as a guide to external resources rather than providing direct implementation code or tools.