Survey of efficient multimodal large language models (MLLMs)
This repository is a survey of efficient Multimodal Large Language Models (MLLMs). It is aimed at researchers and developers who want to understand and build lightweight MLLMs for resource-constrained settings such as edge computing. It systematically reviews architectures, efficiency strategies, and applications.
How It Works
The survey categorizes efficient MLLMs by architectural component and by optimization strategy. It covers lightweight vision encoders (e.g., ViTamin, SigLIP), efficient vision-language projectors (e.g., MLP, LDP, Perceiver Resampler), and compact language backbones (e.g., Phi-2, Gemma, Mamba). It also reviews techniques for inference acceleration and parameter-efficient training.
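The encoder-to-projector-to-language-model flow described above can be illustrated with a toy sketch. This is not code from the survey or from any linked project: the function names (`pool_tokens`, `mlp_projector`), the dimensions, and the random weights are all hypothetical. Average pooling stands in for token compression in general (it is not LDP or the Perceiver Resampler), and ReLU is used for simplicity where real projectors may use a different activation.

```python
import random


def pool_tokens(tokens, stride):
    """Average every `stride` consecutive tokens to shorten the visual sequence."""
    pooled = []
    for i in range(0, len(tokens), stride):
        group = tokens[i:i + stride]
        pooled.append([sum(col) / len(group) for col in zip(*group)])
    return pooled


def matvec(weights, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def mlp_projector(tokens, w1, w2):
    """Two-layer MLP with ReLU that maps each vision token to the LLM embedding width."""
    projected = []
    for token in tokens:
        hidden = [max(0.0, h) for h in matvec(w1, token)]
        projected.append(matvec(w2, hidden))
    return projected


def random_matrix(rows, cols, rng):
    """Toy weight initialization; a trained model would load learned weights."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
VISION_DIM, HIDDEN_DIM, LLM_DIM = 8, 16, 12  # toy sizes, assumed for illustration
NUM_PATCHES = 16                             # stand-in for vision encoder output length

# Stand-in for the output of a vision encoder: one feature vector per image patch.
vision_tokens = [[rng.uniform(-1, 1) for _ in range(VISION_DIM)]
                 for _ in range(NUM_PATCHES)]
w1 = random_matrix(HIDDEN_DIM, VISION_DIM, rng)
w2 = random_matrix(LLM_DIM, HIDDEN_DIM, rng)

compressed = pool_tokens(vision_tokens, stride=4)  # 16 tokens -> 4 tokens
projected = mlp_projector(compressed, w1, w2)      # width 8 -> width 12

print(len(vision_tokens), "->", len(projected), "tokens of width", len(projected[0]))
```

In this sketch the language model would receive 4 visual tokens instead of 16. Shortening the visual sequence before it reaches the language model is one way a projector can reduce inference cost.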
Quick Start & Requirements
This repository is a survey and does not contain executable code. It links to numerous research papers and GitHub repositories for specific MLLM implementations. Users will need to refer to individual project documentation for installation and execution.
Maintenance & Community
The repository is actively maintained by the authors, with a commitment to incorporating new research. Contact information for collaboration is provided.
Licensing & Compatibility
The repository itself contains no code and is not subject to software licensing. Individual linked projects will have their own licenses.
Limitations & Caveats
As a survey, this repository does not offer direct implementation or benchmarking. Users must consult the linked papers and codebases for practical application and performance validation. The field is rapidly evolving, so the survey represents a snapshot in time.