Efficient-Multimodal-LLMs-Survey by swordlidev

Survey of efficient multimodal large language models (MLLMs)

Created 1 year ago

388 stars

Top 74.1% on SourcePulse

Project Summary

This repository hosts a survey of efficient Multimodal Large Language Models (MLLMs), aimed at researchers and developers who want to understand and build lightweight MLLMs for settings such as edge computing. It systematically reviews architectures, efficiency strategies, and applications in this rapidly evolving field.

How It Works

The survey categorizes efficient MLLMs by architectural component and optimization strategy. It covers lightweight vision encoders (e.g., ViTamin, SigLIP), efficient projectors that map visual features into the language model's input space (e.g., MLP, LDP, Perceiver Resampler), and smaller language model backbones (e.g., Phi-2, Gemma, Mamba). It also reviews techniques for inference acceleration and parameter-efficient training.
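The encoder, projector, and language model pipeline described above can be illustrated with a toy sketch. This is not code from the survey or from any listed model: the dimensions, random weights, and function names (`mlp_projector`, `init_linear`) are assumptions chosen for illustration, and a plain two-layer MLP stands in for the projector. The vision encoder and the language model are replaced by random vectors, so only the data flow is shown.

```python
import math
import random


def linear(x, weight, bias):
    """Apply y = W x + b to a single feature vector (plain Python lists)."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def gelu(v):
    """Tanh approximation of the GELU activation."""
    return 0.5 * v * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (v + 0.044715 * v ** 3)))


def init_linear(rng, out_dim, in_dim):
    """Small random weights and zero bias; stand-ins for trained parameters."""
    weight = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    return weight, [0.0] * out_dim


def mlp_projector(visual_tokens, layer1, layer2):
    """Map each visual token from the encoder width to the LLM embedding width."""
    projected = []
    for token in visual_tokens:
        hidden = [gelu(v) for v in linear(token, *layer1)]
        projected.append(linear(hidden, *layer2))
    return projected


rng = random.Random(0)
VISION_DIM, LLM_DIM = 8, 16          # toy widths (assumed, far smaller than real models)
NUM_PATCHES, NUM_TEXT_TOKENS = 4, 3  # toy sequence lengths (assumed)

# Stand-ins for vision encoder outputs and text token embeddings.
visual_tokens = [[rng.gauss(0, 1) for _ in range(VISION_DIM)] for _ in range(NUM_PATCHES)]
text_embeddings = [[rng.gauss(0, 1) for _ in range(LLM_DIM)] for _ in range(NUM_TEXT_TOKENS)]

layer1 = init_linear(rng, LLM_DIM, VISION_DIM)
layer2 = init_linear(rng, LLM_DIM, LLM_DIM)

# Projected visual tokens are placed in front of the text embeddings,
# forming the sequence the language model would consume.
projected = mlp_projector(visual_tokens, layer1, layer2)
llm_input = projected + text_embeddings
print(len(llm_input), len(llm_input[0]))
```

In this sketch the language model's input length grows with the number of visual tokens, which is one reason projector design and token count matter for efficiency.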

Quick Start & Requirements

This repository is a survey and does not contain executable code. It links to numerous research papers and GitHub repositories for specific MLLM implementations. Users will need to refer to individual project documentation for installation and execution.

Highlighted Details

Provides a timeline of representative efficient MLLMs, showcasing the rapid development in the field.
Summarizes 17 mainstream efficient MLLMs with key details like vision encoder, resolution, parameter sizes, and projector types.
Covers diverse efficiency strategies including Mixture of Experts (MoE), Mamba-based architectures, and various token compression/processing techniques.
Discusses applications across biomedical analysis, document understanding, and video comprehension.
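As a rough illustration of what token compression means in practice, the sketch below average-pools a square grid of visual tokens over 2x2 windows, cutting the token count by a factor of four. It is a generic example under assumed shapes, not a method attributed to any specific model in the survey; the function name `pool_visual_tokens` and the toy inputs are hypothetical.

```python
def pool_visual_tokens(tokens, grid_size, window=2):
    """Average-pool a grid_size x grid_size grid of token vectors over window x window blocks.

    `tokens` is a row-major list of feature vectors (lists of floats).
    """
    if len(tokens) != grid_size * grid_size:
        raise ValueError("token count must equal grid_size ** 2")
    if grid_size % window != 0:
        raise ValueError("grid_size must be divisible by window")
    dim = len(tokens[0])
    pooled = []
    for row in range(0, grid_size, window):
        for col in range(0, grid_size, window):
            # Gather the tokens that fall inside the current spatial window.
            block = [
                tokens[(row + dr) * grid_size + (col + dc)]
                for dr in range(window)
                for dc in range(window)
            ]
            # Replace the window with the element-wise mean of its tokens.
            pooled.append([sum(vec[i] for vec in block) / len(block) for i in range(dim)])
    return pooled


# Toy 4x4 grid of 2-dimensional tokens (assumed values).
tokens = [[float(i), float(i) * 2.0] for i in range(16)]
compressed = pool_visual_tokens(tokens, grid_size=4)
print(len(tokens), "->", len(compressed))
```

Fewer visual tokens shorten the sequence passed to the language model, at the cost of some spatial detail.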

Maintenance & Community

The authors state a commitment to incorporating new research and provide contact information for collaboration. Recent activity is limited, however: the last commit was 10 months ago.

Licensing & Compatibility

The repository itself contains no code and is not subject to software licensing. Individual linked projects will have their own licenses.

Limitations & Caveats

As a survey, this repository does not offer direct implementation or benchmarking. Users must consult the linked papers and codebases for practical application and performance validation. The field is rapidly evolving, so the survey represents a snapshot in time.

Health Check

Last Commit

10 months ago

Responsiveness

Inactive

Pull Requests (30d)

Issues (30d)

Star History

6 stars in the last 30 days