Vision-language model research paper using Mixture-of-Experts
MoE-LLaVA introduces a Mixture-of-Experts (MoE) approach to enhance Large Vision-Language Models (LVLMs). It targets researchers and developers seeking efficient and high-performing multimodal models, offering improved capabilities with sparser parameter activation.
How It Works
MoE-LLaVA integrates a sparse MoE layer into existing vision-language architectures. This allows for selective activation of expert networks based on input, leading to more efficient computation and potentially better performance. The project leverages a simple MoE tuning stage, enabling rapid training on modest hardware.
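For intuition, a sparse MoE layer replaces a dense feed-forward block with several expert networks plus a router that activates only the top-k experts for each token. The sketch below is a minimal, self-contained PyTorch illustration of that routing pattern, not MoE-LLaVA's actual implementation; the class name, dimensions, expert count, and top-2 routing are placeholder assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoEFFN(nn.Module):
    # Toy sparse MoE feed-forward layer: a linear router scores experts per token
    # and only the top-k experts are evaluated for that token.
    def __init__(self, dim=512, hidden=2048, num_experts=4, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
            for _ in range(num_experts)
        )

    def forward(self, x):  # x: (batch, seq_len, dim) of fused image+text tokens
        scores = self.router(x)                             # (batch, seq_len, num_experts)
        weights, chosen = scores.topk(self.top_k, dim=-1)   # keep only top-k experts per token
        weights = F.softmax(weights, dim=-1)                # renormalize over the chosen experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = chosen[..., k] == e                  # tokens whose k-th choice is expert e
                if mask.any():
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out

tokens = torch.randn(2, 16, 512)        # dummy token sequence
print(SparseMoEFFN()(tokens).shape)     # torch.Size([2, 16, 512])

Because only the selected experts run for each token, total parameter count can grow with the number of experts while per-token compute stays roughly constant.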
Quick Start & Requirements
Install with pip install -e . and pip install -e ".[train]". flash-attn is also recommended.
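After installation, a quick way to confirm the optional flash-attn dependency is visible to Python (a hedged sketch; the package imports as flash_attn, and the version attribute is read defensively in case it is absent):

try:
    import flash_attn  # optional dependency for faster attention kernels
    print("flash-attn version:", getattr(flash_attn, "__version__", "unknown"))
except ImportError:
    print("flash-attn not installed; attention will fall back to the default implementation")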
Highlighted Details
Maintenance & Community
The project is actively maintained by the PKU-YuanGroup. Related projects include Video-LLaVA and LanguageBind.
Licensing & Compatibility
Released under the Apache 2.0 license. However, usage is also subject to the LLaMA model license, OpenAI's Terms of Use for generated data, and ShareGPT's privacy practices, which may restrict commercial use and impose data-privacy obligations.
Limitations & Caveats
The project notes that FlashAttention-2 may cause performance degradation. The license terms of the underlying models and data sources may impose significant restrictions on deployment and commercial use.