MoE-LLaVA by PKU-YuanGroup

Vision-language model research paper using Mixture-of-Experts

Created 1 year ago
2,240 stars

Top 20.3% on SourcePulse

Project Summary

MoE-LLaVA introduces a Mixture-of-Experts (MoE) approach to enhance Large Vision-Language Models (LVLMs). It targets researchers and developers seeking efficient and high-performing multimodal models, offering improved capabilities with sparser parameter activation.

How It Works

MoE-LLaVA integrates sparse MoE layers into an existing vision-language architecture: a router selectively activates a small number of expert networks for each input token, so only a fraction of the model's parameters participate in any forward pass, reducing computation while potentially improving performance. The project uses a simple MoE tuning stage that enables rapid training on modest hardware.
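To illustrate the routing idea, here is a generic sketch of a sparse top-k MoE feed-forward layer in PyTorch. It is not MoE-LLaVA's actual implementation; the class name, layer sizes, expert count, and top-k value are arbitrary placeholders. A token-wise gate scores all experts, keeps the top-k, and mixes only those experts' outputs:

    # Generic sparse top-k MoE feed-forward layer (illustrative only, not MoE-LLaVA's code).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SparseMoE(nn.Module):
        def __init__(self, dim, hidden_dim, num_experts=4, top_k=2):
            super().__init__()
            self.top_k = top_k
            self.gate = nn.Linear(dim, num_experts, bias=False)  # router
            self.experts = nn.ModuleList([
                nn.Sequential(nn.Linear(dim, hidden_dim), nn.GELU(), nn.Linear(hidden_dim, dim))
                for _ in range(num_experts)
            ])

        def forward(self, x):                        # x: (batch, seq, dim)
            tokens = x.reshape(-1, x.shape[-1])      # flatten to (num_tokens, dim)
            scores = self.gate(tokens)               # (num_tokens, num_experts)
            weights, chosen = scores.topk(self.top_k, dim=-1)
            weights = F.softmax(weights, dim=-1)     # renormalize over the selected experts
            out = torch.zeros_like(tokens)
            for e, expert in enumerate(self.experts):
                for slot in range(self.top_k):
                    mask = chosen[:, slot] == e      # tokens whose slot-th choice is expert e
                    if mask.any():
                        out[mask] += weights[mask, slot].unsqueeze(-1) * expert(tokens[mask])
            return out.reshape_as(x)

    layer = SparseMoE(dim=64, hidden_dim=256)
    print(layer(torch.randn(2, 8, 64)).shape)        # torch.Size([2, 8, 64])

With num_experts = 4 and top_k = 2 as in this sketch, each token passes through only half of the expert parameters per forward pass, which is the source of the "sparsely activated parameters" efficiency described above.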

Quick Start & Requirements

  • Install: Clone the repository, activate a Python 3.10 environment, and install dependencies with pip install -e . and pip install -e ".[train]". Installing flash-attn is also recommended.
  • Prerequisites: Python 3.10, PyTorch 2.0.1, CUDA >= 11.7, Transformers 4.37.0, Tokenizers 0.15.1.
  • Resources: Training can be completed on 8 A100 GPUs within a day.
  • Demos: Hugging Face Spaces demo and Colab notebook are available.

Highlighted Details

  • Achieves performance comparable to LLaVA-1.5-7B with only 3B sparsely activated parameters.
  • Outperforms LLaVA-1.5-13B on object hallucination benchmarks.
  • Offers models based on Phi2, Qwen, and StableLM backbones.
  • Supports Gradio Web UI and CLI inference.

Maintenance & Community

The project is actively maintained by the PKU-YuanGroup. Related projects include Video-LLaVA and LanguageBind.

Licensing & Compatibility

Released under the Apache 2.0 license. However, usage is also subject to the LLaMA model license, OpenAI's Terms of Use for generated data, and ShareGPT's privacy practices, which may restrict commercial use and impose data-privacy obligations.

Limitations & Caveats

The project notes that FlashAttention-2 may cause performance degradation. The license terms for underlying models and data sources may impose significant restrictions on deployment and commercial use.

Health Check

  • Last Commit: 2 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 25 stars in the last 30 days
