Extends LLaVA with LLaMA-3 and Phi-3 for enhanced visual capabilities
Top 43.3% on sourcepulse
LLaVA++ extends the LLaVA 1.5 multimodal model by integrating the recently released LLaMA-3 Instruct 8B and Phi-3 Mini Instruct 3.8B models. This project targets researchers and developers working with visual-language models, offering enhanced capabilities for instruction following and academic tasks.
How It Works
The project integrates new language models by modifying core LLaVA components, including the model builder, language model definitions, and training scripts. It provides pre-trained, LoRA-tuned, and fully fine-tuned versions of both LLaVA-Phi-3-mini and LLaVA-LLaMA-3-8B, enabling users to leverage these advanced LLMs within the LLaVA framework.
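As a rough sketch of how one of the released checkpoints might then be loaded, assuming the upstream LLaVA builder API (llava.model.builder.load_pretrained_model) carries over unchanged and using an illustrative checkpoint path (check the repository's model zoo for the actual names):

```python
# Hypothetical loading sketch: the builder API and the checkpoint path are
# assumptions based on the upstream LLaVA codebase, not confirmed by this summary.
from llava.mm_utils import get_model_name_from_path
from llava.model.builder import load_pretrained_model

model_path = "MBZUAI/LLaVA-Phi-3-mini-4k-instruct"  # illustrative; see the project's model zoo

tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path=model_path,
    model_base=None,  # set to the base LLM path when loading a LoRA-only checkpoint
    model_name=get_model_name_from_path(model_path),
)
```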
Quick Start & Requirements
Copy the files from the Phi-3-V or LLaMA-3-V directories into the main LLaVA directory to integrate the respective models. The setup also requires pinning the transformers library to a specific commit. Training scripts are provided for both Phi-3-V and LLaMA-3-V pre-training and LoRA fine-tuning.
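As a rough illustration of the copy step, assuming the LLaVA++ checkout (shown here as LLaVA-pp) sits alongside the upstream LLaVA repo, overlaying one of the model directories could look like the following; the exact file list is given in the project README, so the paths are illustrative only.

```python
# Sketch of the integration copy step; directory names are illustrative and
# assume LLaVA++ is checked out next to the upstream LLaVA repo.
import shutil
from pathlib import Path

src = Path("LLaVA-pp/Phi-3-V")   # or "LLaVA-pp/LLaMA-3-V" for the LLaMA-3 variant
dst = Path("LLaVA")              # upstream LLaVA 1.5 checkout

# Overlay the modified model builder, language model definitions, and
# training scripts onto the existing LLaVA tree without touching other files.
shutil.copytree(src, dst, dirs_exist_ok=True)
```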
Highlighted Details
Maintenance & Community
The project is associated with Mohamed bin Zayed University of AI (MBZUAI). Contact emails are provided for support.
Licensing & Compatibility
The repository itself is not explicitly licensed in the README. However, it builds upon LLaVA, which is typically released under an Apache 2.0 license. Compatibility with commercial or closed-source applications would depend on the underlying LLaVA license and the licenses of the integrated LLMs (LLaMA-3 and Phi-3).
Limitations & Caveats
The README indicates that the project is actively being updated with new releases. Integration requires manual file copying, suggesting a less streamlined setup process. Specific performance benchmarks beyond general claims are not detailed.