Extends LLaVA with LLaMA-3 and Phi-3 for enhanced visual capabilities
Top 43.3% on sourcepulse
LLaVA++ extends the LLaVA 1.5 multimodal model by integrating the recently released LLaMA-3 Instruct 8B and Phi-3 Mini Instruct 3.8B models. This project targets researchers and developers working with visual-language models, offering enhanced capabilities for instruction following and academic tasks.
How It Works
The project integrates new language models by modifying core LLaVA components, including the model builder, language model definitions, and training scripts. It provides pre-trained, LoRA-tuned, and fully fine-tuned versions of both LLaVA-Phi-3-mini and LLaVA-LLaMA-3-8B, enabling users to leverage these advanced LLMs within the LLaVA framework.
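As a rough sketch of how one of the released checkpoints might then be loaded, assuming the upstream LLaVA builder API (llava.model.builder.load_pretrained_model) carries over unchanged and using an illustrative checkpoint path (check the repository's model zoo for the actual names):

```python
# Hypothetical loading sketch: the builder API and the checkpoint path are
# assumptions based on the upstream LLaVA codebase, not confirmed by this summary.
from llava.mm_utils import get_model_name_from_path
from llava.model.builder import load_pretrained_model

model_path = "MBZUAI/LLaVA-Phi-3-mini-4k-instruct"  # illustrative; see the project's model zoo

tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path=model_path,
    model_base=None,  # set to the base LLM path when loading a LoRA-only checkpoint
    model_name=get_model_name_from_path(model_path),
)
```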
Quick Start & Requirements
Copy the files from the Phi-3-V or LLaMA-3-V directories into the main LLaVA directory to integrate the respective models. The setup also requires pinning the transformers library to a specific commit. Training scripts are provided for both Phi-3-V and LLaMA-3-V pre-training and LoRA fine-tuning.
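As a rough illustration of the copy step, assuming the LLaVA++ checkout (shown here as LLaVA-pp) sits alongside the upstream LLaVA repo, overlaying one of the model directories could look like the following; the exact file list is given in the project README, so the paths are illustrative only.

```python
# Sketch of the integration copy step; directory names are illustrative and
# assume LLaVA++ is checked out next to the upstream LLaVA repo.
import shutil
from pathlib import Path

src = Path("LLaVA-pp/Phi-3-V")   # or "LLaVA-pp/LLaMA-3-V" for the LLaMA-3 variant
dst = Path("LLaVA")              # upstream LLaVA 1.5 checkout

# Overlay the modified model builder, language model definitions, and
# training scripts onto the existing LLaVA tree without touching other files.
shutil.copytree(src, dst, dirs_exist_ok=True)
```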
Highlighted Details
Maintenance & Community
The project is associated with Mohamed bin Zayed University of AI (MBZUAI). Contact emails are provided for support.
Licensing & Compatibility
The repository itself is not explicitly licensed in the README. However, it builds upon LLaVA, which is typically released under an Apache 2.0 license. Compatibility with commercial or closed-source applications would depend on the underlying LLaVA license and the licenses of the integrated LLMs (LLaMA-3 and Phi-3).
Limitations & Caveats
The README indicates that the project is actively being updated with new releases. Integration requires manual file copying, suggesting a less streamlined setup process. Specific performance benchmarks beyond general claims are not detailed.