Framework for small-scale large multimodal models research
TinyLLaVA Factory provides a modular and extensible PyTorch framework for building and training small-scale Large Multimodal Models (LMMs). It targets researchers and developers aiming to create efficient LMMs with reduced coding effort and improved reproducibility, with performance reported to be competitive with larger models.
How It Works
The framework employs a modular design that lets users swap out the LLM, vision tower, and connector independently. Supported combinations include LLMs such as OpenELM, TinyLlama, StableLM, Qwen, Gemma, and Phi, paired with vision towers such as CLIP, SigLIP, and Dino; connectors such as MLP, Qformer, and Resampler handle the integration. The framework also supports diverse training recipes, including frozen, full, and partial tuning, as well as LoRA/QLoRA.
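To make the composition concrete, below is a minimal, illustrative PyTorch sketch of how a vision tower, connector, and LLM can be wired together and selectively frozen. This is not TinyLLaVA Factory's actual API; the class names, dimensions, and freezing helper are assumptions chosen for the example.

```python
# Illustrative only: class names, shapes, and the freezing helper below are
# assumptions for this sketch, not TinyLLaVA Factory's actual API.
import torch
import torch.nn as nn


class MLPConnector(nn.Module):
    """Projects vision-tower features into the LLM's embedding space."""

    def __init__(self, vision_dim: int, llm_dim: int):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, vision_features: torch.Tensor) -> torch.Tensor:
        return self.proj(vision_features)


class SmallLMM(nn.Module):
    """Wires any vision tower, connector, and LLM into one multimodal model."""

    def __init__(self, vision_tower: nn.Module, connector: nn.Module, llm: nn.Module):
        super().__init__()
        self.vision_tower = vision_tower
        self.connector = connector
        self.llm = llm

    def forward(self, pixel_values: torch.Tensor, text_embeds: torch.Tensor):
        image_feats = self.vision_tower(pixel_values)   # (B, num_patches, vision_dim)
        image_tokens = self.connector(image_feats)      # (B, num_patches, llm_dim)
        # Prepend projected image tokens to the text embeddings, then decode.
        inputs = torch.cat([image_tokens, text_embeds], dim=1)
        return self.llm(inputs)


def freeze_for_connector_only_training(model: SmallLMM) -> None:
    """One possible 'frozen' recipe: train only the connector."""
    for p in model.vision_tower.parameters():
        p.requires_grad = False
    for p in model.llm.parameters():
        p.requires_grad = False
```

Swapping in a different vision tower or LLM only changes the modules passed to SmallLMM; that kind of substitution is what the factory's modular design is meant to make routine.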
Quick Start & Requirements
Create a conda environment, activate it, and install the package in editable mode:
conda create -n tinyllava_factory python=3.10
conda activate tinyllava_factory
pip install -e .
Additional packages such as flash-attn==2.5.7 are required.
Highlighted Details
Maintenance & Community
The project is actively developed, with recent updates including TinyLLaVA-Video and a visualization tool. It is built upon the LLaVA project and uses data from ShareGPT4V. Contact is available via GitHub Issues or WeChat.
Licensing & Compatibility
Limitations & Caveats
The project's older codebase (TinyLLaVABench) has been moved to a separate branch, so users of the previous version may face breaking changes or need to migrate. Specific performance claims are based on benchmarks reported by the authors.