Framework for small-scale large multimodal models research
TinyLLaVA Factory provides a modular and extensible PyTorch framework for building and training small-scale Large Multimodal Models (LMMs). It targets researchers and developers aiming to create efficient LMMs with reduced coding effort and improved reproducibility, with performance reported to be competitive with larger models.
How It Works
The framework employs a modular design that lets users swap out the LLM, vision tower, and connector independently. Supported combinations include LLMs such as OpenELM, TinyLlama, StableLM, Qwen, Gemma, and Phi, paired with vision towers such as CLIP, SigLIP, and Dino; connectors such as MLP, Qformer, and Resampler handle the integration. The framework also supports diverse training recipes, including frozen, full, and partial tuning, as well as LoRA/QLoRA.
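To make the composition concrete, below is a minimal, illustrative PyTorch sketch of how a vision tower, connector, and LLM can be wired together and selectively frozen. This is not TinyLLaVA Factory's actual API; the class names, dimensions, and freezing helper are assumptions chosen for the example.

```python
# Illustrative only: class names, shapes, and the freezing helper below are
# assumptions for this sketch, not TinyLLaVA Factory's actual API.
import torch
import torch.nn as nn


class MLPConnector(nn.Module):
    """Projects vision-tower features into the LLM's embedding space."""

    def __init__(self, vision_dim: int, llm_dim: int):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, vision_features: torch.Tensor) -> torch.Tensor:
        return self.proj(vision_features)


class SmallLMM(nn.Module):
    """Wires any vision tower, connector, and LLM into one multimodal model."""

    def __init__(self, vision_tower: nn.Module, connector: nn.Module, llm: nn.Module):
        super().__init__()
        self.vision_tower = vision_tower
        self.connector = connector
        self.llm = llm

    def forward(self, pixel_values: torch.Tensor, text_embeds: torch.Tensor):
        image_feats = self.vision_tower(pixel_values)   # (B, num_patches, vision_dim)
        image_tokens = self.connector(image_feats)      # (B, num_patches, llm_dim)
        # Prepend projected image tokens to the text embeddings, then decode.
        inputs = torch.cat([image_tokens, text_embeds], dim=1)
        return self.llm(inputs)


def freeze_for_connector_only_training(model: SmallLMM) -> None:
    """One possible 'frozen' recipe: train only the connector."""
    for p in model.vision_tower.parameters():
        p.requires_grad = False
    for p in model.llm.parameters():
        p.requires_grad = False
```

Swapping in a different vision tower or LLM only changes the modules passed to SmallLMM; that kind of substitution is what the factory's modular design is meant to make routine.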
Quick Start & Requirements
Create a conda environment, activate it, and install the package in editable mode:
conda create -n tinyllava_factory python=3.10
conda activate tinyllava_factory
pip install -e .
Additional packages such as flash-attn==2.5.7 are required.
Highlighted Details
Maintenance & Community
The project is actively developed, with recent updates including TinyLLaVA-Video and a visualization tool. It is built upon the LLaVA project and uses data from ShareGPT4V. Contact is available via GitHub Issues or WeChat.
Licensing & Compatibility
Limitations & Caveats
The project's older codebase (TinyLLaVABench) has been moved to a separate branch, so users of the previous version may face breaking changes or need to migrate. Specific performance claims are based on benchmarks reported by the authors.