TinyLLaVA_Factory by TinyLLaVA

Framework for small-scale large multimodal models research

Created 1 year ago
897 stars

Top 40.4% on SourcePulse

Project Summary

TinyLLaVA Factory provides a modular and extensible PyTorch framework for building and training small-scale Large Multimodal Models (LMMs). It targets researchers and developers who want to build efficient LMMs with less coding effort and better reproducibility, while achieving performance competitive with larger models.

How It Works

The framework employs a modular design that lets users swap the LLM, vision tower, and connector independently. Supported LLMs include OpenELM, TinyLlama, StableLM, Qwen, Gemma, and Phi; vision towers include CLIP, SigLIP, and Dino; and connectors such as MLP, Qformer, and Resampler bridge the vision and language components. The framework also supports a range of training recipes, including frozen, full, and partial tuning, as well as LoRA/QLoRA fine-tuning.
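The exact wiring lives in the framework's configs and factories; the sketch below only illustrates the composition pattern such a modular design implies. All class and function names here are hypothetical placeholders, not TinyLLaVA Factory's actual API.

```python
import torch
import torch.nn as nn

class TinyLMM(nn.Module):
    """Illustrative vision tower -> connector -> LLM composition.
    Component names are placeholders, not TinyLLaVA Factory's real API."""

    def __init__(self, vision_tower: nn.Module, connector: nn.Module, llm: nn.Module):
        super().__init__()
        self.vision_tower = vision_tower  # e.g. a CLIP/SigLIP/Dino image encoder
        self.connector = connector        # e.g. an MLP/Qformer/Resampler projector
        self.llm = llm                    # e.g. a Phi/Qwen/Gemma causal LM

    def forward(self, images: torch.Tensor, text_embeds: torch.Tensor) -> torch.Tensor:
        image_feats = self.vision_tower(images)     # (B, N_img, D_vision)
        image_tokens = self.connector(image_feats)  # (B, N_img, D_llm)
        # Prepend projected image tokens to the text embeddings, then run the LLM.
        return self.llm(torch.cat([image_tokens, text_embeds], dim=1))

def apply_frozen_recipe(model: TinyLMM, freeze_llm: bool = False) -> None:
    """A 'frozen' recipe: keep the vision tower (and optionally the LLM) fixed,
    so only the connector and any unfrozen parts receive gradient updates."""
    for p in model.vision_tower.parameters():
        p.requires_grad = False
    if freeze_llm:
        for p in model.llm.parameters():
            p.requires_grad = False
```

Partial tuning and LoRA/QLoRA follow the same idea: the recipe selects which parameter groups train, rather than changing the architecture.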

Quick Start & Requirements

  • Install: Clone the repository, create a conda environment (conda create -n tinyllava_factory python=3.10), activate it, and install with pip install -e . Additional packages such as flash-attn==2.5.7 are also required. A minimal inference sketch follows this list.
  • Prerequisites: Python 3.10, PyTorch, HuggingFace libraries. GPU acceleration is essential for training and inference.
  • Resources: Training requires significant GPU resources; specific requirements depend on the chosen model configurations.
  • Docs: https://tinyllava-factory.readthedocs.io/en/latest/
  • Demo: http://8843843nmph5.vicp.fun/#/ (Password: '1234')
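After installation, a quick smoke test is to load one of the published checkpoints through HuggingFace Transformers. The sketch below follows the pattern used on the checkpoints' model cards; the chat helper and its exact signature are an assumption here and may differ by version, so verify against the docs linked above.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# trust_remote_code pulls the model's custom multimodal code from the Hub.
hf_path = "tinyllava/TinyLLaVA-Phi-2-SigLIP-3.1B"
model = AutoModelForCausalLM.from_pretrained(hf_path, trust_remote_code=True)
model.cuda().eval()
tokenizer = AutoTokenizer.from_pretrained(hf_path, use_fast=False)

# Assumed convenience helper from the model card; check the docs for the
# exact entry point in your version.
prompt = "What is shown in this image?"
image_url = "http://images.cocodataset.org/val2017/000000039769.jpg"
output_text, generation_time = model.chat(prompt=prompt, image=image_url, tokenizer=tokenizer)
print(output_text)
```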

Highlighted Details

  • Achieves competitive performance: the authors report that models such as TinyLLaVA-Phi-2-SigLIP-3.1B outperform existing 7B models on standard benchmarks.
  • Supports a wide range of LLM and vision tower combinations for customization.
  • Offers modularity for easy integration of new models and methods (see the registry sketch after this list).
  • Includes a visualization tool for interpreting model predictions.
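In practice, "easy integration" in frameworks like this usually means new components register themselves under a string key that configs can reference. The sketch below shows that generic registry pattern; the decorator and registry names are hypothetical, not TinyLLaVA Factory's actual helpers.

```python
import torch.nn as nn

# Hypothetical registry; TinyLLaVA Factory's actual registration helpers
# may be named differently.
CONNECTOR_REGISTRY: dict = {}

def register_connector(name: str):
    def wrap(cls):
        CONNECTOR_REGISTRY[name] = cls
        return cls
    return wrap

@register_connector("mlp2x_gelu")
class MLPConnector(nn.Module):
    """Two-layer MLP projecting vision features into the LLM embedding space."""

    def __init__(self, vision_dim: int, llm_dim: int):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, x):
        return self.proj(x)

# A config string then selects the class at build time, e.g. SigLIP (1152-d)
# features projected into Phi-2's 2560-d embedding space:
connector = CONNECTOR_REGISTRY["mlp2x_gelu"](vision_dim=1152, llm_dim=2560)
```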

Maintenance & Community

The project is actively developed, with recent updates including TinyLLaVA-Video and a visualization tool. It is built upon the LLaVA project and uses data from ShareGPT4V. Contact is available via GitHub Issues or WeChat.

Licensing & Compatibility

  • License: Apache 2.0.
  • Compatibility: Permissive license suitable for commercial use and integration into closed-source projects.

Limitations & Caveats

The project's old codebase (TinyLLaVABench) has been moved to a separate branch, so users of the previous version may face breaking changes or migration effort. Specific performance claims are based on benchmarks reported by the authors.

Health Check

  • Last Commit: 4 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 6

Star History

26 stars in the last 30 days

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Wing Lian (Founder of Axolotl AI), and 10 more.

Explore Similar Projects

open_flamingo by mlfoundations

Open-source framework for training large multimodal models
4k stars · Top 0.1% on SourcePulse · Created 2 years ago · Updated 1 year ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems") and Elvis Saravia (Founder of DAIR.AI).

NExT-GPT by NExT-GPT

Any-to-any multimodal LLM research paper
4k stars · Top 0.1% on SourcePulse · Created 2 years ago · Updated 4 months ago
Starred by Casper Hansen (Author of AutoAWQ), Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), and 5 more.

xtuner by InternLM

LLM fine-tuning toolkit for research
5k stars · Top 0.5% on SourcePulse · Created 2 years ago · Updated 1 day ago
Starred by Tobi Lutke (Cofounder of Shopify), Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), and 26 more.

axolotl by axolotl-ai-cloud

CLI tool for streamlined post-training of AI models
10k stars · Top 0.5% on SourcePulse · Created 2 years ago · Updated 17 hours ago