TinyLLaVA_Factory by TinyLLaVA

Framework for small-scale large multimodal models research

Created 1 year ago · 863 stars · Top 42.4% on sourcepulse

Project Summary

TinyLLaVA Factory provides a modular and extensible PyTorch framework for building and training small-scale Large Multimodal Models (LMMs). It targets researchers and developers who want to create efficient LMMs with reduced coding effort and improved reproducibility; its small models achieve performance competitive with larger ones.

How It Works

The framework employs a modular design that lets users swap components for LLMs, vision towers, and connectors. It supports many combinations: LLMs such as OpenELM, TinyLlama, StableLM, Qwen, Gemma, and Phi, paired with vision towers such as CLIP, SigLIP, and Dino, joined by connectors such as MLP, Qformer, and Resampler. It also supports diverse training recipes, including frozen, full, and partial tuning, as well as LoRA/QLoRA fine-tuning.
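
To make the modular decomposition concrete, here is a minimal, hypothetical sketch of how an LMM factory of this shape composes a vision tower, a connector, and an LLM from a config. The class and field names (LMMConfig, TinyLMM, tune_recipe, etc.) are illustrative stand-ins, not TinyLLaVA Factory's actual API:

```python
# Hypothetical sketch of the vision tower -> connector -> LLM composition
# described above. All names here are illustrative, not the framework's API.
from dataclasses import dataclass

import torch
import torch.nn as nn


@dataclass
class LMMConfig:
    llm: str = "phi-2"            # e.g. openelm, tinyllama, stablelm, qwen, gemma, phi
    vision_tower: str = "siglip"  # e.g. clip, siglip, dino
    connector: str = "mlp"        # e.g. mlp, qformer, resampler
    tune_recipe: str = "lora"     # e.g. frozen, full, partial, lora, qlora


class TinyLMM(nn.Module):
    """Toy composition: image features -> connector -> LLM embedding space."""

    def __init__(self, cfg: LMMConfig, vision_dim: int = 1152, llm_dim: int = 2560):
        super().__init__()
        self.vision_tower = nn.Linear(3 * 224 * 224, vision_dim)  # stand-in encoder
        self.connector = nn.Sequential(                           # stand-in MLP connector
            nn.Linear(vision_dim, llm_dim), nn.GELU(), nn.Linear(llm_dim, llm_dim)
        )
        self.llm = nn.Linear(llm_dim, llm_dim)                    # stand-in language model
        if cfg.tune_recipe == "frozen":                           # one recipe: freeze the tower
            for p in self.vision_tower.parameters():
                p.requires_grad = False

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        visual_tokens = self.vision_tower(image.flatten(1))
        return self.llm(self.connector(visual_tokens))


model = TinyLMM(LMMConfig(llm="phi-2", vision_tower="siglip", connector="mlp"))
print(model(torch.randn(1, 3 * 224 * 224)).shape)  # torch.Size([1, 2560])
```

Swapping a component means changing one config field rather than editing model code, which is the reproducibility and low-coding-effort benefit the framework advertises.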

Quick Start & Requirements

  • Install: Clone the repository, create a conda environment (conda create -n tinyllava_factory python=3.10), activate it, and install in editable mode with pip install -e . (additional packages such as flash-attn==2.5.7 are also required). A minimal inference example follows this list.
  • Prerequisites: Python 3.10, PyTorch, HuggingFace libraries. GPU acceleration is essential for training and inference.
  • Resources: Training requires significant GPU resources; specific requirements depend on the chosen model configurations.
  • Docs: https://tinyllava-factory.readthedocs.io/en/latest/
  • Demo: http://8843843nmph5.vicp.fun/#/ (Password: '1234')
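
Pretrained checkpoints are published on the Hugging Face Hub. The sketch below is adapted from the project's README quick start; the chat helper and the tokenizer settings come from the checkpoint's remote code (trust_remote_code=True), so treat the exact attribute and method names as subject to the current docs:

```python
# Minimal inference sketch, adapted from the project's README quick start.
# The chat() helper and the tokenizer_* config attributes are provided by the
# checkpoint's remote code; verify the exact signatures against the docs.
from transformers import AutoModelForCausalLM, AutoTokenizer

hf_path = "tinyllava/TinyLLaVA-Phi-2-SigLIP-3.1B"
model = AutoModelForCausalLM.from_pretrained(hf_path, trust_remote_code=True)
model.cuda()

config = model.config
tokenizer = AutoTokenizer.from_pretrained(
    hf_path,
    use_fast=False,
    model_max_length=config.tokenizer_model_max_length,
    padding_side=config.tokenizer_padding_side,
)

prompt = "What are these?"
image_url = "http://images.cocodataset.org/val2017/000000039769.jpg"
output_text, generation_time = model.chat(
    prompt=prompt, image=image_url, tokenizer=tokenizer
)
print(output_text)
```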

Highlighted Details

  • Achieves competitive performance, with models like TinyLLaVA-Phi-2-SigLIP-3.1B outperforming existing 7B models on benchmarks.
  • Supports a wide range of LLM and vision tower combinations for customization.
  • Offers modularity for easy integration of new models and methods.
  • Includes a visualization tool for interpreting model predictions.

Maintenance & Community

The project is actively developed, with recent updates including TinyLLaVA-Video and a visualization tool. It is built upon the LLaVA project and uses data from ShareGPT4V. Contact is available via GitHub Issues or WeChat.

Licensing & Compatibility

  • License: Apache 2.0.
  • Compatibility: Permissive license suitable for commercial use and integration into closed-source projects.

Limitations & Caveats

The project's old codebase (TinyLLaVABench) has been moved to a separate branch, so users of the previous version may face breaking changes or migration effort. Performance claims are based on the authors' own benchmark results.

Health Check

  • Last commit: 3 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 2
  • Star history: 57 stars in the last 90 days

Explore Similar Projects

TensorRT-LLM by NVIDIA
LLM inference optimization SDK for NVIDIA GPUs. 11k stars; top 0.6% on sourcepulse. Created 1 year ago; updated 18 hours ago. Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems), Omar Sanseviero (DevRel at Google DeepMind), and 5 more.

LLaVA by haotian-liu
Multimodal assistant with GPT-4 level capabilities. 23k stars; top 0.2% on sourcepulse. Created 2 years ago; updated 11 months ago. Starred by Travis Fischer (founder of Agentic), Patrick von Platen (core contributor to Hugging Face Transformers and Diffusers), and 9 more.