VLM codebase for training visually-conditioned language models
This repository provides a flexible and efficient codebase for training visually-conditioned language models (VLMs). It targets researchers and practitioners who want to experiment with or deploy VLMs, offering support for diverse visual backbones and language models, and straightforward scaling to multi-billion-parameter models.
How It Works
Prismatic VLMs supports multiple visual backbones (CLIP, SigLIP, DINOv2, and fusions of these) via TIMM integration, as well as arbitrary AutoModelForCausalLM instances from Hugging Face Transformers. Training leverages PyTorch FSDP and Flash-Attention to scale efficiently from 1B to 34B parameters.
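The sketch below illustrates the building blocks this design combines, not the repository's internal API: a TIMM vision backbone paired with a Hugging Face causal LM. The specific model IDs are examples.

```python
import timm
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Vision backbone via TIMM (e.g., a CLIP ViT); num_classes=0 returns features instead of logits.
vision_backbone = timm.create_model(
    "vit_large_patch14_clip_336.openai", pretrained=True, num_classes=0
)

# Any AutoModelForCausalLM-compatible language model from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
language_model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1", torch_dtype=torch.bfloat16
)

# A VLM then learns a projection from vision features into the LM's embedding space;
# large-scale training wraps these modules in PyTorch FSDP with Flash-Attention enabled.
```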
Quick Start & Requirements
Clone the repository, then install the package in editable mode:

pip install -e .
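After installation, pretrained models can be loaded for inference. The snippet below is a minimal sketch following the pattern described in the upstream README; the load helper, prompt builder, and generate arguments are assumptions that should be verified against the current codebase.

```python
import torch
from PIL import Image
from prismatic import load  # loading helper per the upstream README (verify against current API)

device = "cuda" if torch.cuda.is_available() else "cpu"

# Model ID is an example; gated base LMs (e.g., Llama-2) may require a Hugging Face access token.
vlm = load("prism-dinosiglip+7b")
vlm.to(device, dtype=torch.bfloat16)

# Build a prompt and generate a response for a local image.
image = Image.open("example.jpg").convert("RGB")
prompt_builder = vlm.get_prompt_builder()
prompt_builder.add_turn(role="human", message="What is going on in this image?")

generated_text = vlm.generate(image, prompt_builder.get_prompt(), max_new_tokens=128)
print(generated_text)
```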
Highlighted Details
- Visual backbones: CLIP, SigLIP, DINOv2, and fusions, integrated via TIMM
- Language models: arbitrary AutoModelForCausalLM instances from Hugging Face Transformers
- Scaling: PyTorch FSDP and Flash-Attention, from 1B to 34B parameters
- License: MIT (code); pretrained models inherit licenses from their base datasets and LMs
Maintenance & Community
The project is actively maintained by TRI-ML. Further community engagement details are not explicitly provided in the README.
Licensing & Compatibility
The code is released under the MIT License. Pretrained models inherit licenses from their base datasets and LMs (e.g., the Llama Community License for Llama-2-derived models, Apache/MIT for Mistral- and Phi-2-derived models). Commercial use is permitted for models whose underlying licenses allow it.
Limitations & Caveats
Pretrained models may have licensing restrictions inherited from their training data and base language models. Users must ensure compliance with these underlying licenses.