Multimodal model research for GPT-4-style training
This repository provides the Lynx model, an 8B parameter large language model designed for multimodal understanding of images and videos. It addresses the challenges of integrating visual information into LLMs, targeting researchers and developers working on multimodal AI applications. The project offers a framework for training and evaluating such models, with released checkpoints and benchmark results.
How It Works
Lynx integrates visual features from a Vision Transformer (EVA-CLIP ViT-G) into a Vicuna-7B language model. This approach allows the LLM to process and reason about visual content alongside text. The model architecture and training methodology are detailed in an accompanying arXiv paper, focusing on key factors for effective multimodal LLM training.
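As a rough illustration of this pattern, the sketch below shows how pooled visual features could be projected into an LLM's token-embedding space so they can be prepended to the text sequence. This is a minimal, hypothetical example: the class name, dimensions, and resampler design are assumptions for illustration, not the actual Lynx implementation described in the paper.

```python
import torch
import torch.nn as nn

class VisionToLLMAdapter(nn.Module):
    """Hypothetical sketch: map frozen vision-encoder features into the
    LLM's embedding space via learnable query tokens (resampler-style).
    Dimensions are illustrative, not the released Lynx configuration."""

    def __init__(self, vision_dim: int = 1408, llm_dim: int = 4096,
                 num_query_tokens: int = 32):
        super().__init__()
        # Learnable queries attend over the patch features to produce a
        # fixed-length set of visual tokens.
        self.queries = nn.Parameter(torch.randn(num_query_tokens, vision_dim) * 0.02)
        self.attn = nn.MultiheadAttention(vision_dim, num_heads=8, batch_first=True)
        self.proj = nn.Linear(vision_dim, llm_dim)

    def forward(self, vision_feats: torch.Tensor) -> torch.Tensor:
        # vision_feats: (batch, num_patches, vision_dim) from the ViT encoder
        batch = vision_feats.size(0)
        q = self.queries.unsqueeze(0).expand(batch, -1, -1)
        pooled, _ = self.attn(q, vision_feats, vision_feats)
        # Returns (batch, num_query_tokens, llm_dim), ready to concatenate
        # with the text token embeddings fed to the language model.
        return self.proj(pooled)

adapter = VisionToLLMAdapter()
feats = torch.randn(2, 257, 1408)   # e.g. ViT patch features for 2 images
visual_tokens = adapter(feats)      # shape: (2, 32, 4096)
```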
Quick Start & Requirements
```bash
conda env create -f environment.yml
conda activate lynx
```
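Before running inference, images need to be resized and normalized for the EVA-CLIP vision encoder. The snippet below is a hypothetical preprocessing sketch; the 224-pixel resolution and CLIP normalization constants are common defaults, and the repository's released configs define the actual values.

```python
import torch
from PIL import Image
from torchvision import transforms

# Hypothetical preprocessing sketch: resize and normalize an image before
# passing it to the vision encoder. The size and CLIP mean/std below are
# standard defaults, not necessarily the values used by the Lynx configs.
preprocess = transforms.Compose([
    transforms.Resize((224, 224),
                      interpolation=transforms.InterpolationMode.BICUBIC),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.48145466, 0.4578275, 0.40821073),
                         std=(0.26862954, 0.26130258, 0.27577711)),
])

image = Image.open("example.jpg").convert("RGB")
pixel_values = preprocess(image).unsqueeze(0)  # (1, 3, 224, 224)
print(pixel_values.shape)
```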
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The setup process is complex, requiring manual downloading and organization of multiple large datasets and model checkpoints. The project is presented as a research release, and no support channel beyond GitHub issues is documented.