lynx-llm by bytedance

Multimodal model research for GPT-4-style training

created 2 years ago
268 stars

Top 96.5% on sourcepulse

Project Summary

This repository provides the Lynx model, an 8B parameter large language model designed for multimodal understanding of images and videos. It addresses the challenges of integrating visual information into LLMs, targeting researchers and developers working on multimodal AI applications. The project offers a framework for training and evaluating such models, with released checkpoints and benchmark results.

How It Works

Lynx integrates visual features from a Vision Transformer (EVA-CLIP ViT-G) into a Vicuna-7B language model. This approach allows the LLM to process and reason about visual content alongside text. The model architecture and training methodology are detailed in an accompanying arXiv paper, focusing on key factors for effective multimodal LLM training.
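The integration described above can be sketched roughly as follows. This is a minimal illustration, not the actual Lynx code: the dimensions, the single linear projection, and all names are assumptions made for the example (Lynx's real adapter design is described in the paper).

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumptions, not the real Lynx config):
# a ViT emits a sequence of patch features; the LLM consumes
# token embeddings of its own hidden size.
num_patches, vit_dim = 256, 1408
llm_dim = 4096

# Hypothetical linear projection mapping vision features into the
# LLM embedding space (a stand-in for whatever adapter Lynx uses).
W = rng.standard_normal((vit_dim, llm_dim)) * 0.02

def project_visual_tokens(patch_features: np.ndarray) -> np.ndarray:
    """Map ViT patch features to LLM-compatible token embeddings."""
    return patch_features @ W

# Fake ViT output for one image, plus embeddings for a short text prompt.
patch_features = rng.standard_normal((num_patches, vit_dim))
text_embeddings = rng.standard_normal((12, llm_dim))

# The LLM then attends over visual tokens prepended to the text tokens,
# letting it reason about the image alongside the prompt.
visual_tokens = project_visual_tokens(patch_features)
sequence = np.concatenate([visual_tokens, text_embeddings], axis=0)
print(sequence.shape)  # (268, 4096)
```

The key idea is only that visual features become extra "tokens" in the LLM's input sequence; the training factors Lynx ablates (adapter type, data mix, etc.) determine how well this works in practice.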

Quick Start & Requirements

  • Install: `conda env create -f environment.yml`, then `conda activate lynx`.
  • Prerequisites: Python 3.x, Conda, Git LFS. Requires downloading specific datasets (Open-VQA, Places365, VQAv2, OCRVQA, Something-Something-v.2, MSVD-QA, NeXT-QA, MSRVTT-QA) and pre-trained checkpoints (EVA-CLIP ViT-G, Vicuna-7B, Lynx checkpoints).
  • Setup: Requires significant effort to download and organize datasets and checkpoints.
  • Docs: https://lynx-llm.github.io/

Highlighted Details

  • Achieves strong results on Open-VQA, OwlEval, and MME benchmarks.
  • Provides pre-trained and fine-tuned Lynx checkpoints.
  • Supports both image and video multimodal inputs.
  • Detailed ablation studies on training factors are available.

Maintenance & Community

  • Developed by ByteDance.
  • Contact via GitHub issues for support.

Licensing & Compatibility

  • Licensed under Apache-2.0.
  • Permissive license suitable for commercial use and integration into closed-source projects.

Limitations & Caveats

The setup process is complex, requiring manual download and organization of multiple large datasets and model checkpoints. The project is presented as a research release, and no support channel beyond GitHub issues is documented.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 0 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering, Designing Machine Learning Systems), Omar Sanseviero (DevRel at Google DeepMind), and 4 more.

open_flamingo by mlfoundations

  • 0.1% · 4k stars
  • Open-source framework for training large multimodal models
  • created 2 years ago, updated 11 months ago