Multimodal LLM with visual tokenization (official research implementation)
SEED-LLaMA is an open-source project providing the official implementation for SEED-LLaMA, a multimodal large language model capable of both visual comprehension and generation. It is designed for researchers and developers working on integrating vision and language capabilities into AI models, offering emergent abilities like multi-turn multimodal generation.
How It Works
SEED-LLaMA uses the SEED tokenizer to convert visual signals into discrete visual tokens. These tokens capture essential semantics while preserving a 1D causal dependency, so images can be predicted autoregressively by the LLM in the same way as text, enabling direct integration. The model is built upon LLaMA2, released in 8B and 14B variants, and supports efficient multi-node training via DeepSpeed.
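The core idea can be illustrated with a toy vector-quantization step: continuous image features are matched to the nearest entries of a learned codebook, and the resulting indices are offset past the text vocabulary so image and text share one token stream. This is a minimal sketch only; the codebook size, token count, random weights, and fake features below are placeholders, not the actual SEED tokenizer.

```python
# Toy illustration of discrete visual tokenization (not the actual SEED code).
# Assumptions: an 8192-entry codebook, 32 causal visual tokens per image, and
# random tensors standing in for the learned encoder and codebook.
import torch

CODEBOOK_SIZE = 8192      # assumed visual vocabulary size
NUM_VISUAL_TOKENS = 32    # assumed number of tokens per image
EMBED_DIM = 768           # assumed feature dimension
TEXT_VOCAB_SIZE = 32000   # LLaMA2 text vocabulary size

codebook = torch.randn(CODEBOOK_SIZE, EMBED_DIM)  # stand-in for a learned codebook

def tokenize_image(image_features: torch.Tensor) -> torch.Tensor:
    """Quantize a causal sequence of image features to discrete codebook indices.

    image_features: (NUM_VISUAL_TOKENS, EMBED_DIM)
    returns: (NUM_VISUAL_TOKENS,) tensor of visual token ids
    """
    # Nearest-neighbour lookup against the codebook (vector quantization).
    distances = torch.cdist(image_features, codebook)  # (T, CODEBOOK_SIZE)
    return distances.argmin(dim=-1)

# Fake features standing in for the output of the visual encoder.
features = torch.randn(NUM_VISUAL_TOKENS, EMBED_DIM)
visual_ids = tokenize_image(features)

# Shift the ids past the text vocabulary so image and text tokens can live
# in a single autoregressive sequence consumed by the LLM.
llm_ids = visual_ids + TEXT_VOCAB_SIZE
print(llm_ids[:8])
```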
Quick Start & Requirements
pip install -r requirements.txt
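After installing the requirements, a checkpoint must be loaded for inference. The snippet below is a minimal text-only sketch that assumes the released SFT weights load as a standard transformers LLaMA checkpoint from a local directory; the path and prompt format are placeholders, and the project's own demo scripts (which also wire in the SEED tokenizer for image input and output) remain the authoritative entry point.

```python
# Minimal text-only inference sketch. Assumptions: the SFT weights are a
# transformers-compatible LLaMA checkpoint; CKPT_DIR is a hypothetical path.
from transformers import AutoModelForCausalLM, AutoTokenizer

CKPT_DIR = "path/to/seed-llama-8b-sft"  # placeholder local checkpoint directory

tokenizer = AutoTokenizer.from_pretrained(CKPT_DIR)
model = AutoModelForCausalLM.from_pretrained(CKPT_DIR)

# Plain prompt for illustration; the actual instruction template may differ.
prompt = "Describe a sunset over the ocean."
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```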
Highlighted Details
Maintenance & Community
The project is actively developed by Tencent AI Lab and ARC Lab. Updates are regularly posted, including the release of SEED-X and training code for SEED-LLaMA. Inquiries can be directed to seed-x@googlegroups.com.
Licensing & Compatibility
SEED is released under the Apache License Version 2.0. SEED-LLaMA is released under the original license of LLaMA2.
Limitations & Caveats
The project is described as "still in progress." Although the instruction-tuned SEED-LLaMA can generate interleaved image-text content, the released SFT checkpoint does not include this capability, since it was handled separately during instruction tuning.