SEED-X by AILab-CVC

Multimodal AI assistant for real-world applications

created 1 year ago
529 stars

Top 60.6% on sourcepulse

View on GitHub
Project Summary

SEED-X is a unified multimodal foundation model designed for real-world AI assistants, capable of multi-granularity comprehension and generation. It targets developers and researchers seeking versatile multimodal capabilities, offering instruction-tuned variants for tasks like image editing and story generation.

How It Works

SEED-X unifies multimodal comprehension and generation by integrating a de-tokenizer for image reconstruction from ViT features and supporting multi-turn conversations with images, text, and bounding boxes. Its architecture allows for instruction tuning on diverse datasets, enabling specialized functionalities like high-precision image editing and multimodal long story generation.
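The flow described above can be sketched as a toy pipeline. This is an illustrative sketch with made-up function names and shapes, not the actual SEED-X API: a ViT encoder turns an image into patch features, the LLM consumes those features alongside a text instruction and regresses new visual features, and a de-tokenizer reconstructs an image from the predicted features.

```python
import numpy as np

# Toy sketch of SEED-X's unified comprehension/generation loop.
# All names, shapes, and values here are illustrative placeholders.

def vit_encode(image):
    """Image -> (num_patches, dim) features, assuming 16x16 patches."""
    h, w, _ = image.shape
    patches = (h // 16) * (w // 16)
    rng = np.random.default_rng(0)
    return rng.standard_normal((patches, 768))

def llm_generate(visual_feats, prompt):
    """Stand-in for the LLM: returns a text reply plus regressed visual features."""
    reply = f"Edited per instruction: {prompt!r}"
    return reply, visual_feats + 0.01  # placeholder for predicted features

def detokenize(visual_feats, out_hw=(256, 256)):
    """Stand-in for the de-tokenizer: features -> reconstructed RGB image."""
    rng = np.random.default_rng(int(abs(visual_feats.sum())) % 2**32)
    return rng.random((*out_hw, 3))

# One round trip: comprehend the image, follow an edit instruction,
# then decode the predicted features back into pixels.
image = np.zeros((256, 256, 3))
feats = vit_encode(image)
text, new_feats = llm_generate(feats, "make the sky purple")
edited = detokenize(new_feats)
```

The key design point this illustrates is that generation and editing share one interface: the LLM emits visual features in the same space the de-tokenizer consumes, so image output is just another token stream.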

Quick Start & Requirements

  • Installation: Clone the repository and install dependencies via pip install -r requirements.txt.
  • Prerequisites: Python >= 3.8 (Anaconda recommended), PyTorch >= 2.0.1, NVIDIA GPU with CUDA.
  • Model Weights: Download checkpoints from Hugging Face (e.g., AILab-CVC/seed-x-17b-pretrain) and place them in ./pretrained. Requires Stable Diffusion XL and Qwen-VL-Chat weights.
  • Resources: Training requires multi-node setup with DeepSpeed. Inference demos are available online.
  • Links: Online Demo, SEED-Story, SEED-Data-Edit.

Highlighted Details

  • Supports multi-turn conversations with dynamic image resolutions.
  • Offers specialized models: SEED-X-I (general instruction-tuned), SEED-X-Edit (image editing), SEED-Story (multimodal story generation).
  • Released training code for instruction tuning with DeepSpeed Zero-2/3 support.
  • Includes a large-scale image editing dataset (3.7M samples).
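Since the released training code supports DeepSpeed ZeRO-2/3, a minimal ZeRO Stage-2 configuration sketch looks like the following. The batch-size and precision values are placeholders, not the repository's actual settings:

```json
{
  "train_micro_batch_size_per_gpu": 4,
  "gradient_accumulation_steps": 8,
  "bf16": { "enabled": true },
  "zero_optimization": {
    "stage": 2,
    "overlap_comm": true,
    "contiguous_gradients": true
  }
}
```

Stage 2 shards optimizer states and gradients across GPUs; switching `"stage"` to 3 additionally shards the parameters, trading communication overhead for lower per-GPU memory.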

Maintenance & Community

The project is actively developed by AILab-CVC. Latest updates include SEED-Story and SEED-Data-Edit releases.

Licensing & Compatibility

Licensed under Apache License Version 2.0. The LLaMA2 backbone's parameters are kept frozen during training; only the LoRA modules are optimized.

Limitations & Caveats

The general instruction-tuned model SEED-X-I does not support image manipulation. Inference code for SEED-X-Edit was slated for release soon as of the README's last update.

Health Check

Last commit: 5 months ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 0
Star History: 36 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems), Omar Sanseviero (DevRel at Google DeepMind), and 4 more.

open_flamingo by mlfoundations

Top 0.1% · 4k stars
Open-source framework for training large multimodal models
created 2 years ago · updated 11 months ago