Multimodal AI assistant for real-world applications
SEED-X is a unified multimodal foundation model designed for real-world AI assistants, capable of multi-granularity comprehension and generation. It targets developers and researchers seeking versatile multimodal capabilities, offering instruction-tuned variants for tasks like image editing and story generation.
How It Works
SEED-X unifies multimodal comprehension and generation: a visual de-tokenizer reconstructs images from ViT features, and the model supports multi-turn conversations that interleave images, text, and bounding boxes. Its architecture allows instruction tuning on diverse datasets, enabling specialized capabilities such as high-precision image editing and multimodal long-story generation.
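The interleaved input can be pictured as a sequence of text, image, and box segments that are flattened into a single prompt. The sketch below is illustrative only: the placeholder markers (`<img>`, `</img>`, `<box>`) and the `format_turn` helper are hypothetical and do not reflect SEED-X's actual tokenizer conventions.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

# Hypothetical segment types for one interleaved multimodal turn.
@dataclass
class ImageRef:
    path: str                        # image to be encoded by the ViT tokenizer

@dataclass
class Box:
    xyxy: Tuple[int, int, int, int]  # pixel-space bounding box (x1, y1, x2, y2)

Segment = Union[str, ImageRef, Box]

def format_turn(segments: List[Segment]) -> str:
    """Flatten one conversation turn into a single prompt string.

    The special markers below are placeholders; the real SEED-X pipeline
    defines its own image/box tokens and inserts them for you.
    """
    parts = []
    for seg in segments:
        if isinstance(seg, str):
            parts.append(seg)
        elif isinstance(seg, ImageRef):
            parts.append(f"<img>{seg.path}</img>")
        elif isinstance(seg, Box):
            x1, y1, x2, y2 = seg.xyxy
            parts.append(f"<box>{x1},{y1},{x2},{y2}</box>")
    return " ".join(parts)

# One editing-style request: an image, a region of interest, and an instruction.
turn = [
    ImageRef("examples/living_room.jpg"),
    Box((120, 80, 360, 290)),
    "Replace the sofa in this region with a blue armchair.",
]
print(format_turn(turn))
```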
Quick Start & Requirements
pip install -r requirements.txt
Download the pretrained checkpoints (e.g., AILab-CVC/seed-x-17b-pretrain) and place them in ./pretrained. Stable Diffusion XL and Qwen-VL-Chat weights are also required.
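If the checkpoints are hosted on the Hugging Face Hub, they can also be fetched programmatically. The snippet below is a sketch using huggingface_hub; the SEED-X repository id follows the name given above, and the local directory names under ./pretrained are assumptions, so adjust them to match the project's expected layout.

```python
from huggingface_hub import snapshot_download

# Download the SEED-X checkpoint into the expected ./pretrained directory.
snapshot_download(
    repo_id="AILab-CVC/seed-x-17b-pretrain",
    local_dir="./pretrained/seed-x-17b-pretrain",
)

# Stable Diffusion XL and Qwen-VL-Chat weights are also required.
snapshot_download(
    repo_id="stabilityai/stable-diffusion-xl-base-1.0",
    local_dir="./pretrained/stable-diffusion-xl-base-1.0",
)
snapshot_download(
    repo_id="Qwen/Qwen-VL-Chat",
    local_dir="./pretrained/Qwen-VL-Chat",
)
```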
Maintenance & Community
The project is actively developed by AILab-CVC. Latest updates include SEED-Story and SEED-Data-Edit releases.
Licensing & Compatibility
Licensed under the Apache License, Version 2.0. The LLaMA2 parameters are kept frozen during training, with only the LoRA modules optimized.
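As a rough illustration of that training setup, the sketch below freezes a base causal LM and attaches LoRA adapters with the peft library. The base checkpoint name, LoRA rank, and target modules are assumptions for illustration, not the project's actual configuration.

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Load a LLaMA-2-style base model (checkpoint name is illustrative).
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-13b-hf",
    torch_dtype=torch.bfloat16,
)

# Freeze every base parameter; only the LoRA adapters will receive gradients.
for param in base.parameters():
    param.requires_grad = False

# Attach low-rank adapters to the attention projections (hyperparameters are assumed).
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # confirms only the LoRA weights are trainable
```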
Limitations & Caveats
The general instruction-tuned model SEED-X-I does not support image manipulation. Inference code for SEED-X-Edit was slated for release soon as of the README's last update.