FoundationVision · Multimodal generation research paper
Top 52.9% on SourcePulse
Liquid is a scalable and unified autoregressive generation paradigm that integrates multimodal comprehension and generation using a single large language model (LLM). It targets researchers and developers working with multimodal AI, offering a unified approach that eliminates the need for external visual embeddings like CLIP and demonstrates a scaling law where performance degradation from unified training diminishes with model size.
How It Works
Liquid employs a single LLM for both visual and language tasks, enabling a unified token space. This architecture allows visual comprehension and generation to mutually enhance each other. The project highlights a discovered scaling law indicating that the performance drop associated with unified multimodal training is mitigated as model size increases, with models ranging from 0.5B to 32B parameters.
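The unified token space can be sketched in a few lines. This is a toy illustration (not the official Liquid code, and the vocabulary sizes are hypothetical): discrete visual codes from an image tokenizer are appended to the text vocabulary, so a single next-token predictor emits both modalities in one sequence, and the token id range tells the decoder which modality each token belongs to.

```python
# Toy sketch of a shared text+image token space, in the spirit of
# unified autoregressive models like Liquid. Vocabulary sizes below
# are illustrative assumptions, not the model's actual configuration.

TEXT_VOCAB_SIZE = 50_000      # hypothetical text vocabulary size
IMAGE_CODEBOOK_SIZE = 8_192   # hypothetical visual codebook size

def image_code_to_token(code: int) -> int:
    """Map a visual codebook index into the shared token id space."""
    assert 0 <= code < IMAGE_CODEBOOK_SIZE
    return TEXT_VOCAB_SIZE + code

def token_modality(token_id: int) -> str:
    """Route a generated token id to the text or image decoder."""
    return "text" if token_id < TEXT_VOCAB_SIZE else "image"

# One autoregressive sequence can interleave both modalities:
sequence = [17, 942, image_code_to_token(5), image_code_to_token(300), 88]
print([token_modality(t) for t in sequence])
# -> ['text', 'text', 'image', 'image', 'text']
```

Because both modalities share one id space, the same LM head scores text and image continuations jointly, which is what lets comprehension and generation reinforce each other during training.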
Quick Start & Requirements
- Install dependencies: `pip install gradio==4.44.1 gradio_client==1.3.0` plus the `transformers` library.
- Launch the demo with `python app.py` in the evaluation directory.
- Text-to-text: `python inference_t2t.py --model_path Junfeng5/Liquid_V1_7B --prompt "..."`
- Image-to-text: `python inference_i2t.py --model_path Junfeng5/Liquid_V1_7B --image_path samples/baklava.png --prompt '...'`
- Text-to-image: `python inference_t2i.py --model_path Junfeng5/Liquid_V1_7B --prompt "..." [--load_8bit]`
- On low-VRAM GPUs (<30 GB), add the `--load_8bit` flag.
- For training, refer to `Data.md` and `TRAIN.md`.
Maintenance & Community
The project is associated with authors from HUST, ByteDance, and HKU. Checkpoints and evaluation scripts for Liquid-7B-IT are released.
Licensing & Compatibility
Licensed under the MIT License, permitting commercial use and integration with closed-source projects.
Limitations & Caveats
Checkpoints for the other pre-trained models in the 0.5B-32B range, beyond Liquid-7B-IT, have not yet been released. Training code is available but requires consulting separate documentation (`Data.md` and `TRAIN.md`).
Last updated 7 months ago; currently marked Inactive.
Related projects: kohjingyu, baaivision, InternLM, NExT-GPT