Vision-language project for panoptic visual recognition and open-world understanding
This project provides the official implementation for the "All-Seeing Project" and its successor, "All-Seeing Project V2," which aim to advance panoptic visual recognition, understanding, and general relation comprehension in open-world scenarios. It targets researchers and developers working on large-scale vision-language models and multimodal understanding, offering novel datasets and foundation models with state-of-the-art performance.
How It Works
The project introduces the All-Seeing Dataset (AS-1B) and the All-Seeing Model (ASM), a unified vision-language foundation model trained on that dataset. ASM is aligned with large language models (LLMs), supporting versatile image-text retrieval and generation with strong zero-shot performance. The V2 iteration adds the Relation Conversation (ReC) task and the AS-V2 dataset, which train models to perform text generation, object localization, and relation comprehension simultaneously. ASMv2 integrates ReC, strengthening its grounding and referring abilities on region-level tasks and allowing it to adapt naturally to Scene Graph Generation.
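As a concrete illustration of how grounded relation output can be adapted to Scene Graph Generation, the sketch below assembles (subject, predicate, object) triples with bounding boxes into a simple node/edge structure. The triple format shown is a hypothetical parsing result, not the project's actual serialization:

```python
# Hypothetical grounded triples, as might be parsed from a Relation Conversation
# response: each entry pairs object labels with pixel-space bounding boxes.
triples = [
    {"subject": ("person", [48, 30, 210, 380]),
     "predicate": "riding",
     "object": ("bicycle", [60, 200, 240, 400])},
    {"subject": ("person", [48, 30, 210, 380]),
     "predicate": "wearing",
     "object": ("helmet", [90, 30, 160, 90])},
]

def build_scene_graph(triples):
    """Collect grounded triples into node and edge lists for a scene graph.
    Nodes are keyed by label for brevity; a real pipeline would key by instance."""
    nodes, edges = {}, []
    for t in triples:
        for label, box in (t["subject"], t["object"]):
            nodes.setdefault(label, box)
        edges.append((t["subject"][0], t["predicate"], t["object"][0]))
    return nodes, edges

nodes, edges = build_scene_graph(triples)
print(nodes)   # {'person': [...], 'bicycle': [...], 'helmet': [...]}
print(edges)   # [('person', 'riding', 'bicycle'), ('person', 'wearing', 'helmet')]
```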
Quick Start & Requirements
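The upstream README's setup steps are not reproduced here. As a rough orientation only, the following is a minimal inference sketch; the checkpoint name, prompt, and loading path are assumptions rather than confirmed details from the project:

```python
# Minimal sketch only: the Hub checkpoint ID and prompt format are assumptions.
# Image preprocessing is model-specific and omitted; see the project repo for
# the full inference pipeline.
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "OpenGVLab/ASMv2"  # hypothetical Hub ID; check the repo for the real one
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)

prompt = "Describe the relations between the objects in the image."
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```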
Highlighted Details
Maintenance & Community
The project has seen significant updates, including the release of ASMv2 and the acceptance of the All-Seeing Project V2 paper at ECCV 2024. Key components, including ASM, AS-Core, AS-10M, and AS-100M, have been released. The project is maintained under OpenGVLab, and linked Zhihu and Medium posts provide further details.
Licensing & Compatibility
This project is released under the Apache 2.0 license, which permits commercial use and modification.
Limitations & Caveats
The README does not detail specific hardware or software requirements for running the models, nor does it provide estimated setup times. The full version of AS-1B is not yet released.