Vision-language project for panoptic visual recognition and open-world understanding
This project provides the official implementation for the "All-Seeing Project" and its successor, "All-Seeing Project V2," which aim to advance panoptic visual recognition, understanding, and general relation comprehension in open-world scenarios. It targets researchers and developers working on large-scale vision-language models and multimodal understanding, offering novel datasets and foundation models with state-of-the-art performance.
How It Works
The project introduces the All-Seeing Dataset (AS-1B) and the All-Seeing Model (ASM), a unified vision-language foundation model trained on that dataset. ASM is aligned with large language models (LLMs), supporting versatile image-text retrieval and generation with strong zero-shot performance. The V2 iteration adds the Relation Conversation (ReC) task and the AS-V2 dataset, which train models to perform text generation, object localization, and relation comprehension simultaneously. ASMv2 integrates ReC, strengthening its grounding and referring abilities on region-level tasks and allowing it to adapt naturally to Scene Graph Generation.
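As a concrete illustration of how grounded relation output can be adapted to Scene Graph Generation, the sketch below assembles (subject, predicate, object) triples with bounding boxes into a simple node/edge structure. The triple format shown is a hypothetical parsing result, not the project's actual serialization:

```python
# Hypothetical grounded triples, as might be parsed from a Relation Conversation
# response: each entry pairs object labels with pixel-space bounding boxes.
triples = [
    {"subject": ("person", [48, 30, 210, 380]),
     "predicate": "riding",
     "object": ("bicycle", [60, 200, 240, 400])},
    {"subject": ("person", [48, 30, 210, 380]),
     "predicate": "wearing",
     "object": ("helmet", [90, 30, 160, 90])},
]

def build_scene_graph(triples):
    """Collect grounded triples into node and edge lists for a scene graph.
    Nodes are keyed by label for brevity; a real pipeline would key by instance."""
    nodes, edges = {}, []
    for t in triples:
        for label, box in (t["subject"], t["object"]):
            nodes.setdefault(label, box)
        edges.append((t["subject"][0], t["predicate"], t["object"][0]))
    return nodes, edges

nodes, edges = build_scene_graph(triples)
print(nodes)   # {'person': [...], 'bicycle': [...], 'helmet': [...]}
print(edges)   # [('person', 'riding', 'bicycle'), ('person', 'wearing', 'helmet')]
```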
Quick Start & Requirements
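The upstream README's setup steps are not reproduced here. As a rough orientation only, the following is a minimal inference sketch; the checkpoint name, prompt, and loading path are assumptions rather than confirmed details from the project:

```python
# Minimal sketch only: the Hub checkpoint ID and prompt format are assumptions.
# Image preprocessing is model-specific and omitted; see the project repo for
# the full inference pipeline.
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "OpenGVLab/ASMv2"  # hypothetical Hub ID; check the repo for the real one
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)

prompt = "Describe the relations between the objects in the image."
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```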
Highlighted Details
Maintenance & Community
The project has seen significant updates, including the release of ASMv2 and the acceptance of the All-Seeing Project V2 paper at ECCV 2024. Key components, including ASM, AS-Core, AS-10M, and AS-100M, have been released. The project is maintained under OpenGVLab, and linked Zhihu and Medium posts provide further details.
Licensing & Compatibility
This project is released under the Apache 2.0 license, which permits commercial use and modification.
Limitations & Caveats
The README does not detail specific hardware or software requirements for running the models, nor does it provide estimated setup times. The full version of AS-1B is not yet released.