all-seeing by OpenGVLab

Vision-language project for panoptic visual recognition and open-world understanding

created 2 years ago
489 stars

Top 64.0% on sourcepulse

View on GitHub
Project Summary

This project provides the official implementation for the "All-Seeing Project" and its successor, "All-Seeing Project V2," which aim to advance panoptic visual recognition, understanding, and general relation comprehension in open-world scenarios. It targets researchers and developers working on large-scale vision-language models and multimodal understanding, offering novel datasets and foundation models with state-of-the-art performance.

How It Works

The project introduces the All-Seeing Dataset (AS-1B) and All-Seeing Model (ASM), a unified vision-language foundation model trained on this dataset. ASM aligns with LLMs for versatile image-text retrieval and generation, exhibiting strong zero-shot capabilities. The V2 iteration introduces the Relation Conversation (ReC) task and the AS-V2 dataset, enabling models to perform text generation, object localization, and relation comprehension simultaneously. ASMv2 integrates ReC, enhancing grounding and referring abilities for region-level tasks and adapting to Scene Graph Generation.
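The Relation Conversation idea above can be made concrete with a small parser. The `<ref>`/`<box>` markup below is an illustrative assumption modeled on common grounded-MLLM output formats, not the project's confirmed syntax; it sketches how text generation, object localization, and relation comprehension can coexist in one response string:

```python
import re

# Hypothetical ReC-style output: each object phrase is wrapped in
# <ref>...</ref>, immediately followed by its bounding box in <box>...</box>.
# This markup is an assumption, not the project's documented format.
REC_PATTERN = re.compile(
    r"<ref>(.*?)</ref><box>\[\[(\d+),\s*(\d+),\s*(\d+),\s*(\d+)\]\]</box>"
)

def parse_rec(text):
    """Extract (phrase, (x1, y1, x2, y2)) pairs from a grounded caption."""
    return [
        (m.group(1), tuple(int(m.group(i)) for i in range(2, 6)))
        for m in REC_PATTERN.finditer(text)
    ]

caption = (
    "<ref>a man</ref><box>[[40, 12, 310, 480]]</box> riding "
    "<ref>a horse</ref><box>[[120, 200, 500, 470]]</box>"
)
for phrase, bbox in parse_rec(caption):
    print(phrase, bbox)
```

Pairs extracted this way map directly onto scene-graph triples (subject, relation, object), which is how a ReC-style response can be adapted to Scene Graph Generation.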

Quick Start & Requirements

  • Installation: Code is available on GitHub; pre-trained models and datasets are released on Hugging Face.
  • Prerequisites: The README does not explicitly list hardware (e.g., GPUs) or software dependencies (e.g., Python or CUDA versions), though both are implied for training and running large foundation models.
  • Resources: Training and inference likely require substantial compute and storage for the large datasets (AS-1B, AS-V2).

Highlighted Details

  • ASMv2 achieves state-of-the-art performance on various image-level and region-level tasks.
  • Introduces the novel Relation Conversation (ReC) task and the AS-V2 dataset for multimodal LLMs.
  • Developed the Circular-based Relation Probing Evaluation (CRPE) benchmark for systematic relation comprehension assessment.
  • Offers large-scale datasets: AS-1B (over 1 billion region annotations), plus the smaller AS-100M, AS-10M, and AS-Core subsets.

Maintenance & Community

The project has seen significant updates, including the release of ASMv2 and acceptance into ECCV 2024. Key components like ASM, AS-Core, AS-10M, and AS-100M have been released. The project is associated with OpenGVLab. Links to Zhihu and Medium are provided for further details.

Licensing & Compatibility

This project is released under the Apache 2.0 license, which permits commercial use and modification.

Limitations & Caveats

The README does not detail specific hardware or software requirements for running the models, nor does it provide estimated setup times. The full version of AS-1B is not yet released.

Health Check

  • Last commit: 11 months ago
  • Responsiveness: 1+ week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 7 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems), Omar Sanseviero (DevRel at Google DeepMind), and 4 more.

open_flamingo by mlfoundations

Top 0.1% · 4k stars
Open-source framework for training large multimodal models
created 2 years ago · updated 11 months ago