Generalized decoding model for pixel, image, and language tasks
X-Decoder is a generalized decoding model designed for unified pixel-level segmentation and token-level text generation across vision and language tasks. It reports state-of-the-art results on open-vocabulary and referring segmentation, and can be flexibly fine-tuned for tasks such as image captioning, retrieval, and visual question answering.
How It Works
X-Decoder leverages a unified architecture that seamlessly integrates pixel and text decoding. It builds upon Mask2Former, enabling it to handle diverse tasks such as semantic, instance, and panoptic segmentation, as well as image captioning and retrieval, with a single set of pretrained parameters. This approach allows for zero-shot task composition, facilitating novel applications like region retrieval and image editing.
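The core idea above (one decoder, one set of latent queries, two output spaces) can be illustrated with a toy sketch. This is a hypothetical simplification for intuition only, not X-Decoder's actual API: the same query embeddings are dotted against per-pixel features to produce masks, and projected into a vocabulary space to produce text tokens.

```python
import numpy as np

# Toy illustration of unified decoding (shapes and names are made up,
# not taken from the X-Decoder codebase).
rng = np.random.default_rng(0)

num_queries, dim = 8, 16      # shared latent queries used by ALL tasks
h, w = 4, 4                   # spatial size of the encoder feature map
vocab = 32                    # token vocabulary for the text head

queries = rng.standard_normal((num_queries, dim))
pixel_feats = rng.standard_normal((h * w, dim))  # per-pixel image features

# Pixel-level decoding: query/pixel similarity -> one mask per query
mask_logits = queries @ pixel_feats.T            # (num_queries, h*w)
masks = (mask_logits > 0).reshape(num_queries, h, w)

# Token-level decoding: the SAME queries projected into vocabulary space
text_proj = rng.standard_normal((dim, vocab))
token_logits = queries @ text_proj               # (num_queries, vocab)
next_tokens = token_logits.argmax(axis=1)

print(masks.shape)        # (8, 4, 4)
print(next_tokens.shape)  # (8,)
```

Because both heads read from the same queries, a single pretrained model can be composed zero-shot across tasks, which is what enables applications like region retrieval and referring-based image editing.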
Quick Start & Requirements
git clone git@github.com:UX-Decoder/Segment-Everything-Everywhere-All-At-Once.git && sh assets/scripts/run_demo.sh
See INSTALL.md for full installation instructions and dependencies.
Maintenance & Community
The project accompanies a CVPR 2023 paper and has seen recent updates, including training/evaluation code and new checkpoints. Related projects such as OpenSeeD and X-GPT are also referenced.
Licensing & Compatibility
The repository does not explicitly state a license in the provided README.
Limitations & Caveats
Because no license is stated, commercial use or integration into closed-source projects may be restricted; confirm licensing with the maintainers before adopting the code. As the official implementation of a CVPR 2023 paper, the project is oriented toward research and academic use.
Last updated about 1 year ago; the repository is currently inactive.