X-Decoder by microsoft

Generalized decoding model for pixel, image, and language tasks

Created 3 years ago

1,338 stars

Top 29.8% on SourcePulse


7 Experts Love This Project

Jianwei Yang

Research Scientist at Meta Superintelligence Lab

Jiaming Song

Chief Scientist at Luma AI

Saining Xie

Professor at NYU

Omar Sanseviero

DevRel at Google DeepMind

and 3 more!

Project Summary

X-Decoder is a generalized decoding model that unifies pixel-level segmentation and token-level text generation across vision and language tasks. It reports state-of-the-art results on open-vocabulary and referring segmentation, and can be finetuned for tasks such as image captioning, retrieval, and visual question answering.

How It Works

X-Decoder uses a single architecture that handles both pixel decoding and text decoding. It builds on Mask2Former, so one set of pretrained parameters covers semantic, instance, and panoptic segmentation as well as image captioning and retrieval. Because the tasks share a decoder, they can be composed zero-shot into new applications such as region retrieval and image editing.
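To make the idea of one decoder producing both a pixel-level and a token-level output concrete, here is a toy sketch in plain Python. It is not taken from the repository. It assumes the Mask2Former-style pattern in which a query vector is compared against per-pixel features to form a mask, and it further assumes that a label is chosen by similarity to text embeddings. The function `decode`, its arguments, and all vectors are hypothetical and exist only for illustration.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    norm = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / norm if norm else 0.0


def decode(queries, pixel_features, text_embeddings):
    """Toy decoding step (hypothetical, for illustration only).

    queries: list of D-dim vectors standing in for decoder outputs
    pixel_features: dict mapping (row, col) -> D-dim feature vector
    text_embeddings: dict mapping label string -> D-dim vector
    Returns a list of (label, mask) pairs, one per query.
    """
    results = []
    for q in queries:
        # Pixel-level output: keep pixels whose feature has a positive
        # dot product with the query (equivalent to sigmoid > 0.5).
        mask = {pos for pos, feat in pixel_features.items() if dot(q, feat) > 0}
        # Token-level output: choose the label whose text embedding is
        # most similar to the query (assumed open-vocabulary matching).
        label = max(text_embeddings, key=lambda name: cosine(q, text_embeddings[name]))
        results.append((label, mask))
    return results


if __name__ == "__main__":
    pixels = {(0, 0): [1.0, 0.0], (0, 1): [0.9, 0.1],
              (1, 0): [-1.0, 0.2], (1, 1): [-0.8, 0.1]}
    labels = {"cat": [1.0, 0.1], "grass": [-1.0, 0.1]}
    for label, mask in decode([[1.0, 0.0], [-1.0, 0.0]], pixels, labels):
        print(label, sorted(mask))
```

Because the label set is just a dictionary of text embeddings, swapping in different labels changes the vocabulary without touching the mask logic, which is the intuition behind open-vocabulary segmentation in this sketch.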

Quick Start & Requirements

Install: git clone git@github.com:UX-Decoder/Segment-Everything-Everywhere-All-At-Once.git && sh assets/scripts/run_demo.sh
Prerequisites: Python, PyTorch. Specific requirements detailed in INSTALL.md.
Resources: Model checkpoints and comprehensive user guides are available.
Demos: HuggingFace All-in-One Demo, HuggingFace Instruct Demo
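Before running the demo it can help to confirm that the listed prerequisites are present. The following is a minimal sketch using only the Python standard library. The helper name `check_prerequisites` is hypothetical, and the sketch does not read INSTALL.md, so it cannot verify the specific versions that file requires. It only reports the running Python version and whether each named top-level module can be found.

```python
import importlib.util
import sys


def check_prerequisites(modules=("torch",)):
    """Report the Python version and whether each top-level module is importable.

    Hypothetical helper: version constraints from INSTALL.md are NOT checked.
    """
    report = {"python": ".".join(str(part) for part in sys.version_info[:3])}
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for item, status in check_prerequisites().items():
        print(f"{item}: {status}")
```

A `False` entry for `torch` means PyTorch is not visible to the current interpreter and should be installed as described in INSTALL.md.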

Highlighted Details

Achieves state-of-the-art results on open-vocabulary segmentation and referring segmentation across eight datasets.
Supports zero-shot task composition for region retrieval, referring captioning, and image editing.
Offers unified pretrained parameters for semantic, instance, panoptic segmentation, image captioning, and image-text retrieval.
Includes companion models like SEEM (Segment Everything Everywhere All At Once) for interactive segmentation.

Maintenance & Community

The project is associated with CVPR 2023. Updates to the repository have included training/evaluation code and new checkpoints. The README also points to the related projects OpenSeeD and X-GPT.

Licensing & Compatibility

The repository does not explicitly state a license in the provided README.

Limitations & Caveats

Because the README states no license, it is unclear whether commercial use or integration into closed-source projects is permitted. The project is presented as the official implementation of a CVPR 2023 paper, which suggests a focus on research and academic use.

Health Check

Last Commit

2 years ago

Responsiveness

Inactive


Star History

1 star in the last 30 days