Multi-modal segmentation research paper
SEEM (Segment Everything Everywhere All at Once) is the official implementation of a NeurIPS 2023 paper on versatile image segmentation with multi-modal prompts. It targets researchers and developers who want a single interactive segmentation model that accepts diverse prompt types and produces semantically labeled masks.
How It Works
SEEM uses a unified prompt encoder to project diverse inputs (points, boxes, scribbles, text, and audio) into a joint representation space. This joint space enables compositional and interactive segmentation: users can refine masks over multiple rounds of interaction, with the model retaining a memory of the session's prompt history. The architecture builds on X-Decoder, a generalist decoder that supports multiple segmentation and vision-language tasks.
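As a rough illustration of the joint prompt space (a minimal sketch, not the repository's code; the class name JointPromptEncoder, the 256-dimensional embedding size, and the toy token ids are assumptions made for this example), different prompt types can be projected into a shared embedding dimension and concatenated into one sequence for the decoder to attend to:

    import torch
    import torch.nn as nn

    class JointPromptEncoder(nn.Module):
        """Toy encoder that maps points, boxes, and text tokens into one space."""
        def __init__(self, dim=256, vocab_size=30522):
            super().__init__()
            self.point_proj = nn.Linear(2, dim)              # (x, y) click -> dim
            self.box_proj = nn.Linear(4, dim)                # (x1, y1, x2, y2) -> dim
            self.text_embed = nn.Embedding(vocab_size, dim)  # token ids -> dim
            self.type_embed = nn.Embedding(3, dim)           # marks the prompt type

        def forward(self, points=None, boxes=None, text_ids=None):
            tokens = []
            if points is not None:    # (N, 2) normalized coordinates
                tokens.append(self.point_proj(points) + self.type_embed.weight[0])
            if boxes is not None:     # (M, 4) normalized coordinates
                tokens.append(self.box_proj(boxes) + self.type_embed.weight[1])
            if text_ids is not None:  # (T,) token ids from any tokenizer
                tokens.append(self.text_embed(text_ids) + self.type_embed.weight[2])
            # One sequence of prompt tokens the mask decoder can cross-attend to.
            return torch.cat(tokens, dim=0)

    encoder = JointPromptEncoder()
    prompts = encoder(points=torch.tensor([[0.42, 0.58]]),        # one click
                      text_ids=torch.tensor([1012, 2158, 1013]))  # illustrative ids
    print(prompts.shape)  # torch.Size([4, 256])

Because every prompt lands in the same token sequence, a click can be combined with a text phrase or a scribble in a single forward pass, which is what makes the compositional behavior possible.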
Quick Start & Requirements
git clone git@github.com:UX-Decoder/Segment-Everything-Everywhere-All-At-Once.git && cd Segment-Everything-Everywhere-All-At-Once && sh assets/scripts/run_demo.sh
Maintenance & Community
The project is associated with NeurIPS 2023 and has seen follow-up updates, including integration into LLaVA-Interactive and use in Set-of-Mark Prompting. Related projects include FocalNet, DaViT, UniCL, Semantic-SAM, OpenSeeD, Grounding SAM, Grounding DINO, and X-GPT.
Licensing & Compatibility
The repository does not explicitly state a license in the provided README. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The model's default vocabulary is limited to the 80 COCO categories; objects outside it may be misclassified or labeled as 'others'. Open-vocabulary segmentation requires supplying text labels alongside the visual prompts.
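Conceptually, open-vocabulary labeling amounts to matching each predicted mask embedding against embeddings of the user-supplied text labels and falling back to 'others' when nothing matches well. The sketch below illustrates only that matching step; the function name label_masks, the 512-dimensional embeddings, and the similarity threshold are assumptions, not the repository's interface:

    import torch
    import torch.nn.functional as F

    def label_masks(mask_emb, text_emb, labels, threshold=0.2):
        """Assign each mask the closest user-provided label, else 'others'."""
        sim = F.normalize(mask_emb, dim=-1) @ F.normalize(text_emb, dim=-1).T
        scores, idx = sim.max(dim=-1)
        return [labels[i] if s >= threshold else "others"
                for s, i in zip(scores.tolist(), idx.tolist())]

    # Example: two predicted masks scored against a custom three-word vocabulary.
    mask_emb = torch.randn(2, 512)   # stand-ins for decoder mask embeddings
    text_emb = torch.randn(3, 512)   # stand-ins for encoded text labels
    print(label_masks(mask_emb, text_emb, ["zebra", "traffic cone", "kayak"]))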