Segment-Everything-Everywhere-All-At-Once by UX-Decoder

Multi-modal segmentation research paper

created 2 years ago
4,677 stars

Top 10.7% on sourcepulse

Project Summary

SEEM (Segment Everything Everywhere All at Once) is the official implementation of a NeurIPS 2023 paper on versatile image segmentation with multi-modal prompts. It targets researchers and developers who need a unified, interactive segmentation model that handles diverse prompt types and offers semantic awareness.

How It Works

SEEM uses a unified prompt encoder to project diverse inputs (points, boxes, scribbles, text, and audio, among others) into a joint representation space. Because all prompts share one space, they can be composed freely, and segmentation becomes interactive: users refine masks over multiple rounds while the model retains the session's prompt history. The architecture builds on X-Decoder, a generalist decoder capable of multiple tasks.
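
To make the joint-representation idea concrete, here is a minimal PyTorch sketch. The class and layer names are illustrative assumptions, not the repository's actual code: each prompt type gets its own projection into a shared embedding space, and the resulting tokens are concatenated so the decoder can attend to them uniformly.

    import torch
    import torch.nn as nn

    # Hypothetical unified prompt encoder (illustrative names, not the
    # repo's API): every prompt type is projected into one joint space.
    class UnifiedPromptEncoder(nn.Module):
        def __init__(self, dim: int = 512):
            super().__init__()
            self.point_proj = nn.Linear(2, dim)    # (x, y) click
            self.box_proj = nn.Linear(4, dim)      # (x1, y1, x2, y2)
            self.text_proj = nn.Linear(768, dim)   # e.g. a CLIP-style text feature

        def forward(self, points=None, boxes=None, text_feats=None):
            tokens = []
            if points is not None:
                tokens.append(self.point_proj(points))
            if boxes is not None:
                tokens.append(self.box_proj(boxes))
            if text_feats is not None:
                tokens.append(self.text_proj(text_feats))
            # Concatenating along the token axis is what makes prompts
            # composable: the decoder sees them as one sequence.
            return torch.cat(tokens, dim=1)

    enc = UnifiedPromptEncoder()
    tokens = enc(points=torch.rand(1, 3, 2), text_feats=torch.rand(1, 1, 768))
    print(tokens.shape)  # torch.Size([1, 4, 512])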

Quick Start & Requirements

  • Install/Run: git clone git@github.com:UX-Decoder/Segment-Everything-Everywhere-All-At-Once.git, then run sh assets/scripts/run_demo.sh from inside the cloned directory (expanded in the sketch after this list).
  • Prerequisites: Python and PyTorch. Hardware requirements are not explicitly documented, but a GPU is implied for running the demo.
  • Links: Project Page, Demo, Paper, Code
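
The install/run step written out as a hedged sketch; the requirements-file path is an assumption from common repo layouts, so verify it against the current README:

    # Clone, install dependencies, and launch the demo.
    git clone git@github.com:UX-Decoder/Segment-Everything-Everywhere-All-At-Once.git
    cd Segment-Everything-Everywhere-All-At-Once
    pip install -r assets/requirements/requirements.txt   # assumed path; check the README
    sh assets/scripts/run_demo.sh                         # starts the interactive demo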

Highlighted Details

  • Supports versatile prompts: points, boxes, scribbles, text, referring images, and audio.
  • Achieves compositional and interactive segmentation with multi-round refinement (see the sketch after this list).
  • Provides semantic-aware predictions, assigning semantic labels to masks.
  • Demonstrates strong performance on various segmentation tasks, outperforming SAM in interaction and semantic capabilities.
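
The multi-round refinement above can be pictured as a session that accumulates prompts, so each new click conditions the prediction on everything given so far. A runnable toy sketch with a stand-in model; the interface is assumed, not the repository's actual API:

    # Toy multi-round session; DummyModel stands in for the real model.
    class InteractiveSession:
        def __init__(self, model, image):
            self.model = model
            self.image = image
            self.history = []  # accumulated prompts = the session "memory"

        def refine(self, prompt):
            self.history.append(prompt)
            # Re-predict conditioned on the full prompt history, so the
            # mask is refined rather than recomputed from scratch.
            return self.model.segment(self.image, prompts=self.history)

    class DummyModel:
        def segment(self, image, prompts):
            return f"mask conditioned on {len(prompts)} prompt(s)"

    session = InteractiveSession(DummyModel(), image=None)
    print(session.refine({"type": "point", "xy": (120, 88)}))
    print(session.refine({"type": "scribble", "path": [(10, 10), (30, 40)]}))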

Maintenance & Community

The project is associated with NeurIPS 2023 and has seen recent updates, including integration into LLaVA-Interactive and use in Set-of-Mark Prompting. Related projects include FocalNet, DaViT, UniCL, Semantic-SAM, OpenSeed, Grounding SAM, Grounding DINO, and X-GPT.

Licensing & Compatibility

The repository does not explicitly state a license in the provided README, so suitability for commercial use or closed-source linking is unspecified.

Limitations & Caveats

The model's default vocabulary is limited to the 80 COCO categories; objects outside it may be misclassified or labeled 'others'. Open-vocabulary segmentation requires supplying the target class names as text labels alongside the other prompts (sketched below).
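
One way to picture the open-vocabulary mechanism, as a hedged sketch (the function and the matching scheme are assumptions, not the repository's code): each predicted mask embedding is matched against text embeddings of the user-supplied class names, and anything below a similarity threshold falls back to 'others'.

    import torch
    import torch.nn.functional as F

    # Assign a label to each mask by nearest text embedding; masks that
    # match nothing well enough are labeled "others".
    def label_masks(mask_embeds, text_embeds, class_names, threshold=0.5):
        sims = F.cosine_similarity(
            mask_embeds.unsqueeze(1), text_embeds.unsqueeze(0), dim=-1
        )  # shape: (num_masks, num_classes)
        scores, idx = sims.max(dim=1)
        return [class_names[i] if s >= threshold else "others"
                for s, i in zip(scores.tolist(), idx.tolist())]

    # Toy example with random embeddings in place of real model outputs.
    names = ["zebra crossing", "traffic cone", "fire hydrant"]
    print(label_masks(torch.randn(4, 512), torch.randn(3, 512), names))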

Health Check

  • Last commit: 11 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 113 stars in the last 90 days
