Segment-Everything-Everywhere-All-At-Once by UX-Decoder

Multi-modal segmentation research paper

Created 2 years ago
4,718 stars

Top 10.5% on SourcePulse

Project Summary

SEEM (Segment Everything Everywhere All at Once) is a NeurIPS 2023 paper and official implementation that enables versatile image segmentation using multi-modal prompts. It targets researchers and developers looking for a unified, interactive segmentation model that handles various prompt types and offers semantic awareness.

How It Works

SEEM utilizes a unified prompt encoder to process diverse inputs—including points, boxes, scribbles, text, and audio—into a joint representation space. This approach allows for compositional and interactive segmentation, where users can refine masks through multi-round interactions, leveraging the model's memory of session history. The architecture is built upon the X-Decoder, a generalist decoder capable of multiple tasks.
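The core idea of the unified prompt encoder can be sketched in a few lines: each prompt modality gets its own projection into one shared embedding space, so heterogeneous prompts become same-shaped queries that a single decoder can attend over and compose. This is an illustrative numpy sketch, not SEEM's actual implementation; the dimensions and projection matrices are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 64  # joint embedding dimension (illustrative)

# One linear projection per prompt modality into the shared space.
# Input dims are placeholders: a point is (x, y), a box is (x1, y1, x2, y2),
# and a "text" prompt stands in for a pre-computed 512-d language feature.
projections = {
    "point": rng.standard_normal((2, D)) * 0.1,
    "box": rng.standard_normal((4, D)) * 0.1,
    "text": rng.standard_normal((512, D)) * 0.1,
}

def encode_prompt(kind, features):
    """Map a raw prompt of any modality into the joint representation space."""
    return np.asarray(features) @ projections[kind]

# Heterogeneous prompts become uniform queries the mask decoder can mix freely.
queries = np.stack([
    encode_prompt("point", [0.25, 0.60]),
    encode_prompt("box", [0.1, 0.1, 0.5, 0.5]),
    encode_prompt("text", rng.standard_normal(512)),
])
print(queries.shape)  # (3, 64): one query per prompt, all in one space
```

Because every prompt lands in the same space, multi-round refinement reduces to appending new queries (and remembered ones from earlier rounds) to the same decoder input.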

Quick Start & Requirements

  • Install/Run: git clone git@github.com:UX-Decoder/Segment-Everything-Everywhere-All-At-Once.git, then run sh assets/scripts/run_demo.sh from the repository root.
  • Prerequisites: Python and PyTorch. Hardware requirements are not explicitly documented, but a CUDA-capable GPU is implied for running the demo.
  • Links: Project Page, Demo, Paper, Code

Highlighted Details

  • Supports versatile prompts: points, boxes, scribbles, text, referring images, and audio.
  • Achieves compositional and interactive segmentation with multi-round refinement.
  • Provides semantic-aware predictions, assigning semantic labels to masks.
  • Demonstrates strong performance on various segmentation tasks, outperforming SAM in interaction and semantic capabilities.

Maintenance & Community

The project is associated with NeurIPS 2023 and has seen recent updates, including integration into LLaVA-Interactive and use in Set-of-Mark Prompting. Related projects include FocalNet, DaViT, UniCL, Semantic-SAM, OpenSeed, Grounding SAM, Grounding DINO, and X-GPT.

Licensing & Compatibility

The repository does not explicitly state a license in the provided README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The model's default vocabulary is limited to the 80 COCO categories; objects outside that set may be misclassified or labeled as 'others'. Open-vocabulary segmentation requires supplying the desired text labels alongside the prompts.
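Conceptually, the vocabulary limit comes from classifying each mask embedding against a fixed set of label embeddings; supplying extra text labels at prompt time simply enlarges that set. The following is a toy numpy sketch of this idea under assumed embeddings, not SEEM's real classification head.

```python
import numpy as np

rng = np.random.default_rng(1)
D = 64

# Default vocabulary: text embeddings for a COCO-style label set (abbreviated).
coco_labels = ["person", "dog", "car"]
label_embeds = {name: rng.standard_normal(D) for name in coco_labels}

def classify_mask(mask_embed, vocab):
    """Assign the label whose text embedding is most similar to the mask embedding."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max(vocab, key=lambda name: cos(mask_embed, vocab[name]))

# A mask for an out-of-vocabulary object ("zebra") is forced into the default set...
zebra_embed = rng.standard_normal(D)
print(classify_mask(zebra_embed, label_embeds))  # misclassified as some in-vocab label

# ...but adding the text label "zebra" at prompt time extends the vocabulary.
extended = dict(label_embeds, zebra=zebra_embed * 0.9 + rng.standard_normal(D) * 0.1)
print(classify_mask(zebra_embed, extended))  # "zebra"
```

The practical takeaway: if your target categories fall outside COCO 80, pass them as text labels with your prompts rather than relying on the default vocabulary.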

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 3
  • Star History: 27 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), Elvis Saravia (founder of DAIR.AI), and 1 more.

InternGPT by OpenGVLab

Top 0.1% on SourcePulse
3k stars
Interactive demo platform for showcasing AI models
Created 2 years ago
Updated 1 year ago
Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), Jinze Bai (research scientist at Alibaba Qwen), and 4 more.

self-operating-computer by OthersideAI

Top 0.1% on SourcePulse
10k stars
Framework for multimodal computer operation
Created 1 year ago
Updated 4 months ago