Seg-Zero by JIA-Lab-research

Research paper implementation for reasoning-guided segmentation via cognitive reinforcement

Created 10 months ago

583 stars

Top 55.6% on SourcePulse

Project Summary

Seg-Zero is an open-source framework for reasoning-guided image segmentation: the model generates a chain of thought before it produces segmentation masks. It targets researchers and developers working on computer vision and multimodal AI, and its test-time reasoning emerges without any explicitly supervised reasoning data.

How It Works

Seg-Zero uses a decoupled architecture made up of a reasoning model and a segmentation model. Training relies exclusively on reinforcement learning (GRPO), with a reward that combines a format term and an accuracy term. Because no reasoning traces are supervised, the model learns its reasoning process implicitly. The project reports that this yields better performance than supervised fine-tuning on both in-domain and out-of-domain data.
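The reward and GRPO description above can be illustrated with a minimal sketch. It is not the project's code: the <think>/<answer> template, the function names, and the simple sum of the two reward terms are assumptions made for illustration, and the accuracy term is passed in as an already computed number. The last function shows the group-relative normalization that GRPO is known for, here using the population standard deviation (implementations differ on this detail).

```python
import re
from statistics import mean, pstdev

# Assumed output template; the tag names are illustrative and not taken from this summary.
_TEMPLATE = re.compile(r"^<think>.+?</think>\s*<answer>.+?</answer>$", re.DOTALL)


def format_reward(completion: str) -> float:
    """Return 1.0 if the completion follows the assumed template, else 0.0."""
    return 1.0 if _TEMPLATE.match(completion.strip()) else 0.0


def total_reward(completion: str, accuracy: float) -> float:
    """Add the format term to an externally computed accuracy term (hypothetical weighting)."""
    return format_reward(completion) + accuracy


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: standardize each reward against its own sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]
```

Each prompt is sampled several times, every completion is scored with total_reward, and group_advantages turns those scores into relative signals, so a completion is rewarded only for beating its siblings in the same group.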

Quick Start & Requirements

Install:
  git clone https://github.com/dvlab-research/Seg-Zero.git
  cd Seg-Zero
  conda create -n visionreasoner python=3.12
  conda activate visionreasoner
  pip install torch==2.6.0 torchvision==0.21.0
  pip install -e .
Prerequisites: Python 3.12, PyTorch 2.6.0, Torchvision 0.21.0. For inference and training, Hugging Face models (Qwen2-VL, Qwen2.5-VL) and datasets (RefCOCOg-9K, VisionReasoner-MultiObjects-7K, ReasonSeg-Test, ReasonSeg-Val) are required.
Resources: Training requires significant GPU memory. The relevant configurable parameters are micro_batch_size_per_device_for_update, micro_batch_size_per_device_for_experience, tensor_parallel_size, gpu_memory_utilization, and n.
Links: Seg-Zero Paper, VisionReasoner Paper, HuggingFace Daily, Models, Datasets

Highlighted Details

Emergent test-time reasoning ability with generated reasoning chains.
Trained exclusively with reinforcement learning, without supervised reasoning data.
Supports Qwen2-VL and Qwen2.5-VL model series.
Implements commonly used rewards such as IoU and L1 rewards.
Supports multi-object segmentation (major update in May 2025).
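As a rough illustration of the IoU and L1 rewards mentioned in the list above, the sketch below scores a predicted bounding box against a ground-truth box. The function names, the (x1, y1, x2, y2) box layout, the binary reward shape, and the default thresholds (0.5 for IoU, 10 pixels for L1) are assumptions chosen for the example, not values confirmed by this summary.

```python
Box = tuple[float, float, float, float]  # assumed layout: (x1, y1, x2, y2)


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def box_l1(a: Box, b: Box) -> float:
    """Mean absolute difference between corresponding box coordinates."""
    return sum(abs(x - y) for x, y in zip(a, b)) / 4.0


def iou_reward(pred: Box, gt: Box, threshold: float = 0.5) -> float:
    """Binary reward: 1.0 when the overlap exceeds the (assumed) threshold."""
    return 1.0 if box_iou(pred, gt) > threshold else 0.0


def l1_reward(pred: Box, gt: Box, max_dist: float = 10.0) -> float:
    """Binary reward: 1.0 when the coordinates are within the (assumed) pixel distance."""
    return 1.0 if box_l1(pred, gt) < max_dist else 0.0
```

The two rewards are complementary: a prediction shifted by a few pixels on a small object can fail the IoU test while still passing the L1 test, which gives the policy a usable signal earlier in training.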

Maintenance & Community

The project is actively developed, with a major update in May 2025 introducing multi-object segmentation.
Built upon EasyR1 and veRL.
Uses the Qwen2-VL, Qwen2.5-VL, and SAM2 models.

Licensing & Compatibility

The repository does not explicitly state a license in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The README notes that Seg-Zero's best results on different benchmarks were obtained with different checkpoints, so performance can vary from one checkpoint to another. For a consistent comparison, it recommends evaluating all benchmarks with a single model.
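The difference between reporting per-benchmark bests and reporting a single model can be made concrete with a small sketch. The helper names are hypothetical, selecting by mean score is only one possible criterion, and the scores in the usage note are made-up placeholders, not results from the project.

```python
Scores = dict[str, dict[str, float]]  # checkpoint name -> benchmark name -> score


def per_benchmark_best(scores: Scores) -> dict[str, float]:
    """Best score per benchmark, possibly taken from different checkpoints."""
    best: dict[str, float] = {}
    for results in scores.values():
        for bench, value in results.items():
            best[bench] = max(best.get(bench, float("-inf")), value)
    return best


def single_checkpoint_best(scores: Scores) -> tuple[str, dict[str, float]]:
    """Pick the one checkpoint with the highest mean score and report only its numbers."""
    name = max(scores, key=lambda ckpt: sum(scores[ckpt].values()) / len(scores[ckpt]))
    return name, scores[name]
```

With placeholder scores such as {"ckpt_a": {"bench_1": 60.0, "bench_2": 50.0}, "ckpt_b": {"bench_1": 55.0, "bench_2": 58.0}}, the per-benchmark view mixes ckpt_a on bench_1 with ckpt_b on bench_2, while the single-checkpoint view reports ckpt_b on both, which is the kind of comparison the README recommends.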

Explore Similar Projects

MM-Eureka-V0 by FanqingM

VisualThinker-R1-Zero by turningpoint-ai

Online-DPO-R1 by RLHFlow

understand-r1-zero by sail-sg

Agent-R1 by 0russwest0

X-R1 by dhcode-cpp

Visual-RFT by Liuziyu77

train-deepseek-r1 by FareedKhan-dev

simpleRL-reason by hkust-nlp

R1-V by StarsfieldAI

VLM-R1 by om-ai-lab

TinyZero by Jiayi-Pan