Seg-Zero by dvlab-research

Research paper implementation for reasoning-guided segmentation via cognitive reinforcement

created 4 months ago
477 stars


Project Summary

Seg-Zero is an open-source project that enables visual reasoning for image segmentation tasks, allowing models to generate a chain of thought before producing segmentation masks. It is designed for researchers and developers working on advanced computer vision and multimodal AI, offering emergent test-time reasoning capabilities without explicit supervised reasoning data.

How It Works

Seg-Zero uses a decoupled architecture made up of a reasoning model and a segmentation model. The reasoning model interprets the user's query, generates an explicit reasoning chain, and produces positional prompts, which the segmentation model turns into pixel-level masks. The reasoning model is trained exclusively with reinforcement learning (GRPO), using a reward that combines format and accuracy rewards. This lets it learn the reasoning process implicitly, and the authors report better performance than supervised fine-tuning on both in-domain and out-of-domain data.
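To make the training signal concrete, here is a minimal sketch of how a format reward and an accuracy reward might be summed and then converted into GRPO-style group-relative advantages. The `<think>`/`<answer>` output template, the function names, and passing accuracy in as a precomputed number are assumptions for illustration, not Seg-Zero's actual code. Only the advantage step (subtract the group mean, divide by the group standard deviation) is the standard GRPO idea; the policy-gradient loss itself is omitted.

```python
import re
from statistics import mean, pstdev

# Assumed R1-style output template; the repository's exact tags may differ.
THINK_ANSWER = re.compile(r"^<think>.*?</think>\s*<answer>.*?</answer>$", re.DOTALL)


def format_reward(completion: str) -> float:
    """1.0 if the completion follows the assumed <think>/<answer> template, else 0.0."""
    return 1.0 if THINK_ANSWER.match(completion.strip()) else 0.0


def total_reward(completion: str, accuracy: float) -> float:
    """Format reward plus an accuracy reward computed elsewhere (e.g. from IoU/L1)."""
    return format_reward(completion) + accuracy


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: normalize each reward against its own sampling group.

    Uses the population standard deviation; eps avoids division by zero
    when every completion in the group received the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]
```

Each group holds the completions sampled for one image and query, so a completion is rewarded only relative to its siblings: scoring above the group mean yields a positive advantage, and a group with identical rewards contributes no learning signal.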

Quick Start & Requirements

  • Install:
      git clone https://github.com/dvlab-research/Seg-Zero.git
      cd Seg-Zero
      conda create -n visionreasoner python=3.12
      conda activate visionreasoner
      pip install torch==2.6.0 torchvision==0.21.0
      pip install -e .
  • Prerequisites: Python 3.12, PyTorch 2.6.0, Torchvision 0.21.0. For inference and training, Hugging Face models (Qwen2-VL, Qwen2.5-VL) and datasets (RefCOCOg-9K, VisionReasoner-MultiObjects-7K, ReasonSeg-Test, ReasonSeg-Val) are required.
  • Resources: Training requires substantial GPU memory. The following parameters can be adjusted to fit the available hardware: micro_batch_size_per_device_for_update, micro_batch_size_per_device_for_experience, tensor_parallel_size, gpu_memory_utilization, and n.
  • Links: Seg-Zero Paper, VisionReasoner Paper, HuggingFace Daily, Models, Datasets

Highlighted Details

  • Emergent test-time reasoning ability with generated reasoning chains.
  • Trained exclusively using reinforcement learning, no supervised reasoning data.
  • Supports Qwen2-VL and Qwen2.5-VL model series.
  • Implements commonly used rewards such as IoU and L1.
  • Supports multi-object segmentation (added in the May 2025 update).
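The IoU and L1 rewards can be sketched as follows. Boxes are assumed to be `(x1, y1, x2, y2)` pixel coordinates, the same L1 helper works for point coordinates, and the 0.5 IoU threshold and 10-pixel L1 threshold are hypothetical defaults chosen for illustration; the repository's exact reward definitions may differ.

```python
Box = tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels (assumed convention)


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mean_l1(pred: tuple[float, ...], target: tuple[float, ...]) -> float:
    """Mean absolute coordinate error; works for boxes or flattened points."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of coordinates")
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def iou_reward(pred: Box, gt: Box, threshold: float = 0.5) -> float:
    """1.0 when the predicted box overlaps the ground truth enough (hypothetical threshold)."""
    return 1.0 if box_iou(pred, gt) > threshold else 0.0


def l1_reward(pred: tuple[float, ...], gt: tuple[float, ...], max_dist: float = 10.0) -> float:
    """1.0 when coordinates are on average within max_dist pixels (hypothetical threshold)."""
    return 1.0 if mean_l1(pred, gt) < max_dist else 0.0
```

The two rewards are complementary: IoU measures overlap regardless of box scale, while L1 penalizes absolute coordinate error, which also makes it applicable to point prompts that have no area.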

Maintenance & Community

  • The project is actively developed, with a major update in May 2025 introducing multi-object segmentation.
  • Built upon EasyR1 and veRL.
  • Utilizes models from Qwen2-VL, Qwen2.5-VL, and SAM2.

Licensing & Compatibility

  • The repository does not explicitly state a license in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The README notes that Seg-Zero's best results on different benchmarks were obtained with different checkpoints, so the reported per-benchmark numbers do not all come from a single model and performance varies across checkpoints. It recommends evaluating all benchmarks with one model for a consistent comparison.
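The difference between the two reporting styles can be sketched in a few lines. Given scores for several checkpoints on several benchmarks, taking the best score per benchmark may mix checkpoints, whereas the recommended approach picks one checkpoint and reports all of its scores. Selecting that checkpoint by highest mean score is an assumption for illustration; the function names are hypothetical, and every checkpoint is assumed to have been scored on the same benchmarks.

```python
Scores = dict[str, dict[str, float]]  # checkpoint name -> {benchmark name: score}


def best_per_benchmark(scores: Scores) -> dict[str, float]:
    """Mixed view: best score on each benchmark, possibly from different checkpoints."""
    if not scores:
        raise ValueError("no checkpoints given")
    benchmarks = next(iter(scores.values())).keys()
    return {b: max(ckpt[b] for ckpt in scores.values()) for b in benchmarks}


def single_checkpoint_report(scores: Scores) -> tuple[str, dict[str, float]]:
    """Consistent view: the checkpoint with the highest mean score, with all its scores."""
    if not scores:
        raise ValueError("no checkpoints given")
    best = max(scores, key=lambda c: sum(scores[c].values()) / len(scores[c]))
    return best, dict(scores[best])
```

The mixed view can be no lower than the single-checkpoint view on any benchmark, which is why comparisons are fairer when every benchmark is evaluated with the same model.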

Health Check

  • Last commit: 3 days ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 18
  • Star history: 150 stars in the last 90 days

Explore Similar Projects

TinyZero by Jiayi-Pan
Minimal reproduction of DeepSeek R1 Zero for countdown/multiplication tasks
12k stars; created 6 months ago, updated 3 months ago
Starred by George Hotz (author of tinygrad; founder of the tiny corp, comma.ai), Andrej Karpathy (founder of Eureka Labs; formerly at Tesla and OpenAI; author of CS 231n), and 5 more.