clipseg by timojl

Image segmentation via text/image prompts (CVPR 2022 paper)

created 3 years ago
1,263 stars

Top 32.0% on sourcepulse

Project Summary

CLIPSeg performs zero-shot image segmentation from natural-language or image prompts and is aimed at computer-vision researchers and developers. A single trained model segments new concepts at inference time from the prompt alone, so each new segmentation task does not require training a dedicated model.

How It Works

CLIPSeg builds on CLIP's joint text-image embedding space to map a text or image prompt to a pixel-level segmentation mask. A lightweight transformer-based decoder (CLIPDensePredT; the repository also provides ViTDensePredT) combines the prompt's CLIP embedding with intermediate features from the CLIP image encoder and produces a dense prediction, translating semantic concepts into spatial masks. The CLIP backbone stays frozen and only the decoder is trained (on an extended version of the PhraseCut dataset), so new target concepts can be segmented without task-specific training data or fine-tuning.
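To make the idea of "prompt embedding in, dense mask out" concrete, here is a toy sketch in NumPy. It is not CLIPSeg's decoder: the real model uses a trained transformer decoder, whereas this stand-in simply scores each image patch by cosine similarity with the prompt embedding, upsamples the patch grid to pixel resolution, and applies a sigmoid and threshold. The function name `toy_dense_prediction` and its `temperature` and `patch_size` parameters are hypothetical.

```python
import numpy as np


def toy_dense_prediction(patch_features, prompt_embedding, patch_size=16,
                         temperature=10.0, threshold=0.5):
    """Toy stand-in for a prompt-conditioned dense predictor (NOT CLIPSeg's decoder).

    patch_features:   (H, W, D) array, one feature vector per image patch.
    prompt_embedding: (D,) embedding of the text or image prompt.
    Returns (probs, mask), both of shape (H * patch_size, W * patch_size).
    """
    # L2-normalise so that the dot product below is a cosine similarity
    feats = patch_features / np.linalg.norm(patch_features, axis=-1, keepdims=True)
    prompt = prompt_embedding / np.linalg.norm(prompt_embedding)
    # One logit per patch: how strongly each patch matches the prompt
    logits = temperature * (feats @ prompt)  # shape (H, W)
    # Nearest-neighbour upsampling from the patch grid to the pixel grid
    logits = np.repeat(np.repeat(logits, patch_size, axis=0), patch_size, axis=1)
    # Sigmoid turns logits into per-pixel probabilities; threshold gives a binary mask
    probs = 1.0 / (1.0 + np.exp(-logits))
    return probs, probs > threshold


if __name__ == "__main__":
    # 2x2 patch grid with 2-D features; only patch (0, 0) points the same way as the prompt
    feats = np.array([[[1.0, 0.0], [0.0, 1.0]],
                      [[-1.0, 0.0], [0.0, -1.0]]])
    probs, mask = toy_dense_prediction(feats, np.array([1.0, 0.0]), patch_size=2)
    print(mask.astype(int))  # only the top-left 2x2 pixel block is 1
```

The similarity score is the simplest possible link between the prompt and the image features; CLIPSeg replaces it with a learned decoder, which is what lets it produce sharp, object-shaped masks rather than a coarse patch-level heat map.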

Quick Start & Requirements

  • Install the CLIP dependency via pip: pip install git+https://github.com/openai/CLIP.git
  • Requires PyTorch, Torchvision, and CLIP.
  • Download pre-trained weights (rd64-uni.pth or rd64-uni-refined.pth).
  • Official quickstart notebook available: Quickstart.ipynb
  • Interactive demo via MyBinder (CPU-bound, slower inference).
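The steps above can be sketched in Python as follows. This is a minimal sketch, not the official quickstart: it assumes you run it from a clone of the repository (so that `models.clipseg` is importable), that PyTorch and CLIP are installed, and that the downloaded weights sit in a `weights/` folder. The constructor arguments mirror the quickstart notebook as we understand it; passing `complex_trans_conv=True` for the refined weights is an assumption to verify against the README. The helpers `checkpoint_config`, `load_clipseg`, and `predict` are hypothetical.

```python
import os


def checkpoint_config(refined=False):
    """Return (weight filename, CLIPDensePredT kwargs) for the chosen checkpoint."""
    kwargs = {"version": "ViT-B/16", "reduce_dim": 64}
    if refined:
        # Assumption: the refined weights expect the more complex transposed-conv head
        kwargs["complex_trans_conv"] = True
        return "rd64-uni-refined.pth", kwargs
    return "rd64-uni.pth", kwargs


def load_clipseg(weights_dir="weights", refined=False):
    # Lazy imports: only needed when actually loading the model
    import torch
    from models.clipseg import CLIPDensePredT  # from the cloned clipseg repository

    filename, kwargs = checkpoint_config(refined)
    model = CLIPDensePredT(**kwargs)
    model.eval()
    state = torch.load(os.path.join(weights_dir, filename),
                       map_location=torch.device("cpu"))
    # strict=False: the checkpoint stores only the decoder, not the frozen CLIP weights
    model.load_state_dict(state, strict=False)
    return model


def predict(model, image_tensor, prompts):
    """image_tensor: (3, 352, 352), normalised as in the quickstart notebook (assumed)."""
    import torch

    # One copy of the image per prompt; the first model output holds the logits
    batch = image_tensor.unsqueeze(0).repeat(len(prompts), 1, 1, 1)
    with torch.no_grad():
        logits = model(batch, prompts)[0]
    return torch.sigmoid(logits)  # per-prompt probability maps
```

Keeping the checkpoint choice in one small function makes it easy to switch between the standard and refined weights without touching the loading code.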

Highlighted Details

  • Integrated into HuggingFace Transformers library.
  • Offers both standard and fine-grained prediction weights (rd64-uni-refined.pth).
  • Supports multiple datasets, including PhraseCut, PascalZeroShot, COCO, and the PFENet few-shot splits.
  • Includes wrappers for third-party models like PFENet.
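Because CLIPSeg is available in HuggingFace Transformers, it can also be used without cloning the repository. The sketch below assumes the `CIDAS/clipseg-rd64-refined` checkpoint on the HuggingFace Hub and the `CLIPSegProcessor` / `CLIPSegForImageSegmentation` classes; check the Transformers documentation for your installed version. The helpers `logits_to_masks` and `segment` are hypothetical, and the 0.5 threshold is an illustrative choice, not a recommended value.

```python
import numpy as np


def logits_to_masks(logits, threshold=0.5):
    """Apply a sigmoid to raw logits and threshold them into binary masks."""
    probs = 1.0 / (1.0 + np.exp(-np.asarray(logits, dtype=np.float64)))
    return probs, probs > threshold


def segment(image, prompts, checkpoint="CIDAS/clipseg-rd64-refined"):
    """image: a PIL image; prompts: list of text prompts. Returns (probs, masks)."""
    # Lazy imports so logits_to_masks works without transformers installed
    import torch
    from transformers import CLIPSegForImageSegmentation, CLIPSegProcessor

    processor = CLIPSegProcessor.from_pretrained(checkpoint)
    model = CLIPSegForImageSegmentation.from_pretrained(checkpoint)
    # One copy of the image per prompt, batched together with the tokenised prompts
    inputs = processor(text=prompts, images=[image] * len(prompts),
                       padding=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    # With a single prompt the batch dimension may be squeezed away; restore it
    logits = logits.reshape(len(prompts), *logits.shape[-2:])
    return logits_to_masks(logits.numpy())
```

The returned masks are at the model's input resolution, so resize them back to the original image size before overlaying them.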

Maintenance & Community

  • Project associated with CVPR 2022 paper.
  • Available through HuggingFace Transformers.

Licensing & Compatibility

  • Source code released under MIT License.
  • Model weights are not covered by the MIT license; specific terms are not detailed in the README.

Limitations & Caveats

The README does not specify the license terms for the model weights, which may impact commercial use. The MyBinder demo runs on CPU, leading to slower inference times compared to GPU usage.

Health Check
  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 45 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Patrick von Platen (Core Contributor to Hugging Face Transformers and Diffusers), and 12 more.

stablediffusion by Stability-AI

Latent diffusion model for high-resolution image synthesis
  • Top 0.1%, 41k stars
  • Created 2 years ago, updated 1 month ago