CLIP-SAM  by maxi-w

Open-vocabulary image segmentation via CLIP and SAM

created 2 years ago
375 stars

Top 76.8% on sourcepulse

Project Summary

This project explores combining OpenAI's CLIP and Meta's Segment Anything Model (SAM) for open-vocabulary image segmentation. It targets researchers and developers interested in zero-shot image segmentation guided by natural language prompts. The primary benefit is that arbitrary objects can be segmented from a text description without any task-specific training.

How It Works

The approach first runs SAM to generate candidate masks for the segments it finds in the image. CLIP then scores each candidate against the text prompt, and the segment that best matches the description is selected. In this two-stage process SAM is responsible for the mask boundaries and CLIP for the semantic match to the prompt.
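The two stages can be sketched as follows. This is a minimal illustration rather than the code in main.ipynb: the function names, the checkpoint_path parameter, the choice of the "ViT-B/32" CLIP model, and the decision to score bounding-box crops (instead of, say, masked-out regions) are all assumptions made for the example.

```python
def xywh_to_box(bbox):
    """Convert a SAM [x, y, w, h] box to PIL's (left, upper, right, lower)."""
    x, y, w, h = bbox
    return (int(x), int(y), int(x + w), int(y + h))


def segment_by_prompt(image_path, prompt, checkpoint_path, model_type="vit_h"):
    """Return the SAM mask record whose crop best matches `prompt` under CLIP.

    Hypothetical helper: `checkpoint_path` must point at a SAM checkpoint
    that matches `model_type`.
    """
    # Heavy imports stay local so the helper above works without them.
    import clip
    import numpy as np
    import torch
    from PIL import Image
    from segment_anything import SamAutomaticMaskGenerator, sam_model_registry

    device = "cuda" if torch.cuda.is_available() else "cpu"
    image = Image.open(image_path).convert("RGB")

    # Stage 1: SAM proposes masks for the whole image.
    sam = sam_model_registry[model_type](checkpoint=checkpoint_path).to(device)
    masks = SamAutomaticMaskGenerator(sam).generate(np.array(image))
    # Drop degenerate boxes that would produce empty crops.
    masks = [m for m in masks if m["bbox"][2] >= 1 and m["bbox"][3] >= 1]
    if not masks:
        raise ValueError("SAM produced no usable masks for this image")

    # Stage 2: CLIP embeds each crop and the prompt, then ranks by cosine similarity.
    model, preprocess = clip.load("ViT-B/32", device=device)
    crops = torch.stack(
        [preprocess(image.crop(xywh_to_box(m["bbox"]))) for m in masks]
    ).to(device)
    tokens = clip.tokenize([prompt]).to(device)
    with torch.no_grad():
        image_features = model.encode_image(crops)
        text_features = model.encode_text(tokens)
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
    scores = (image_features @ text_features.T).squeeze(1)
    return masks[int(scores.argmax())]
```

The returned record is one of the dictionaries produced by SamAutomaticMaskGenerator, so its "segmentation" entry holds the boolean mask for the selected segment. Encoding every crop in a single batch keeps the sketch short but may need chunking on images with many masks.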

Quick Start & Requirements

  • Install dependencies: pip install torch opencv-python Pillow git+https://github.com/openai/CLIP.git git+https://github.com/facebookresearch/segment-anything.git
  • Download weights and place them in the repository root.
  • Run main.ipynb.
  • Requires PyTorch, OpenCV, Pillow, CLIP, and SAM.
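Before opening the notebook, it can help to confirm that the listed dependencies are importable. The check below is a small sketch using only the standard library; missing_modules is a hypothetical helper, and the import names are the ones these packages normally install under (opencv-python as cv2, Pillow as PIL, CLIP as clip, segment-anything as segment_anything).

```python
import importlib.util

# Import names for the packages in the install command above.
REQUIRED_MODULES = ["torch", "cv2", "PIL", "clip", "segment_anything"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

The check only looks up each module without importing it, so it runs quickly and does not load any model weights.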

Highlighted Details

  • Demonstrates open-vocabulary image segmentation by combining CLIP and SAM.
  • Utilizes SAM for mask generation and CLIP for prompt-based filtering.
  • Provides an example notebook (main.ipynb) for usage.

Maintenance & Community

No specific information on maintainers, community channels, or roadmap is provided in the README.

Licensing & Compatibility

The project itself does not specify a license, so the terms for reusing its own code are unclear. It relies on CLIP (MIT License) and SAM (Apache 2.0 License). Both dependencies are permissively licensed, but commercial use of this repository's code would need clarification from the author.

Limitations & Caveats

The README describes the project as a "small experiment," which suggests a proof of concept rather than a production-ready tool. It gives no performance benchmarks, no hardware requirements (e.g., whether a GPU is needed), and no discussion of limits on segmentation accuracy or prompt understanding.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 9 stars in the last 90 days
