Part-level segmentation with language prompts, extending SAM
This repository extends the Segment Anything Model (SAM) to enable segmentation based on text prompts, supporting both object-level (e.g., "dog") and fine-grained part-level (e.g., "dog head") queries. It also integrates with a Visual ChatGPT-based dialogue system for natural language interaction with various segmentation models, offering a more intuitive and flexible approach to image editing and analysis.
How It Works
The project leverages a combination of foundation models: Segment Anything (SAM) for general segmentation, GLIP or VLPart for grounded language-image understanding, and Visual ChatGPT to orchestrate interactions. GLIP/VLPart provides open-vocabulary object detection capabilities, mapping text prompts to image regions, which are then fed into SAM for precise mask generation. VLPart specifically focuses on denser, open-vocabulary part segmentation.
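The two-stage flow described above (text prompt to boxes, then boxes to masks) can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's code: `Detection`, `segment_by_text`, `toy_detect`, and `toy_segment` are hypothetical names, the score threshold is an assumed parameter, and the two toy functions merely stand in for the roles that GLIP/VLPart and SAM play in the real pipeline.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]   # (x0, y0, x1, y1), upper bounds exclusive
Mask = List[List[bool]]           # row-major binary mask


@dataclass
class Detection:
    """One region proposed by the text-grounded detector (hypothetical type)."""
    label: str
    box: Box
    score: float


def segment_by_text(
    image_size: Tuple[int, int],
    prompt: str,
    detect: Callable[[str], List[Detection]],
    segment: Callable[[Tuple[int, int], Box], Mask],
    min_score: float = 0.5,  # assumed threshold, not taken from the repository
) -> List[Tuple[Detection, Mask]]:
    """Stage 1: text -> boxes (detector). Stage 2: each box -> mask (segmenter)."""
    results = []
    for det in detect(prompt):
        if det.score < min_score:
            continue  # drop low-confidence regions before segmentation
        results.append((det, segment(image_size, det.box)))
    return results


def toy_detect(prompt: str) -> List[Detection]:
    """Stand-in for GLIP/VLPart: looks the prompt up in a fixed table."""
    catalog = {
        "dog head": [
            Detection("dog head", (1, 1, 3, 3), 0.9),
            Detection("dog head", (0, 0, 1, 1), 0.2),
        ]
    }
    return catalog.get(prompt, [])


def toy_segment(image_size: Tuple[int, int], box: Box) -> Mask:
    """Stand-in for SAM: simply fills the box instead of predicting a mask."""
    width, height = image_size
    x0, y0, x1, y1 = box
    return [[x0 <= x < x1 and y0 <= y < y1 for x in range(width)]
            for y in range(height)]


pairs = segment_by_text((4, 4), "dog head", toy_detect, toy_segment)
for det, mask in pairs:
    print(det.label, det.box, sum(cell for row in mask for cell in row))
```

Passing the detector and segmenter in as callables mirrors the description above, where either GLIP or VLPart can supply the regions while the mask-generation step stays the same.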
Quick Start & Requirements
Model checkpoints (swinbase_part_0a0000.pth, sam_vit_h_4b8939.pth, glip_large.pth) need to be downloaded.
Part-level demo (VLPart + SAM):
python demo_vlpart_sam.py --input_image assets/twodogs.jpeg --output_dir outputs_demo --text_prompt "dog head"
Object-level demo (GLIP + SAM):
python demo_glip_sam.py --input_image assets/demo2.jpeg --output_dir outputs_demo --text_prompt "frog"
Chatbot (requires export OPENAI_API_KEY={Your_Private_Openai_Key}):
python chatbot.py --load "ImageCaptioning_cuda:0, SegmentAnything_cuda:1, PartPromptSegmentAnything_cuda:1, ObjectPromptSegmentAnything_cuda:0"
Highlighted Details
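The --load argument in the chatbot command appears to pair each tool name with a device, separated by an underscore. The sketch below shows one way such a string could be split into a tool-to-device mapping. It assumes the "ToolName_device" pattern seen in the command; `parse_load_spec` is a hypothetical helper written for illustration and is not taken from the repository.

```python
from collections import defaultdict
from typing import Dict, List


def parse_load_spec(spec: str) -> Dict[str, str]:
    """Split 'ToolName_device, ...' into {tool_name: device} (assumed format)."""
    mapping: Dict[str, str] = {}
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # rpartition splits on the last underscore, so a device such as
        # "cuda:0" stays intact even if the tool name contained underscores.
        tool, sep, device = entry.rpartition("_")
        if not sep or not tool or not device:
            raise ValueError(f"expected 'ToolName_device', got {entry!r}")
        mapping[tool] = device
    return mapping


spec = (
    "ImageCaptioning_cuda:0, SegmentAnything_cuda:1, "
    "PartPromptSegmentAnything_cuda:1, ObjectPromptSegmentAnything_cuda:0"
)
load_map = parse_load_spec(spec)

# Group tools by device to see how the models are spread across GPUs.
by_device: Dict[str, List[str]] = defaultdict(list)
for tool, device in load_map.items():
    by_device[device].append(tool)

for device in sorted(by_device):
    print(device, by_device[device])
```

Read this way, the example command places two tools on each of two GPUs, which suggests that at least two CUDA devices are expected for that particular configuration.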
Maintenance & Community
The project acknowledges significant contributions from the Segment Anything, EditAnything, CLIP, GLIP, Grounded-Segment-Anything, and Visual ChatGPT projects. Citation information is provided for research use.
Licensing & Compatibility
Licensed under CC-BY-NC 4.0. This license prohibits commercial use; non-commercial sharing and adaptation are permitted with attribution.
Limitations & Caveats
The CC-BY-NC 4.0 license prohibits commercial use, which may limit adoption in commercial products or services. The project relies on external models and may inherit their limitations or require specific versions for compatibility.