Video object tracker using open-world models
Grounded SAM 2 provides a pipeline for grounding and tracking any object in videos using state-of-the-art open-world models like Grounding DINO, Florence-2, and SAM 2. It is designed for researchers and developers working on advanced video analysis, object detection, and segmentation, and offers simplified implementations of these complex visual tasks.
How It Works
This project builds upon the concept of assembling open-world models, pioneered by its predecessor Grounded SAM. It leverages models like Grounding DINO (including versions 1.5, 1.6, and DINO-X) for open-vocabulary object detection and Florence-2 for a range of vision tasks, all integrated with SAM 2 for segmentation and tracking: detections from the grounding model are passed to SAM 2 as box prompts, and SAM 2 segments the objects and propagates their masks across video frames. This modular approach allows the components to be swapped and combined flexibly, with a focus on simplifying the user experience.
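For concreteness, here is a minimal sketch of that detect-then-track flow, assuming the Hugging Face Grounding DINO checkpoint and SAM 2's video predictor API; the frame directory, checkpoint paths, and text prompt are illustrative placeholders, not values from this repository:

```python
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForZeroShotObjectDetection
from sam2.build_sam import build_sam2_video_predictor

device = "cuda" if torch.cuda.is_available() else "cpu"
frames_dir = "path/to/video_frames"              # directory of JPEG frames (placeholder)
first_frame = Image.open(f"{frames_dir}/00000.jpg")

# 1. Ground the text prompt on the first frame with Grounding DINO.
processor = AutoProcessor.from_pretrained("IDEA-Research/grounding-dino-tiny")
detector = AutoModelForZeroShotObjectDetection.from_pretrained(
    "IDEA-Research/grounding-dino-tiny"
).to(device)
inputs = processor(images=first_frame, text="a running dog.", return_tensors="pt").to(device)
with torch.no_grad():
    outputs = detector(**inputs)
boxes = processor.post_process_grounded_object_detection(
    outputs, inputs.input_ids,
    box_threshold=0.3, text_threshold=0.25,
    target_sizes=[first_frame.size[::-1]],       # PIL size is (W, H); expects (H, W)
)[0]["boxes"]

# 2. Prompt SAM 2 with the detected boxes and propagate masks through the video.
predictor = build_sam2_video_predictor("sam2_hiera_l.yaml", "checkpoints/sam2_hiera_large.pt")
state = predictor.init_state(video_path=frames_dir)
for obj_id, box in enumerate(boxes):
    predictor.add_new_points_or_box(state, frame_idx=0, obj_id=obj_id, box=box.cpu().numpy())

for frame_idx, obj_ids, mask_logits in predictor.propagate_in_video(state):
    masks = (mask_logits > 0.0).cpu().numpy()    # boolean masks per tracked object
```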
Quick Start & Requirements
- Grounded SAM 2: `pip install -e .`
- Grounding DINO: `pip install --no-build-isolation -e grounding_dino`
- Docker installation is also supported via `make build-image` and `make run`.
- Grounding DINO 1.5/1.6 and DINO-X additionally require `pip install dds-cloudapi-sdk --upgrade` and an API token (see the sketch after this list).
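The token is wired in through the SDK's client before any cloud-hosted detection task is run. A minimal sketch, assuming the `Config`/`Client` interface of `dds-cloudapi-sdk`; the token string is a placeholder:

```python
from dds_cloudapi_sdk import Config, Client

config = Config("your-api-token-here")  # placeholder: request a token on the official site
client = Client(config)
# Cloud-hosted detectors (Grounding DINO 1.5/1.6, DINO-X) are then invoked by
# building a task object and dispatching it with client.run_task(task).
```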
Highlighted Details
Maintenance & Community
The project is actively updated, with recent changes including API updates for Grounding DINO 1.5/1.6 and DINO-X, and support for SAM-2.1.
Licensing & Compatibility
The project's licensing is not explicitly stated in the README. However, it cites research papers that may have their own licensing terms. Compatibility for commercial use is not specified.
Limitations & Caveats
The "Continuous ID" tracking feature is noted as still under development and not entirely stable. Some models (Grounding DINO 1.5/1.6, DINO-X) require an API token from the official website.