Video object tracker using open-world models
Grounded SAM 2 provides a pipeline for grounding and tracking any object in videos using state-of-the-art open-world models like Grounding DINO, Florence-2, and SAM 2. It is designed for researchers and developers working on advanced video analysis, object detection, and segmentation, and offers simplified implementations of these complex visual tasks.
How It Works
This project builds upon the concept of assembling open-world models, pioneered by its predecessor Grounded SAM. It leverages models like Grounding DINO (including versions 1.5, 1.6, and DINO-X) for open-vocabulary object detection and Florence-2 for a range of vision tasks, all integrated with SAM 2 for segmentation and tracking: detections from the grounding model are passed to SAM 2 as box prompts, and SAM 2 segments the objects and propagates their masks across video frames. This modular approach allows the components to be swapped and combined flexibly, with a focus on simplifying the user experience.
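For concreteness, here is a minimal sketch of that detect-then-track flow, assuming the Hugging Face Grounding DINO checkpoint and SAM 2's video predictor API; the frame directory, checkpoint paths, and text prompt are illustrative placeholders, not values from this repository:

```python
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForZeroShotObjectDetection
from sam2.build_sam import build_sam2_video_predictor

device = "cuda" if torch.cuda.is_available() else "cpu"
frames_dir = "path/to/video_frames"              # directory of JPEG frames (placeholder)
first_frame = Image.open(f"{frames_dir}/00000.jpg")

# 1. Ground the text prompt on the first frame with Grounding DINO.
processor = AutoProcessor.from_pretrained("IDEA-Research/grounding-dino-tiny")
detector = AutoModelForZeroShotObjectDetection.from_pretrained(
    "IDEA-Research/grounding-dino-tiny"
).to(device)
inputs = processor(images=first_frame, text="a running dog.", return_tensors="pt").to(device)
with torch.no_grad():
    outputs = detector(**inputs)
boxes = processor.post_process_grounded_object_detection(
    outputs, inputs.input_ids,
    box_threshold=0.3, text_threshold=0.25,
    target_sizes=[first_frame.size[::-1]],       # PIL size is (W, H); expects (H, W)
)[0]["boxes"]

# 2. Prompt SAM 2 with the detected boxes and propagate masks through the video.
predictor = build_sam2_video_predictor("sam2_hiera_l.yaml", "checkpoints/sam2_hiera_large.pt")
state = predictor.init_state(video_path=frames_dir)
for obj_id, box in enumerate(boxes):
    predictor.add_new_points_or_box(state, frame_idx=0, obj_id=obj_id, box=box.cpu().numpy())

for frame_idx, obj_ids, mask_logits in predictor.propagate_in_video(state):
    masks = (mask_logits > 0.0).cpu().numpy()    # boolean masks per tracked object
```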
Quick Start & Requirements
- Grounded SAM 2: `pip install -e .`
- Grounding DINO: `pip install --no-build-isolation -e grounding_dino`
- Docker installation is also supported via `make build-image` and `make run`.
- Grounding DINO 1.5/1.6 and DINO-X additionally require `pip install dds-cloudapi-sdk --upgrade` and an API token (see the sketch after this list).
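The token is wired in through the SDK's client before any cloud-hosted detection task is run. A minimal sketch, assuming the `Config`/`Client` interface of `dds-cloudapi-sdk`; the token string is a placeholder:

```python
from dds_cloudapi_sdk import Config, Client

config = Config("your-api-token-here")  # placeholder: request a token on the official site
client = Client(config)
# Cloud-hosted detectors (Grounding DINO 1.5/1.6, DINO-X) are then invoked by
# building a task object and dispatching it with client.run_task(task).
```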
Highlighted Details
Maintenance & Community
The project is actively updated, with recent changes including API updates for Grounding DINO 1.5/1.6 and DINO-X, and support for SAM-2.1.
Licensing & Compatibility
The project's licensing is not explicitly stated in the README. However, it cites research papers that may have their own licensing terms. Compatibility for commercial use is not specified.
Limitations & Caveats
The "Continuous ID" tracking feature is noted as still under development and not entirely stable. Some models (Grounding DINO 1.5/1.6, DINO-X) require an API token from the official website.