sam2 by facebookresearch

Foundation model for promptable visual segmentation in images and videos

created 1 year ago
16,388 stars

Top 2.9% on sourcepulse

Project Summary

SAM 2 is a foundation model for promptable visual segmentation in both images and videos, extending the capabilities of its predecessor to handle temporal data. It is designed for researchers and developers working on advanced computer vision tasks, offering a powerful tool for precise object segmentation and tracking across static and dynamic visual content.

How It Works

SAM 2 employs a transformer architecture enhanced with a streaming memory mechanism, enabling efficient real-time video processing. This design allows the model to maintain context across frames, crucial for video segmentation. The project also highlights a model-in-the-loop data engine used to create the SA-V dataset, the largest video segmentation dataset to date, which underpins SAM 2's strong performance.
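The streaming-memory idea can be illustrated with a toy sketch (hypothetical, not SAM 2's actual implementation): a bounded memory bank keeps features from the most recent frames, so each new frame can attend to past context while memory use stays constant.

```python
from collections import deque

import numpy as np


class StreamingMemoryBank:
    """Toy bounded memory: keeps features of the last `capacity` frames."""

    def __init__(self, capacity: int = 3):
        self.frames = deque(maxlen=capacity)  # old entries evicted automatically

    def add(self, frame_features: np.ndarray) -> None:
        self.frames.append(frame_features)

    def context(self) -> np.ndarray:
        """Stack stored features so the current frame can attend to them."""
        return np.stack(list(self.frames)) if self.frames else np.empty((0,))


# Process a "video" one frame at a time with constant memory
bank = StreamingMemoryBank(capacity=3)
for t in range(5):
    bank.add(np.full((4,), float(t)))  # stand-in for per-frame embeddings

print(bank.context().shape)  # only the 3 most recent frames are retained
```

The fixed-size deque mirrors the key design choice: cost per frame does not grow with video length, which is what makes real-time streaming inference feasible.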

Quick Start & Requirements

  • Installation: git clone https://github.com/facebookresearch/sam2.git && cd sam2 && pip install -e .
  • Prerequisites: Python >= 3.10, PyTorch >= 2.5.1, TorchVision >= 0.20.1, CUDA toolkit matching PyTorch version, nvcc compiler. For notebooks: pip install -e ".[notebooks]".
  • Setup: Requires cloning the repository and installing Python dependencies, including a custom CUDA kernel compilation.
  • Resources: Checkpoints range from 38.9M (tiny) to 224.4M (large) parameters. Inference speed varies by model size, with the large model achieving 39.5 FPS on an A100 GPU.
  • Links: Paper, Project, Demo, Blog
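Since the project pins fairly recent minimum versions, a small pre-flight check can save a failed install. A minimal sketch (the version numbers come from the prerequisites above; the helper names are illustrative):

```python
import sys

# Minimum versions from the Quick Start prerequisites
MIN_PYTHON = (3, 10)
MIN_TORCH = (2, 5, 1)


def parse_version(text: str) -> tuple:
    """Turn a version string like '2.5.1+cu121' into (2, 5, 1)."""
    core = text.split("+")[0]  # drop any local build suffix
    return tuple(int(part) for part in core.split(".") if part.isdigit())


def meets_prereqs(python_version=None, torch_version: str = "0.0.0") -> bool:
    """Check interpreter and PyTorch versions against the stated minimums."""
    python_version = python_version or sys.version_info[:2]
    return python_version >= MIN_PYTHON and parse_version(torch_version) >= MIN_TORCH


print(meets_prereqs(python_version=(3, 11), torch_version="2.5.1"))  # True
```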

Highlighted Details

  • Supports both image and video segmentation with a unified architecture.
  • Offers optimized video processing with torch.compile for significant speedups.
  • Includes APIs for interactive prompting, refinement, and multi-object tracking.
  • Provides improved SAM 2.1 checkpoints with enhanced performance metrics on benchmarks like SA-V, MOSE, and LVOS.
  • Training and fine-tuning code are available for custom dataset utilization.
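The interactive prompting API from the bullet above can be sketched as follows, based on the repository's README. This is a hedged outline, not a verified recipe: the checkpoint and config paths are illustrative, and an actual run requires downloaded weights and typically a CUDA GPU.

```python
# Hedged sketch of single-click image segmentation with SAM 2.
import numpy as np

try:
    import torch
    from sam2.build_sam import build_sam2
    from sam2.sam2_image_predictor import SAM2ImagePredictor
    HAVE_SAM2 = True
except ImportError:  # the sketch still imports cleanly without the package
    HAVE_SAM2 = False


def segment_with_click(image: np.ndarray, click_xy: tuple,
                       checkpoint: str = "./checkpoints/sam2.1_hiera_large.pt",
                       config: str = "configs/sam2.1/sam2.1_hiera_l.yaml"):
    """Return the best-scoring mask for the object under one foreground click."""
    predictor = SAM2ImagePredictor(build_sam2(config, checkpoint))
    with torch.inference_mode():
        predictor.set_image(image)  # HWC uint8 RGB array
        masks, scores, _ = predictor.predict(
            point_coords=np.array([click_xy]),
            point_labels=np.array([1]),  # 1 = foreground, 0 = background
        )
    return masks[np.argmax(scores)]
```

Refinement works the same way: pass additional clicks (with label 0 for background) in subsequent `predict` calls to correct the mask interactively.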

Maintenance & Community

The project is actively maintained by Meta AI (FAIR) with numerous contributors listed. Updates include new checkpoints, training code, and performance optimizations. A web demo is also available.

Licensing & Compatibility

Licensed under Apache 2.0 for model checkpoints, demo code, and training code. The Inter Font and Noto Color Emoji used in the demo are under SIL Open Font License, version 1.1. This license is permissive for commercial use and integration into closed-source projects.

Limitations & Caveats

While installation is generally straightforward, users may encounter failures when compiling the custom CUDA extension. These can often be ignored safely, since the extension is optional and affects only some post-processing steps rather than core functionality. The project also pins specific PyTorch and CUDA toolkit versions, so careful environment management (e.g. a dedicated conda or virtual environment) is advisable.
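If nvcc or CUDA-toolkit mismatches break the extension build, the repository's INSTALL.md documents an environment variable to skip it entirely. A sketch (variable name taken from the upstream install docs; verify against the version you clone):

```shell
# Skip the optional CUDA extension build; only some mask post-processing
# (e.g. hole filling) is affected, per the repo's INSTALL.md.
SAM2_BUILD_CUDA=0 pip install -e ".[notebooks]"
```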

Health Check
Last commit

7 months ago

Responsiveness

Inactive

Pull Requests (30d)
1
Issues (30d)
13
Star History
1,223 stars in the last 90 days

