sam2 by facebookresearch

Foundation model for promptable visual segmentation in images and videos

created 1 year ago
16,388 stars

Top 2.9% on sourcepulse

Project Summary

SAM 2 is a foundation model for promptable visual segmentation in both images and videos, extending the capabilities of its predecessor to handle temporal data. It is designed for researchers and developers working on advanced computer vision tasks, offering a powerful tool for precise object segmentation and tracking across static and dynamic visual content.

How It Works

SAM 2 employs a transformer architecture enhanced with a streaming memory mechanism, enabling efficient real-time video processing. This design allows the model to maintain context across frames, crucial for video segmentation. The project also highlights a model-in-the-loop data engine used to create the SA-V dataset, the largest video segmentation dataset to date, which underpins SAM 2's strong performance.
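The streaming-memory idea can be illustrated with a toy sketch (hypothetical, not SAM 2's actual implementation): a bounded memory bank keeps features from the most recent frames, so each new frame can attend to past context while memory use stays constant.

```python
from collections import deque

import numpy as np


class StreamingMemoryBank:
    """Toy bounded memory: keeps features of the last `capacity` frames."""

    def __init__(self, capacity: int = 3):
        self.frames = deque(maxlen=capacity)  # old entries evicted automatically

    def add(self, frame_features: np.ndarray) -> None:
        self.frames.append(frame_features)

    def context(self) -> np.ndarray:
        """Stack stored features so the current frame can attend to them."""
        return np.stack(list(self.frames)) if self.frames else np.empty((0,))


# Process a "video" one frame at a time with constant memory
bank = StreamingMemoryBank(capacity=3)
for t in range(5):
    bank.add(np.full((4,), float(t)))  # stand-in for per-frame embeddings

print(bank.context().shape)  # only the 3 most recent frames are retained
```

The fixed-size deque mirrors the key design choice: cost per frame does not grow with video length, which is what makes real-time streaming inference feasible.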

Quick Start & Requirements

  • Installation: git clone https://github.com/facebookresearch/sam2.git && cd sam2 && pip install -e .
  • Prerequisites: Python >= 3.10, PyTorch >= 2.5.1, TorchVision >= 0.20.1, CUDA toolkit matching PyTorch version, nvcc compiler. For notebooks: pip install -e ".[notebooks]".
  • Setup: Requires cloning the repository and installing Python dependencies, including a custom CUDA kernel compilation.
  • Resources: Checkpoints range from 38.9M (tiny) to 224.4M (large) parameters. Inference speed varies by model size, with the large model achieving 39.5 FPS on an A100 GPU.
  • Links: Paper, Project, Demo, Blog
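Since the project pins fairly recent minimum versions, a small pre-flight check can save a failed install. A minimal sketch (the version numbers come from the prerequisites above; the helper names are illustrative):

```python
import sys

# Minimum versions from the Quick Start prerequisites
MIN_PYTHON = (3, 10)
MIN_TORCH = (2, 5, 1)


def parse_version(text: str) -> tuple:
    """Turn a version string like '2.5.1+cu121' into (2, 5, 1)."""
    core = text.split("+")[0]  # drop any local build suffix
    return tuple(int(part) for part in core.split(".") if part.isdigit())


def meets_prereqs(python_version=None, torch_version: str = "0.0.0") -> bool:
    """Check interpreter and PyTorch versions against the stated minimums."""
    python_version = python_version or sys.version_info[:2]
    return python_version >= MIN_PYTHON and parse_version(torch_version) >= MIN_TORCH


print(meets_prereqs(python_version=(3, 11), torch_version="2.5.1"))  # True
```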

Highlighted Details

  • Supports both image and video segmentation with a unified architecture.
  • Offers optimized video processing with torch.compile for significant speedups.
  • Includes APIs for interactive prompting, refinement, and multi-object tracking.
  • Provides improved SAM 2.1 checkpoints with enhanced performance metrics on benchmarks like SA-V, MOSE, and LVOS.
  • Training and fine-tuning code are available for custom dataset utilization.
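The interactive prompting API from the bullet above can be sketched as follows, based on the repository's README. This is a hedged outline, not a verified recipe: the checkpoint and config paths are illustrative, and an actual run requires downloaded weights and typically a CUDA GPU.

```python
# Hedged sketch of single-click image segmentation with SAM 2.
import numpy as np

try:
    import torch
    from sam2.build_sam import build_sam2
    from sam2.sam2_image_predictor import SAM2ImagePredictor
    HAVE_SAM2 = True
except ImportError:  # the sketch still imports cleanly without the package
    HAVE_SAM2 = False


def segment_with_click(image: np.ndarray, click_xy: tuple,
                       checkpoint: str = "./checkpoints/sam2.1_hiera_large.pt",
                       config: str = "configs/sam2.1/sam2.1_hiera_l.yaml"):
    """Return the best-scoring mask for the object under one foreground click."""
    predictor = SAM2ImagePredictor(build_sam2(config, checkpoint))
    with torch.inference_mode():
        predictor.set_image(image)  # HWC uint8 RGB array
        masks, scores, _ = predictor.predict(
            point_coords=np.array([click_xy]),
            point_labels=np.array([1]),  # 1 = foreground, 0 = background
        )
    return masks[np.argmax(scores)]
```

Refinement works the same way: pass additional clicks (with label 0 for background) in subsequent `predict` calls to correct the mask interactively.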

Maintenance & Community

The project is actively maintained by Meta AI (FAIR) with numerous contributors listed. Updates include new checkpoints, training code, and performance optimizations. A web demo is also available.

Licensing & Compatibility

Licensed under Apache 2.0 for model checkpoints, demo code, and training code. The Inter Font and Noto Color Emoji used in the demo are under SIL Open Font License, version 1.1. This license is permissive for commercial use and integration into closed-source projects.

Limitations & Caveats

While installation is generally straightforward, users may encounter failures when compiling the custom CUDA extension. These can often be ignored safely, since the extension is optional and affects only some post-processing steps rather than core functionality. The project also pins specific PyTorch and CUDA toolkit versions, so careful environment management (e.g. a dedicated conda or virtual environment) is advisable.
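If nvcc or CUDA-toolkit mismatches break the extension build, the repository's INSTALL.md documents an environment variable to skip it entirely. A sketch (variable name taken from the upstream install docs; verify against the version you clone):

```shell
# Skip the optional CUDA extension build; only some mask post-processing
# (e.g. hole filling) is affected, per the repo's INSTALL.md.
SAM2_BUILD_CUDA=0 pip install -e ".[notebooks]"
```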

Health Check
Last commit

7 months ago

Responsiveness

Inactive

Pull Requests (30d)
1
Issues (30d)
13
Star History
1,223 stars in the last 90 days

