nanosam by NVIDIA-AI-IOT

Real-time segmentation model for NVIDIA Jetson

created 1 year ago
777 stars

Top 45.9% on sourcepulse

Project Summary

NanoSAM is a highly optimized variant of the Segment Anything (SAM) model, designed for real-time image segmentation on NVIDIA Jetson platforms. It targets developers and researchers working with edge AI applications, offering significantly reduced latency and resource requirements compared to larger models.

How It Works

NanoSAM achieves its performance by distilling the MobileSAM image encoder into a much smaller ResNet18 image encoder using only unlabeled images. This knowledge distillation step transfers the teacher model's image embeddings to the compact student without requiring segmentation labels; the mask decoder is reused from MobileSAM. The distilled encoder and the mask decoder are then optimized with NVIDIA TensorRT, enabling efficient execution on edge devices.
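
The repository ships its own distillation tooling; the following is only a minimal sketch of the feature-matching idea, in which a frozen teacher encoder supervises a ResNet18 student on unlabeled images (the class names, projection head, and Huber loss are illustrative assumptions, not the project's actual training code):

```python
import torch
import torch.nn.functional as F
import torchvision


class StudentEncoder(torch.nn.Module):
    """Hypothetical ResNet18 student projected to the teacher's embedding shape."""

    def __init__(self, embed_dim=256, out_size=64):
        super().__init__()
        backbone = torchvision.models.resnet18(weights=None)
        # Keep the convolutional stages only (drop average pool and classifier head).
        self.features = torch.nn.Sequential(*list(backbone.children())[:-2])
        self.proj = torch.nn.Conv2d(512, embed_dim, kernel_size=1)
        self.out_size = out_size

    def forward(self, x):
        x = self.proj(self.features(x))
        # Match the teacher's spatial embedding size so the feature loss aligns.
        return F.interpolate(x, size=(self.out_size, self.out_size),
                             mode="bilinear", align_corners=False)


def distill_step(student, teacher, images, optimizer):
    """One training step: match the frozen teacher's image embeddings."""
    with torch.no_grad():
        target = teacher(images)                       # teacher embeddings, no gradient
    loss = F.huber_loss(student(images), target)       # loss choice is an assumption
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```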

Quick Start & Requirements

  • Install: Clone the repository and run python3 setup.py develop --user.
  • Prerequisites: PyTorch, torch2trt, NVIDIA TensorRT (required to build and run the engines), transformers (for the OWL-ViT example), trt_pose (for the pose example).
  • Setup: Requires downloading the pre-trained checkpoints and building TensorRT engines for the image encoder and mask decoder with trtexec; a basic inference sketch follows this list.
  • Resources: Targeted for NVIDIA Jetson Orin platforms. Performance benchmarks are provided for Jetson Orin Nano and AGX Orin.
  • Docs: NVIDIA-AI-IOT/nanosam
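
Once the engines are built, inference goes through the repository's Predictor API. The sketch below is a hedged example based on the basic-usage pattern; the engine paths, image file, and prompt coordinates are placeholders, and the exact module path and return values should be verified against the repository:

```python
import numpy as np
import PIL.Image
from nanosam.utils.predictor import Predictor  # module path as documented upstream

# TensorRT engines previously built with trtexec (paths are assumptions).
predictor = Predictor(
    image_encoder="data/resnet18_image_encoder.engine",
    mask_decoder="data/mobile_sam_mask_decoder.engine",
)

image = PIL.Image.open("assets/dog.jpg")  # any RGB image
predictor.set_image(image)

# Single foreground point prompt: (x, y) pixel coordinate with label 1 = foreground.
points = np.array([[320, 240]])
point_labels = np.array([1])
mask, _, _ = predictor.predict(points, point_labels)
```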

Highlighted Details

  • Real-time performance: roughly 27 ms image-encoder latency on Jetson Orin Nano and about 4.2 ms on Jetson AGX Orin.
  • Achieves 0.706 mIoU accuracy with the ResNet18 image encoder.
  • Supports prompting with points, bounding boxes, and keypoints (via trt_pose integration); see the box-prompt sketch after this list.
  • Includes examples for object detection with OWL-ViT and segmentation tracking.
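
Bounding-box prompts reuse the same point interface: in the SAM-style convention, a box is encoded as two points labeled top-left (2) and bottom-right (3). A short continuation of the hedged sketch above, with placeholder coordinates:

```python
import numpy as np

# Box prompt encoded as two labeled points (SAM-style convention assumed):
#   label 2 = top-left corner, label 3 = bottom-right corner.
box_points = np.array([[100, 100], [500, 400]])
box_labels = np.array([2, 3])
mask, _, _ = predictor.predict(box_points, box_labels)  # predictor from the sketch above
```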

Maintenance & Community

The project is maintained by NVIDIA AI IoT. Links to relevant NVIDIA Jetson resources are provided.

Licensing & Compatibility

The repository does not explicitly state a license. However, it is built upon SAM and MobileSAM, which have permissive licenses. Compatibility for commercial use is not specified.

Limitations & Caveats

The MobileSAM image encoder requires FP32 precision in TensorRT due to erroneous results with FP16. The tracking example is experimental and may not be robust.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 28 stars in the last 90 days
