nanosam by NVIDIA-AI-IOT

Real-time segmentation model for NVIDIA Jetson

created 1 year ago
777 stars

Top 45.9% on sourcepulse

Project Summary

NanoSAM is a highly optimized variant of the Segment Anything (SAM) model, designed for real-time image segmentation on NVIDIA Jetson platforms. It targets developers and researchers working with edge AI applications, offering significantly reduced latency and resource requirements compared to larger models.

How It Works

NanoSAM achieves its performance by distilling the MobileSAM image encoder into a much smaller ResNet18 image encoder using only unlabeled images. This knowledge distillation step transfers the teacher model's image embeddings to the compact student without requiring segmentation labels; the mask decoder is reused from MobileSAM. The distilled encoder and the mask decoder are then optimized with NVIDIA TensorRT, enabling efficient execution on edge devices.
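
The repository ships its own distillation tooling; the following is only a minimal sketch of the feature-matching idea, in which a frozen teacher encoder supervises a ResNet18 student on unlabeled images (the class names, projection head, and Huber loss are illustrative assumptions, not the project's actual training code):

```python
import torch
import torch.nn.functional as F
import torchvision


class StudentEncoder(torch.nn.Module):
    """Hypothetical ResNet18 student projected to the teacher's embedding shape."""

    def __init__(self, embed_dim=256, out_size=64):
        super().__init__()
        backbone = torchvision.models.resnet18(weights=None)
        # Keep the convolutional stages only (drop average pool and classifier head).
        self.features = torch.nn.Sequential(*list(backbone.children())[:-2])
        self.proj = torch.nn.Conv2d(512, embed_dim, kernel_size=1)
        self.out_size = out_size

    def forward(self, x):
        x = self.proj(self.features(x))
        # Match the teacher's spatial embedding size so the feature loss aligns.
        return F.interpolate(x, size=(self.out_size, self.out_size),
                             mode="bilinear", align_corners=False)


def distill_step(student, teacher, images, optimizer):
    """One training step: match the frozen teacher's image embeddings."""
    with torch.no_grad():
        target = teacher(images)                       # teacher embeddings, no gradient
    loss = F.huber_loss(student(images), target)       # loss choice is an assumption
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```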

Quick Start & Requirements

  • Install: Clone the repository and run python3 setup.py develop --user.
  • Prerequisites: PyTorch, torch2trt, NVIDIA TensorRT (required to build and run the engines), transformers (for the OWL-ViT example), trt_pose (for the pose example).
  • Setup: Requires downloading the pre-trained checkpoints and building TensorRT engines for the image encoder and mask decoder with trtexec; a basic inference sketch follows this list.
  • Resources: Targeted for NVIDIA Jetson Orin platforms. Performance benchmarks are provided for Jetson Orin Nano and AGX Orin.
  • Docs: NVIDIA-AI-IOT/nanosam
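
Once the engines are built, inference goes through the repository's Predictor API. The sketch below is a hedged example based on the basic-usage pattern; the engine paths, image file, and prompt coordinates are placeholders, and the exact module path and return values should be verified against the repository:

```python
import numpy as np
import PIL.Image
from nanosam.utils.predictor import Predictor  # module path as documented upstream

# TensorRT engines previously built with trtexec (paths are assumptions).
predictor = Predictor(
    image_encoder="data/resnet18_image_encoder.engine",
    mask_decoder="data/mobile_sam_mask_decoder.engine",
)

image = PIL.Image.open("assets/dog.jpg")  # any RGB image
predictor.set_image(image)

# Single foreground point prompt: (x, y) pixel coordinate with label 1 = foreground.
points = np.array([[320, 240]])
point_labels = np.array([1])
mask, _, _ = predictor.predict(points, point_labels)
```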

Highlighted Details

  • Real-time performance: roughly 27 ms image-encoder latency on Jetson Orin Nano and about 4.2 ms on Jetson AGX Orin.
  • Achieves 0.706 mIoU accuracy with the ResNet18 image encoder.
  • Supports prompting with points, bounding boxes, and keypoints (via trt_pose integration); see the box-prompt sketch after this list.
  • Includes examples for object detection with OWL-ViT and segmentation tracking.
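
Bounding-box prompts reuse the same point interface: in the SAM-style convention, a box is encoded as two points labeled top-left (2) and bottom-right (3). A short continuation of the hedged sketch above, with placeholder coordinates:

```python
import numpy as np

# Box prompt encoded as two labeled points (SAM-style convention assumed):
#   label 2 = top-left corner, label 3 = bottom-right corner.
box_points = np.array([[100, 100], [500, 400]])
box_labels = np.array([2, 3])
mask, _, _ = predictor.predict(box_points, box_labels)  # predictor from the sketch above
```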

Maintenance & Community

The project is maintained by NVIDIA AI IoT. Links to relevant NVIDIA Jetson resources are provided.

Licensing & Compatibility

The repository does not explicitly state a license. However, it is built upon SAM and MobileSAM, which have permissive licenses. Compatibility for commercial use is not specified.

Limitations & Caveats

The MobileSAM image encoder requires FP32 precision in TensorRT due to erroneous results with FP16. The tracking example is experimental and may not be robust.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 28 stars in the last 90 days
