OWL-ViT optimization project for real-time object detection
NanoOWL optimizes the OWL-ViT model for real-time inference on NVIDIA Jetson Orin platforms using NVIDIA TensorRT. It enables zero-shot object detection and classification via text prompts and adds a "tree detection" pipeline for nested detection and classification. The project targets developers and researchers working with edge AI and computer vision on NVIDIA hardware.
How It Works
NanoOWL uses NVIDIA TensorRT to run OWL-ViT efficiently on Jetson Orin devices. The key step is converting the model's image encoder to a TensorRT engine, which significantly accelerates inference. The "tree detection" pipeline extends OWL-ViT by combining it with CLIP, allowing hierarchical, nested detection and classification driven by structured text prompts, giving a flexible approach to open-vocabulary recognition.
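A minimal sketch of what this looks like in Python, modelled on the predictor used by the project's example scripts. The class, method, and argument names, as well as the engine and image paths, are assumptions based on those examples and may differ from the current API:

import PIL.Image
from nanoowl.owl_predictor import OwlPredictor

# Load OWL-ViT and point it at the prebuilt TensorRT image encoder engine
# (path assumes the engine built in the quick start below).
predictor = OwlPredictor(
    "google/owlvit-base-patch32",
    image_encoder_engine="data/owl_image_encoder_patch32.engine"
)

image = PIL.Image.open("assets/owl_glove_small.jpg")

# Zero-shot detection: the text prompts define the categories at inference time.
output = predictor.predict(image=image, text=["an owl", "a glove"], threshold=0.1)
print(output)

Because the categories are supplied as free-form text, the set of detected objects can be changed at runtime without retraining.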
Quick Start & Requirements
Build the TensorRT engine for the OWL-ViT image encoder:

python3 -m nanoowl.build_image_encoder_engine data/owl_image_encoder_patch32.engine

Run the basic detection example with a text prompt (from the examples directory, so the relative engine path resolves):

cd examples
python3 owl_predict.py --prompt="[an owl, a glove]" --threshold=0.1 --image_encoder_engine=../data/owl_image_encoder_patch32.engine
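The tree detection pipeline is driven by a nested prompt grammar rather than a flat label list. As an illustration (the grammar shown here is an assumption based on the project's examples): square brackets request detection and parentheses request CLIP-based classification, so a prompt such as

[a face [an eye, a nose] (happy, sad)]

would detect faces, detect eyes and noses within each detected face, and classify each face as happy or sad.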
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats