OWL-ViT optimization project for real-time object detection
NanoOWL optimizes the OWL-ViT model for real-time inference on NVIDIA Jetson Orin platforms using NVIDIA TensorRT. It enables zero-shot object detection and classification via text prompts and adds a "tree detection" pipeline for nested detection and classification. The project targets developers and researchers working with edge AI and computer vision on NVIDIA hardware.
How It Works
NanoOWL uses NVIDIA TensorRT to run OWL-ViT efficiently on Jetson Orin devices. The key step is converting the model's image encoder to a TensorRT engine, which significantly accelerates inference. The "tree detection" pipeline extends OWL-ViT by combining it with CLIP, allowing hierarchical, nested detection and classification driven by structured text prompts, giving a flexible approach to open-vocabulary recognition.
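A minimal sketch of what this looks like in Python, modelled on the predictor used by the project's example scripts. The class, method, and argument names, as well as the engine and image paths, are assumptions based on those examples and may differ from the current API:

import PIL.Image
from nanoowl.owl_predictor import OwlPredictor

# Load OWL-ViT and point it at the prebuilt TensorRT image encoder engine
# (path assumes the engine built in the quick start below).
predictor = OwlPredictor(
    "google/owlvit-base-patch32",
    image_encoder_engine="data/owl_image_encoder_patch32.engine"
)

image = PIL.Image.open("assets/owl_glove_small.jpg")

# Zero-shot detection: the text prompts define the categories at inference time.
output = predictor.predict(image=image, text=["an owl", "a glove"], threshold=0.1)
print(output)

Because the categories are supplied as free-form text, the set of detected objects can be changed at runtime without retraining.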
Quick Start & Requirements
Build the TensorRT engine for the OWL-ViT image encoder:

python3 -m nanoowl.build_image_encoder_engine data/owl_image_encoder_patch32.engine

Run the basic detection example with a text prompt (from the examples directory, so the relative engine path resolves):

cd examples
python3 owl_predict.py --prompt="[an owl, a glove]" --threshold=0.1 --image_encoder_engine=../data/owl_image_encoder_patch32.engine
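The tree detection pipeline is driven by a nested prompt grammar rather than a flat label list. As an illustration (the grammar shown here is an assumption based on the project's examples): square brackets request detection and parentheses request CLIP-based classification, so a prompt such as

[a face [an eye, a nose] (happy, sad)]

would detect faces, detect eyes and noses within each detected face, and classify each face as happy or sad.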
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats