WildDet3D by allenai

Promptable 3D detection for real-world scenarios

Created 3 months ago

591 stars

Top 54.3% on SourcePulse

Project Summary

WildDet3D targets promptable 3D object detection at scale in diverse, real-world environments. It supports zero-shot detection from text, box, or point prompts, giving researchers and engineers in fields such as robotics and AR/VR an adaptable 3D perception model.

How It Works

The system pairs a SAM3 backbone for segmentation with LingBot-Depth for monocular depth estimation. It accepts text, 2D box (geometric or exemplar), and point prompts, so a 3D scene can be queried in several ways. The design targets zero-shot transfer across varied datasets, including outdoor driving and indoor scenes.
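To illustrate why a monocular depth estimate matters for promptable 3D detection, the sketch below back-projects a 2D point prompt into camera-frame 3D coordinates using the standard pinhole camera model. This is a generic textbook formula, not WildDet3D's actual pipeline; the `Intrinsics` type, the `backproject_point` function, and all numeric values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intrinsics:
    """Pinhole camera intrinsics in pixels (hypothetical illustration type)."""
    fx: float  # focal length along x
    fy: float  # focal length along y
    cx: float  # principal point x
    cy: float  # principal point y


def backproject_point(
    u: float, v: float, depth_m: float, k: Intrinsics
) -> tuple[float, float, float]:
    """Lift pixel (u, v) with metric depth to a 3D point in the camera frame."""
    if depth_m <= 0:
        raise ValueError("depth must be positive")
    x = (u - k.cx) * depth_m / k.fx
    y = (v - k.cy) * depth_m / k.fy
    return (x, y, depth_m)


# Assumed example values, not taken from the project.
k = Intrinsics(fx=500.0, fy=500.0, cx=320.0, cy=240.0)
point_3d = backproject_point(u=420.0, v=240.0, depth_m=2.0, k=k)
print(point_3d)
```

A pixel 100 px to the right of the principal point at 2 m depth lands 0.4 m to the right of the optical axis. A full 3D detector must also estimate box extent and orientation, which this sketch does not attempt.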

Quick Start & Requirements

Installation involves cloning the repository with its submodules, creating a Conda environment (Python 3.11), installing specific versions of PyTorch (CUDA 12.1) and vis4d together with vis4d's CUDA ops, and then installing the remaining dependencies. A CUDA-enabled GPU is required, and training requires 8 GPUs. Inference can run up to 3.0x faster with BF16 autocast and torch.compile, though the initial compilation may take ~17 minutes.
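Whether the ~17-minute compilation is worth paying depends on how many images are processed in one session. The arithmetic below is a rough sketch: the compile time and the 3.0x figure come from the summary above, while the baseline per-image latency and the function name `break_even_images` are assumptions introduced here. Since 3.0x is a best-case figure, the real break-even point may come later.

```python
import math


def break_even_images(
    compile_s: float, baseline_s_per_image: float, speedup: float
) -> int:
    """Smallest image count at which compiled inference is no slower overall.

    Solves n * t >= compile_s + n * t / speedup for n, where t is the
    baseline per-image latency.
    """
    if baseline_s_per_image <= 0:
        raise ValueError("baseline latency must be positive")
    if speedup <= 1.0:
        raise ValueError("speedup must exceed 1.0 for compilation to pay off")
    return math.ceil(
        compile_s * speedup / (baseline_s_per_image * (speedup - 1.0))
    )


compile_s = 17 * 60          # ~17 minutes, from the summary
speedup = 3.0                # "up to" figure from the summary
baseline_s_per_image = 0.5   # ASSUMED; measure on your own hardware

n = break_even_images(compile_s, baseline_s_per_image, speedup)
print(n)
```

With the assumed 0.5 s baseline, compilation breaks even at 3060 images; a short interactive session would likely be faster without torch.compile.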

Highlighted Details

Promptable 3D Detection: Offers versatile detection via text, 2D box (geometric/exemplar), and point prompts.
Zero-Shot Transfer: Demonstrates strong performance on unseen datasets like Argoverse 2 and ScanNet.
Real-World Integration: Powers real-time applications, including an iPhone app and integration into Meta FAIR's Boxer indoor labeling pipeline.
Optimized Inference: Achieves significant speedups (up to 3.0x) on high-end GPUs using BF16 autocast and torch.compile.

Maintenance & Community

The project shows recent activity, with updates in May 2026 that include new evaluation configurations and integration demos. It is a collaborative effort involving researchers from the Allen Institute for AI and the University of Washington. No specific community channels (e.g., Discord, Slack) are listed.

Licensing & Compatibility

The codebase and models are licensed under the "SAM License" and are explicitly intended for research and educational use, with adherence to Ai2's Responsible Use Guidelines. This license may restrict commercial applications.

Limitations & Caveats

The primary limitation is the restrictive "SAM License," which limits use to research and education. Certain torch.compile optimization modes are unsupported because the detection head requires dynamic shapes. Installation requires pinning specific library versions and building CUDA extensions from source.

Health Check

Last Commit

1 month ago

Responsiveness

Inactive

Star History

19 stars in the last 30 days