Chat with NeRF enables natural language interaction with NeRF models
This project enables natural language interaction with Neural Radiance Fields (NeRFs), allowing users to locate and query 3D objects within a scene through dialogue. It targets researchers and developers in computer vision and robotics interested in open-vocabulary 3D understanding and human-AI interaction. The primary benefit is intuitive, conversational control and exploration of 3D environments.
How It Works
The system integrates a Large Language Model (LLM) with a LERF (Language Embedded Radiance Fields) model. The LLM processes user queries, translating them into actionable commands or questions about the 3D scene. LERF, built upon NeRF, embeds language information into the radiance field representation, enabling the model to understand and respond to semantic queries about objects and their spatial relationships. This approach allows for open-vocabulary grounding: the system can locate objects that it was never explicitly trained to recognize.
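The grounding step described above can be illustrated with a toy sketch. It is not the project's code: the three-dimensional vectors stand in for the language embeddings a LERF model would provide, and `SCENE`, `ground_query`, and `answer` are hypothetical names introduced only for this example. A real LERF model renders relevancy maps over the scene rather than looking up a small table of regions, and the query vector would come from a text encoder after the LLM has extracted the target phrase.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical scene: region name -> (toy language embedding, 3D position).
SCENE = {
    "mug": ([0.9, 0.1, 0.0], (1.2, 0.4, 0.8)),
    "laptop": ([0.1, 0.9, 0.1], (0.3, 0.5, 0.7)),
    "plant": ([0.0, 0.2, 0.9], (2.0, 1.1, 0.0)),
}


def ground_query(query_vec, scene):
    """Return (name, score) for the region most similar to the query."""
    scored = [(name, cosine(query_vec, emb)) for name, (emb, _) in scene.items()]
    return max(scored, key=lambda item: item[1])


def answer(query_vec, scene=SCENE):
    """Ground the query, then report where the best-matching region is."""
    name, score = ground_query(query_vec, scene)
    _, position = scene[name]
    return {"region": name, "score": score, "position": position}


if __name__ == "__main__":
    # Assumed embedding of a phrase such as "something to drink from".
    print(answer([0.8, 0.2, 0.1]))
```

Because regions are matched by embedding similarity rather than by a fixed label set, any phrase that can be embedded can be used as a query, which is the sense in which the grounding is open-vocabulary.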
Quick Start & Requirements
Docker: build the image locally or pull the published one.
docker build -t chat-with-nerf:latest .
docker pull jedyang97/chat-with-nerf:latest
Dependencies: those required by nerfstudio; torch==1.13.1, torchvision, and functorch (with cu117 extras); ninja, tiny-cuda-nn, nerfstudio, and lerf.
LLaVA checkpoints (pre-trained-weights/LLaVA/).
Highlighted Details
Maintenance & Community
The project is associated with ICRA 2024 and references the paper "LLM-Grounder: Open-Vocabulary 3D Visual Grounding with Large Language Model as an Agent". Links to related work include nerfstudio, LERF, and LLaVA.
Licensing & Compatibility
The README does not explicitly state a license for the repository. However, the project depends on nerfstudio and LLaVA, whose licenses should be consulted for compatibility, especially for commercial use.
Limitations & Caveats
The project requires specific older versions of CUDA (11.3) and PyTorch (1.13.1), which may conflict with other deep learning projects. The setup process, particularly managing LLM checkpoints and CUDA dependencies, can be complex. The README indicates ongoing work to improve the foundation model for grounding.
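Since version conflicts are the main setup risk, a small pre-flight check can help before launching anything. The sketch below is an illustration, not part of the project: `PINNED` and `check_pins` are hypothetical names, and only the torch pin is taken from the requirements above. It relies on the standard-library `importlib.metadata` module and ignores local build labels such as `+cu117` when comparing versions.

```python
from importlib import metadata

# Pin taken from the requirements above; extend this dict as needed.
PINNED = {"torch": "1.13.1"}


def check_pins(pins, lookup=metadata.version):
    """Return a list of human-readable problems; an empty list means all pins match."""
    problems = []
    for package, wanted in pins.items():
        try:
            found = lookup(package)
        except metadata.PackageNotFoundError:
            problems.append(f"{package}: not installed (want {wanted})")
            continue
        # Drop a local version label such as "+cu117" before comparing.
        base = found.split("+", 1)[0]
        if base != wanted:
            problems.append(f"{package}: found {found}, want {wanted}")
    return problems


if __name__ == "__main__":
    issues = check_pins(PINNED)
    print("\n".join(issues) if issues else "all pinned versions match")
```

The `lookup` parameter exists only so the check can be exercised without the packages installed; in normal use the default reads the active environment.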