chat-with-nerf by sled-group

Chat with NeRF enables natural language interaction with NeRF models

Created 2 years ago

318 stars

Top 85.2% on SourcePulse

Project Summary

This project enables natural language interaction with Neural Radiance Fields (NeRFs), allowing users to locate and query 3D objects within a scene through dialogue. It targets researchers and developers in computer vision and robotics interested in open-vocabulary 3D understanding and human-AI interaction. The primary benefit is intuitive, conversational control and exploration of 3D environments.

How It Works

The system integrates a Large Language Model (LLM) with a LERF (Language-Embedded Radiance Fields) model. The LLM processes user queries, translating them into actionable commands or questions about the 3D scene. LERF, built on NeRF, embeds language information into the radiance field representation, enabling the model to understand and respond to semantic queries about objects and their spatial relationships. This enables open-vocabulary grounding: the system can locate objects described in free-form text, including categories that were never part of a fixed training label set.
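To make the grounding step more concrete, here is a minimal, hypothetical sketch of how an embedding-based relevancy score can rank scene locations against a free-text query. It loosely follows the relevancy idea described in the LERF paper: a pairwise softmax of the query against generic "canonical" phrases, keeping the worst case. The low-dimensional toy vectors stand in for real CLIP embeddings. The names `relevancy`, `rank_locations`, and the scene entries are illustrative assumptions, not the chat-with-nerf or LERF API.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]

# Generic phrases the query is contrasted against (assumed here, following the
# LERF paper's description); in a real system each would be embedded with the
# same text encoder as the query.
CANONICAL_PHRASES = ("object", "things", "stuff", "texture")


def normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length so dot products act as cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def relevancy(location_emb: Vector, query_emb: Vector, canon_embs: List[Vector]) -> float:
    """Pairwise softmax of query vs. each canonical phrase; return the minimum."""
    p = normalize(location_emb)
    q_score = math.exp(dot(p, normalize(query_emb)))
    scores = []
    for c in canon_embs:
        c_score = math.exp(dot(p, normalize(c)))
        scores.append(q_score / (q_score + c_score))
    return min(scores)


def rank_locations(scene: Dict[str, Vector], query_emb: Vector,
                   canon_embs: List[Vector]) -> List[str]:
    """Return location names sorted from most to least relevant to the query."""
    return sorted(scene,
                  key=lambda name: relevancy(scene[name], query_emb, canon_embs),
                  reverse=True)


# Toy 3-D "embeddings" standing in for CLIP features (purely illustrative).
query = [1.0, 0.0, 0.0]  # e.g. the text "red mug"
canon = [[0.0, 0.0, 1.0] for _ in CANONICAL_PHRASES]
scene = {
    "mug_on_table": [0.9, 0.1, 0.0],
    "wall": [0.0, 0.2, 0.9],
    "chair": [0.3, 0.9, 0.1],
}
print(rank_locations(scene, query, canon))
```

Taking the minimum over canonical phrases means a location only scores high if it matches the query better than it matches every generic description. This is what keeps vague "it's some object" regions from dominating the ranking.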

Quick Start & Requirements

Install: Docker is the recommended installation method.
- docker build -t chat-with-nerf:latest .
- Alternatively, pull from Docker Hub: docker pull jedyang97/chat-with-nerf:latest
Prerequisites:
- NVIDIA GPU with CUDA 11.3 (required for nerfstudio dependencies).
- Python 3.8 (via Conda).
- torch==1.13.1, torchvision, functorch (installed from the cu117 PyTorch wheel index).
- ninja, tiny-cuda-nn, nerfstudio, lerf.
- LLaVA-13B-v0 checkpoint (download and place in pre-trained-weights/LLaVA/).
Setup: Requires downloading large model checkpoints and may require compiling custom CUDA extensions such as tiny-cuda-nn. The Docker setup involves mounting host directories into the container.
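Because the environment is easy to get subtly wrong, a quick preflight check can save a long failed build. Below is a hypothetical helper, not part of the repository. It compares the running interpreter and PyTorch against the versions listed above, checks that PyTorch can see a CUDA device, and checks that the LLaVA checkpoint directory exists. The function names and the exact checks are assumptions.

```python
import sys
from pathlib import Path
from typing import List, Optional, Tuple

# Versions taken from the requirements listed above; adjust if the README changes.
EXPECTED_PYTHON = (3, 8)
EXPECTED_TORCH = "1.13.1"
# Checkpoint location from the README; the files inside it are not inspected.
CHECKPOINT_DIR = Path("pre-trained-weights/LLaVA")


def preflight(python_version: Tuple[int, int],
              torch_version: Optional[str],
              cuda_available: bool,
              checkpoint_dir: Path) -> List[str]:
    """Return human-readable problems; an empty list means nothing obvious is wrong."""
    problems = []
    if tuple(python_version) != EXPECTED_PYTHON:
        problems.append(f"Python {python_version[0]}.{python_version[1]} found, expected 3.8")
    if torch_version is None:
        problems.append("torch is not installed")
    elif not torch_version.startswith(EXPECTED_TORCH):
        # Wheel versions look like "1.13.1+cu117", so compare the prefix only.
        problems.append(f"torch {torch_version} found, expected {EXPECTED_TORCH}")
    if not cuda_available:
        problems.append("CUDA is not available to PyTorch")
    if not checkpoint_dir.is_dir():
        problems.append(f"checkpoint directory {checkpoint_dir} is missing")
    return problems


def main() -> List[str]:
    """Gather facts about the current environment and print any problems."""
    try:
        import torch  # imported lazily so the check still runs without torch
        torch_version, cuda_ok = torch.__version__, torch.cuda.is_available()
    except ImportError:
        torch_version, cuda_ok = None, False
    problems = preflight(sys.version_info[:2], torch_version, cuda_ok, CHECKPOINT_DIR)
    for p in problems:
        print("WARN:", p)
    return problems


if __name__ == "__main__":
    main()
```

Separating the pure `preflight` check from the environment probing in `main` keeps the logic testable without a GPU or a PyTorch install.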

Highlighted Details

Enables "Open-Vocabulary 3D Localization" via natural language dialog.
Supports interactive grounding of novel objects within NeRF scenes.
Integrates LLaVA for improved image captioning and grounding capabilities.
Provides Jupyter notebooks for reproducing paper results with pre-processed data.

Maintenance & Community

The project is associated with ICRA 2024 and references a paper "LLM-Grounder: Open-Vocabulary 3D Visual Grounding with Large Language Model as an Agent". Links to related work include nerfstudio, LERF, and LLaVA.

Licensing & Compatibility

The repository itself does not explicitly state a license in the README. However, it depends on nerfstudio and LLaVA, whose licenses should be consulted for compatibility, especially for commercial use.

Limitations & Caveats

The project requires specific older versions of CUDA (11.3) and PyTorch (1.13.1), which may conflict with other deep learning projects. The setup process, particularly managing LLM checkpoints and CUDA dependencies, can be complex. The README indicates ongoing work to improve the foundation model for grounding.

Health Check

Last Commit

3 months ago

Responsiveness

Inactive

Star History

0 stars in the last 30 days