LangSplat by minghanqin

Language-guided 3D scene synthesis and manipulation

Created 2 years ago

998 stars

Top 37.1% on SourcePulse

1 Expert Loves This Project

Omar Sanseviero

DevRel at Google DeepMind

Project Summary

Summary

LangSplat is the official implementation of "LangSplat: 3D Language Gaussian Splatting," which enables semantic interaction with and querying of 3D scenes via natural language. It targets computer vision researchers and practitioners, offering a method for language-guided scene comprehension and manipulation with efficient rendering.

How It Works

The system uses a PyTorch-based optimizer with CUDA extensions to build 3D Gaussian Splatting models from SfM datasets that incorporate language features. A key component is a scene-wise language autoencoder, which learns compressed language representations and thereby sharply reduces memory demands. Tools are also provided to convert custom image datasets into an optimization-ready SfM format, complete with language features, so the method can be applied to new scenes.
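To give a feel for why compressing language features matters, here is a rough back-of-the-envelope sketch. The numbers are assumptions chosen for illustration and are not stated in this summary: a 512-dimensional language embedding per Gaussian, a 3-dimensional compressed code, 32-bit floats, and a hypothetical scene of one million Gaussians.

```python
def feature_memory_gib(num_gaussians: int, feature_dim: int, bytes_per_value: int = 4) -> float:
    """Memory needed to store one feature vector per Gaussian, in GiB."""
    return num_gaussians * feature_dim * bytes_per_value / 1024**3


# All values below are illustrative assumptions, not figures from the project.
NUM_GAUSSIANS = 1_000_000   # hypothetical scene size
RAW_DIM = 512               # assumed size of an uncompressed language embedding
LATENT_DIM = 3              # assumed size of the autoencoder's compressed code

raw_gib = feature_memory_gib(NUM_GAUSSIANS, RAW_DIM)
compressed_gib = feature_memory_gib(NUM_GAUSSIANS, LATENT_DIM)
reduction_factor = raw_gib / compressed_gib

print(f"raw features:        {raw_gib:.3f} GiB")
print(f"compressed features: {compressed_gib:.3f} GiB")
print(f"reduction factor:    {reduction_factor:.1f}x")
```

Under these assumptions the saving is simply the ratio of the two feature dimensions, and it applies per Gaussian, so it scales with scene size.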

Quick Start & Requirements

Installation requires cloning the repository recursively (git clone --recursive), then creating and activating the Conda environment (conda env create --file environment.yml, conda activate langsplat). Hardware requirements are a CUDA-ready GPU with Compute Capability 7.0+ and 24 GB VRAM for high-quality training. Software prerequisites are Conda, a C++ compiler, and CUDA SDK 11 (e.g., 11.8). Preprocessed datasets and pre-trained models are available via Baidu Wangpan and Google Drive. Links to the paper, video, and project webpage are provided.
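The hardware thresholds above can be turned into a small preflight check. The sketch below is not part of the project: the function name check_gpu and its return format are hypothetical, and the caller is expected to supply the GPU's compute capability and VRAM from whatever tool they prefer.

```python
from typing import Dict, Tuple

# Thresholds taken from the requirements listed above.
MIN_COMPUTE_CAPABILITY: Tuple[int, int] = (7, 0)
HIGH_QUALITY_VRAM_GB: float = 24.0


def check_gpu(compute_capability: Tuple[int, int], vram_gb: float) -> Dict[str, bool]:
    """Compare a GPU's (major, minor) compute capability and VRAM to the stated thresholds.

    Hypothetical helper for illustration; it does not query any hardware itself.
    """
    return {
        # Tuple comparison orders by major version first, then minor.
        "compute_capability_ok": tuple(compute_capability) >= MIN_COMPUTE_CAPABILITY,
        "vram_ok_for_high_quality": vram_gb >= HIGH_QUALITY_VRAM_GB,
    }


# Example with made-up values: a card with compute capability 8.6 and 12 GB of VRAM.
report = check_gpu((8, 6), 12.0)
print(report)
```

Note that tools which report usable rather than nominal memory may show slightly less than the advertised capacity, so a strict comparison against 24 can be pessimistic for a nominally 24 GB card.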

Highlighted Details

Official implementation of the CVPR 2024 Highlight paper "LangSplat: 3D Language Gaussian Splatting."
LangSplat V2 demonstrates rendering performance of over 450 FPS.
Preprocessed datasets (3D-OVS) and pre-trained models are readily available.
Extends the LERF dataset with corresponding COLMAP data.
Evaluation code builds upon LERF and NerfStudio.

Maintenance & Community

The project is explicitly "under development," encouraging contributions via GitHub issues and pull requests. A TODO list indicates ongoing work, with some items marked "coming soon." No specific community channels (e.g., Discord, Slack) are listed.

Licensing & Compatibility

The provided README content does not specify a software license. This omission requires clarification for assessing commercial use or integration into closed-source projects.

Limitations & Caveats

The "under development" status implies potential for ongoing changes and undiscovered issues. High-quality training demands significant hardware, specifically 24 GB VRAM. Custom scene processing requires strict adherence to dataset structure and COLMAP data availability. The absence of a stated license is a notable adoption caveat.

Health Check

Last Commit

4 months ago

Responsiveness

Inactive

Star History

18 stars in the last 30 days