autovfx by haoyuhsu

Physically realistic video editing from natural language instructions

Created 1 year ago

328 stars

Top 83.5% on SourcePulse

Project Summary

AutoVFX enables physically realistic video editing using natural language instructions. It targets researchers and practitioners in computer vision and graphics who need to manipulate 3D scenes and generate novel video content. The system allows users to modify scenes by adding, removing, or transforming objects based on textual descriptions, integrating with simulation tools for realistic outcomes.

How It Works

AutoVFX combines 3D scene reconstruction, object manipulation, and physics-based simulation in a single pipeline. It reconstructs scenes with 3D Gaussian Splatting (3DGS) and uses BakedSDF for surface representation. Natural language commands are processed to identify the target objects and the desired actions, which are then translated into parameters for simulation engines such as Blender. Separate modules handle segmentation (DEVA, Grounded-SAM), tracking, inpainting (LaMa), and lighting estimation (DiffusionLight).
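As a rough illustration of how these components relate, the sketch below groups the tools named above into pipeline stages. The stage names, their ordering, and the `Stage`, `PIPELINE`, and `tools_for` identifiers are assumptions made for illustration; they are not taken from the AutoVFX codebase.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Stage:
    """One hypothetical pipeline stage and the tools the summary associates with it."""
    name: str
    tools: Tuple[str, ...]


# Assumed ordering: the summary lists these components but does not specify
# the exact sequence in which AutoVFX runs them.
PIPELINE = (
    Stage("reconstruction", ("3D Gaussian Splatting", "BakedSDF")),
    Stage("segmentation_and_tracking", ("DEVA", "Grounded-SAM")),
    Stage("inpainting", ("LaMa",)),
    Stage("lighting_estimation", ("DiffusionLight",)),
    Stage("simulation", ("Blender",)),
)


def tools_for(stage_name: str) -> Tuple[str, ...]:
    """Return the tools registered for a stage, or raise KeyError if unknown."""
    for stage in PIPELINE:
        if stage.name == stage_name:
            return stage.tools
    raise KeyError(stage_name)


if __name__ == "__main__":
    for stage in PIPELINE:
        print(f"{stage.name}: {', '.join(stage.tools)}")
```

A table like this is only a reading aid: it shows which external project backs each step, which is useful when deciding which checkpoints and submodules a given edit will need.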

Quick Start & Requirements

Installation: Clone the repository and set up a Conda environment. Install PyTorch with CUDA 11.8, Gaussian Splatting submodules, segmentation/tracking modules (DEVA, Grounded-SAM), inpainting modules (LaMa), lighting estimation modules (DiffusionLight), and other dependencies including PyTorch3D and Trimesh.
Prerequisites: Tested on Ubuntu 22.04.5 LTS with an NVIDIA GeForce RTX 4090 GPU, driver version 550, CUDA 12.4, and nvcc 11.8. Checkpoints and datasets require significant disk space.
Resources: Download pretrained checkpoints, datasets (e.g., Garden scene), and Blender 3.6.11.
Links: Project Page | Paper
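Because the code was tested against a specific toolkit version, it can help to confirm the local nvcc release before building the CUDA submodules. The following is a minimal sketch using only the Python standard library; the function names and the `TESTED_NVCC` constant are hypothetical, and the check assumes `nvcc --version` prints a line containing `release <major>.<minor>`.

```python
import re
import shutil
import subprocess
from typing import Optional

# nvcc release listed in the tested configuration above.
TESTED_NVCC = "11.8"


def parse_nvcc_release(output: str) -> Optional[str]:
    """Extract the '<major>.<minor>' release from `nvcc --version` output."""
    match = re.search(r"release (\d+\.\d+)", output)
    return match.group(1) if match else None


def installed_nvcc_release() -> Optional[str]:
    """Return the local nvcc release, or None if nvcc is not on PATH."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(
        ["nvcc", "--version"], capture_output=True, text=True, check=False
    )
    return parse_nvcc_release(result.stdout)


def matches_tested(release: Optional[str]) -> bool:
    """True only when the detected release equals the tested one."""
    return release == TESTED_NVCC


if __name__ == "__main__":
    found = installed_nvcc_release()
    if found is None:
        print("nvcc not found on PATH")
    elif matches_tested(found):
        print(f"nvcc {found} matches the tested configuration")
    else:
        print(f"nvcc {found} differs from the tested release {TESTED_NVCC}")
```

A mismatch does not necessarily mean the build will fail; it only signals that the setup departs from the configuration the authors report testing.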

Highlighted Details

Integrates 3D Gaussian Splatting (3DGS) with SuGaR for scene representation and editing.
Utilizes DEVA and Grounded-SAM for robust open-vocabulary video segmentation and object detection.
Employs BakedSDF from SDFStudio for high-quality surface reconstruction.
Supports physics-based simulation in Blender for realistic object interactions and animations.
Includes modules for normal estimation, pose extraction/alignment, and relative scene scale estimation.

Maintenance & Community

The project is associated with the University of Illinois at Urbana-Champaign. Key dependencies include PyTorch, Blender, and several specialized computer vision and graphics toolkits.

Licensing & Compatibility

The README does not explicitly state a license. The project builds on several open-source projects that carry their own licenses (e.g., Gaussian Splatting, SDFStudio, PyTorch3D), so users should verify license compatibility before any commercial use.

Limitations & Caveats

The setup process is complex, involving numerous dependencies and manual downloads. The code has been tested only on specific hardware and software versions, so compatibility with other configurations is not guaranteed. Some features, such as a local Gradio demo, are still under development.

Health Check

Last Commit: 9 months ago
Responsiveness: Inactive

Star History

2 stars in the last 30 days