Physically realistic video editing from natural language instructions
AutoVFX enables physically realistic video editing driven by natural language instructions. It targets researchers and practitioners in computer vision and graphics who need to manipulate 3D scenes and generate novel video content. Users can modify a scene by adding, removing, or transforming objects described in text, and the system integrates physics simulation to keep the results realistic. A minimal usage sketch follows.
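The sketch below illustrates the instruction-driven workflow only; the `edit_video` entry point, its parameters, and the example instruction are hypothetical stand-ins, not the repository's actual API.

```python
def edit_video(video_path: str, instruction: str, output_path: str) -> None:
    """Hypothetical entry point: apply a natural-language edit to a video.

    An AutoVFX-style system would reconstruct the 3D scene from the input
    video, plan the requested edit from the instruction, simulate it, and
    render the result.
    """
    raise NotImplementedError  # placeholder; not the repository's real API

# Example call (hypothetical instruction, in the spirit of the project):
# edit_video("garden.mp4", "Drop a basketball onto the table.", "edited.mp4")
```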
How It Works
AutoVFX combines 3D scene reconstruction, object manipulation, and physics-based simulation in a single pipeline. It reconstructs scenes using 3D Gaussian Splatting (3DGS) for appearance and BakedSDF for surface geometry. Natural language commands are parsed to identify target objects and desired actions, which are then translated into simulation and rendering parameters for Blender. Additional modules handle segmentation and tracking (Grounded-SAM, DEVA), inpainting (LaMa), and lighting estimation (DiffusionLight) to support comprehensive scene editing.
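The stages above can be summarized as a short orchestration sketch. Every name below is a hypothetical placeholder for the modules the pipeline actually uses (3DGS/BakedSDF reconstruction, Grounded-SAM/DEVA segmentation and tracking, LaMa inpainting, DiffusionLight lighting estimation, Blender simulation and rendering); it shows the data flow, not the real code.

```python
from dataclasses import dataclass

@dataclass
class SceneAssets:
    """Intermediate products of reconstruction (all fields hypothetical)."""
    surface_mesh: str   # surface geometry, e.g. extracted via BakedSDF
    gaussians: str      # appearance model, e.g. 3D Gaussian Splatting
    object_masks: str   # per-frame masks from Grounded-SAM / DEVA tracking
    background: str     # background plates inpainted with LaMa
    hdr_lighting: str   # environment lighting estimated by DiffusionLight

def reconstruct_scene(video_path: str) -> SceneAssets:
    """Stage 1: recover geometry, appearance, masks, and lighting."""
    raise NotImplementedError

def plan_edit(instruction: str) -> dict:
    """Stage 2: parse the natural-language command into target objects,
    actions, and physics parameters."""
    raise NotImplementedError

def simulate_and_render(assets: SceneAssets, edit_plan: dict) -> str:
    """Stage 3: run physics simulation and rendering in Blender, then
    composite the result back into the source footage."""
    raise NotImplementedError

def run_pipeline(video_path: str, instruction: str) -> str:
    assets = reconstruct_scene(video_path)
    edit_plan = plan_edit(instruction)
    return simulate_and_render(assets, edit_plan)
```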
Quick Start & Requirements
Highlighted Details
Maintenance & Community
The project is associated with the University of Illinois at Urbana-Champaign. Key dependencies include widely used libraries such as PyTorch and Blender, along with several specialized computer-vision and graphics toolkits.
Licensing & Compatibility
The repository's license is not explicitly stated in the README. It builds on several open-source projects with their own licenses (e.g., Gaussian Splatting, SDFStudio, PyTorch3D), so users should verify license compatibility before commercial use.
Limitations & Caveats
Setup is complex, involving numerous dependencies and manual downloads. The code has been tested only on specific hardware and software versions, which may limit compatibility elsewhere. Some features, such as a local Gradio demo, are still under development.