physgen by stevenlsw

Image-to-video generation pipeline using physics simulation and video diffusion

Created 1 year ago
308 stars

Top 87.1% on SourcePulse

View on GitHub: https://github.com/stevenlsw/physgen
Project Summary

PhysGen offers a training-free pipeline for generating videos from single images by integrating rigid-body physics simulation with generative video diffusion models. It targets researchers and practitioners in computer vision and graphics seeking to create realistic, physically plausible video content from static inputs. The primary benefit is the ability to generate dynamic videos that adhere to physical laws without requiring extensive training data or fine-tuning.

How It Works

PhysGen employs a multi-stage approach: perception, simulation, and rendering. First, a perception module extracts scene properties like segmentation masks, depth, normals, and albedo from the input image. This information then feeds into a physics simulator (Pymunk) that models object interactions based on user-defined physical properties and initial conditions. Finally, the simulated motion is rendered using a combination of relighting and a video diffusion model (SEINE) to produce the final video. This modular design allows for fine-grained control over physical dynamics and visual style.
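
The simulation stage amounts to a standard rigid-body stepping loop. Below is a minimal Pymunk sketch (illustrative only, not PhysGen's actual code, assuming a single object dropped onto a static floor) of the kind of stepping that stage performs, recording per-frame poses a renderer could consume:

    import pymunk

    # Build a 2D physics world with gravity pointing down.
    space = pymunk.Space()
    space.gravity = (0.0, -900.0)  # pixels per second squared

    # Static floor segment for the object to land on.
    floor = pymunk.Segment(space.static_body, (0, 10), (640, 10), 2)
    floor.friction = 0.8
    floor.elasticity = 0.6
    space.add(floor)

    # One dynamic circular body, standing in for a segmented foreground object.
    mass, radius = 1.0, 20.0
    body = pymunk.Body(mass, pymunk.moment_for_circle(mass, 0, radius))
    body.position = (320, 400)
    shape = pymunk.Circle(body, radius)
    shape.friction = 0.8
    shape.elasticity = 0.6
    space.add(body, shape)

    # Step the simulation and record per-frame poses for a downstream renderer.
    fps, seconds = 30, 2
    trajectory = []
    for _ in range(fps * seconds):
        space.step(1.0 / fps)
        trajectory.append((body.position.x, body.position.y, body.angle))

    print(trajectory[-1])  # final (x, y, angle) pose after 2 simulated seconds

In PhysGen itself, the object shapes and physical properties come from the perception outputs and the per-scene sim.yaml configuration rather than being hard-coded as above.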

Quick Start & Requirements

  • Install: Clone the repository and install dependencies using Conda and pip:
    git clone --recurse-submodules https://github.com/stevenlsw/physgen.git
    cd physgen
    conda create -n physgen python=3.9
    conda activate physgen
    pip install -r requirements.txt
    
  • Prerequisites: Python 3.9, PyTorch, and Conda. A GPU is recommended for the relighting and video-diffusion rendering stages.
  • Demo: A Colab notebook is available for quick experimentation; the simulation demo can also be run locally:
    export PYTHONPATH=$(pwd)
    python simulation/animate.py --data_root data --save_root outputs --config data/${name}/sim.yaml
    
  • Docs: Colab Notebook

Highlighted Details

  • Integrates rigid-body physics simulation with diffusion models for training-free video generation.
  • Perception module includes segmentation, depth/normal estimation, and inpainting.
  • Supports custom scene configuration via YAML files for physics properties (see the illustrative sketch after this list).
  • Rendering pipeline includes relighting and SEINE video diffusion.
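
The exact schema of the scene configuration is defined by the repository's data/${name}/sim.yaml files. As a purely hypothetical illustration (the field names below are invented, not the project's actual schema), the snippet shows the kind of per-object physics properties such a file could carry, written out with PyYAML:

    import yaml

    # Hypothetical scene description -- keys are illustrative only,
    # NOT PhysGen's actual sim.yaml schema.
    scene = {
        "gravity": [0.0, -9.8],
        "objects": [
            {
                "name": "ball",
                "mass": 1.0,
                "friction": 0.8,
                "elasticity": 0.6,
                "init_velocity": [2.0, 0.0],
            }
        ],
    }

    # Serialize to YAML so a config-driven simulator could read it back.
    with open("sim_example.yaml", "w") as f:
        yaml.safe_dump(scene, f, sort_keys=False)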

Maintenance & Community

The project is associated with ECCV 2024 and lists authors from institutions such as UT Austin and UC San Diego. Key dependencies include Pymunk for rigid-body simulation and SEINE for video diffusion.

Licensing & Compatibility

The repository is released under the MIT License, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

The perception pipeline is designed primarily for side-view and top-down images; custom images may require manual adjustment of the pipeline. The SEINE model for video diffusion rendering needs to be downloaded separately.

Health Check

  • Last commit: 10 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 5 stars in the last 30 days
