physgen by stevenlsw

Image-to-video generation pipeline using physics simulation and video diffusion

Created 1 year ago
308 stars

Top 87.1% on SourcePulse

View on GitHub: https://github.com/stevenlsw/physgen
Project Summary

PhysGen offers a training-free pipeline for generating videos from single images by integrating rigid-body physics simulation with generative video diffusion models. It targets researchers and practitioners in computer vision and graphics seeking to create realistic, physically plausible video content from static inputs. The primary benefit is the ability to generate dynamic videos that adhere to physical laws without requiring extensive training data or fine-tuning.

How It Works

PhysGen employs a multi-stage approach: perception, simulation, and rendering. First, a perception module extracts scene properties like segmentation masks, depth, normals, and albedo from the input image. This information then feeds into a physics simulator (Pymunk) that models object interactions based on user-defined physical properties and initial conditions. Finally, the simulated motion is rendered using a combination of relighting and a video diffusion model (SEINE) to produce the final video. This modular design allows for fine-grained control over physical dynamics and visual style.
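
The simulation stage amounts to a standard rigid-body stepping loop. Below is a minimal Pymunk sketch (illustrative only, not PhysGen's actual code, assuming a single object dropped onto a static floor) of the kind of stepping that stage performs, recording per-frame poses a renderer could consume:

    import pymunk

    # Build a 2D physics world with gravity pointing down.
    space = pymunk.Space()
    space.gravity = (0.0, -900.0)  # pixels per second squared

    # Static floor segment for the object to land on.
    floor = pymunk.Segment(space.static_body, (0, 10), (640, 10), 2)
    floor.friction = 0.8
    floor.elasticity = 0.6
    space.add(floor)

    # One dynamic circular body, standing in for a segmented foreground object.
    mass, radius = 1.0, 20.0
    body = pymunk.Body(mass, pymunk.moment_for_circle(mass, 0, radius))
    body.position = (320, 400)
    shape = pymunk.Circle(body, radius)
    shape.friction = 0.8
    shape.elasticity = 0.6
    space.add(body, shape)

    # Step the simulation and record per-frame poses for a downstream renderer.
    fps, seconds = 30, 2
    trajectory = []
    for _ in range(fps * seconds):
        space.step(1.0 / fps)
        trajectory.append((body.position.x, body.position.y, body.angle))

    print(trajectory[-1])  # final (x, y, angle) pose after 2 simulated seconds

In PhysGen itself, the object shapes and physical properties come from the perception outputs and the per-scene sim.yaml configuration rather than being hard-coded as above.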

Quick Start & Requirements

  • Install: Clone the repository and install dependencies using Conda and pip:
    git clone --recurse-submodules https://github.com/stevenlsw/physgen.git
    cd physgen
    conda create -n physgen python=3.9
    conda activate physgen
    pip install -r requirements.txt
    
  • Prerequisites: Python 3.9, PyTorch, and Conda. A GPU is recommended for the relighting and video-diffusion rendering stages.
  • Demo: A Colab notebook is available for quick experimentation; the simulation demo can also be run locally:
    export PYTHONPATH=$(pwd)
    python simulation/animate.py --data_root data --save_root outputs --config data/${name}/sim.yaml
    
  • Docs: Colab Notebook

Highlighted Details

  • Integrates rigid-body physics simulation with diffusion models for training-free video generation.
  • Perception module includes segmentation, depth/normal estimation, and inpainting.
  • Supports custom scene configuration via YAML files for physics properties (see the illustrative sketch after this list).
  • Rendering pipeline includes relighting and SEINE video diffusion.
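
The exact schema of the scene configuration is defined by the repository's data/${name}/sim.yaml files. As a purely hypothetical illustration (the field names below are invented, not the project's actual schema), the snippet shows the kind of per-object physics properties such a file could carry, written out with PyYAML:

    import yaml

    # Hypothetical scene description -- keys are illustrative only,
    # NOT PhysGen's actual sim.yaml schema.
    scene = {
        "gravity": [0.0, -9.8],
        "objects": [
            {
                "name": "ball",
                "mass": 1.0,
                "friction": 0.8,
                "elasticity": 0.6,
                "init_velocity": [2.0, 0.0],
            }
        ],
    }

    # Serialize to YAML so a config-driven simulator could read it back.
    with open("sim_example.yaml", "w") as f:
        yaml.safe_dump(scene, f, sort_keys=False)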

Maintenance & Community

The project is associated with ECCV 2024 and lists authors from institutions such as UT Austin and UC San Diego. Key dependencies include Pymunk for rigid-body simulation and SEINE for video diffusion.

Licensing & Compatibility

The repository is released under the MIT License, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

The perception pipeline is designed primarily for side-view and top-down images; custom images may require manual adjustment of the pipeline. The SEINE model for video diffusion rendering needs to be downloaded separately.

Health Check

  • Last commit: 10 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 5 stars in the last 30 days
