WonderJourney by KovenYu

AI research paper for generating scene videos with camera movement

Created 2 years ago

773 stars

Top 45.3% on SourcePulse


1 Expert Loves This Project

Noah Snavely

Research Scientist at Google DeepMind; Professor at Cornell Tech

Project Summary

WonderJourney enables users to generate novel 3D scenes and dynamic camera paths from single images. It targets researchers and artists interested in generative 3D content creation, offering a way to explore and animate scenes beyond their initial input.

How It Works

The system runs a multi-stage pipeline. It first estimates depth from the input image with a DPT-based model, then uses GPT-4 to generate descriptive captions for scene elements. These captions, combined with style prompts, drive Stable Diffusion to generate new views. Finally, PyTorch3D renders and animates the scene along user-defined camera paths. Together, these stages let the scene be expanded beyond the input and explored in a controlled way.
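
The data flow between these stages can be sketched as below. This is a minimal illustration, not the repository's code: the function `run_journey`, the `JourneyResult` container, the four stage callables, and the exact ordering inside the loop are all hypothetical names and assumptions. In a real run the callables would wrap the DPT depth model, GPT-4, Stable Diffusion, and PyTorch3D.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Sequence


@dataclass
class JourneyResult:
    """Everything produced while expanding the scene (hypothetical container)."""
    prompts: List[str] = field(default_factory=list)
    views: List[Any] = field(default_factory=list)
    frames: List[Any] = field(default_factory=list)


def run_journey(
    image: Any,
    camera_path: Sequence[Any],
    style_prompt: str,
    num_scenes: int,
    estimate_depth: Callable[[Any], Any],            # image -> depth map
    describe: Callable[[Any], str],                  # image -> caption
    generate_view: Callable[[str, Any, Any], Any],   # prompt, image, depth -> new view
    render: Callable[[Any, Any, Any], Any],          # image, depth, pose -> frame
) -> JourneyResult:
    """Chain the four stages; the loop order here is an assumption."""
    result = JourneyResult()
    current = image
    for _ in range(num_scenes):
        depth = estimate_depth(current)          # stage 1: depth estimation
        caption = describe(current)              # stage 2: caption generation
        prompt = f"{caption}, {style_prompt}"    # caption combined with the style prompt
        result.prompts.append(prompt)
        new_view = generate_view(prompt, current, depth)  # stage 3: new view
        result.views.append(new_view)
        for pose in camera_path:                 # stage 4: render along the camera path
            result.frames.append(render(current, depth, pose))
        current = new_view                       # the new view seeds the next scene
    return result
```

Passing the stages in as callables keeps the sketch runnable with simple stand-ins (for example, lambdas that return strings), which makes the control flow easy to inspect before any model is loaded.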

Quick Start & Requirements

Installation: Clone the repository, create a Conda environment (mamba create --name wonderjourney python=3.10), and install dependencies including PyTorch 1.13.0 with CUDA 11.6, PyTorch3D, fvcore, iopath, nvidiacub, and other requirements from requirements.txt. A spaCy English model (en_core_web_sm) is also needed.
Prerequisites: A CUDA-capable GPU with at least 24GB of VRAM, an OpenAI API key for GPT-4 access, and the MiDaS DPT checkpoint (dpt_beit_large_512.pt), which must be downloaded.
Resources: Setup consists of cloning the repository, creating the environment, and installing multiple dependencies. The dependency installs in particular can take a significant amount of time.
Documentation: Website, arXiv
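
Before launching a run, the prerequisites listed above can be checked with a small script. The following is a sketch under stated assumptions: the helper names `preflight` and `detect_vram_gb` are hypothetical, the environment variable name `OPENAI_API_KEY` is the conventional one and is assumed here, and the checkpoint directory is whatever location the checkpoint was downloaded to. VRAM is compared in decimal gigabytes because a card sold as 24GB typically reports slightly less than 24 GiB.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional

REQUIRED_VRAM_GB = 24                       # from the prerequisites above
CHECKPOINT_NAME = "dpt_beit_large_512.pt"   # MiDaS DPT checkpoint file name


def detect_vram_gb() -> Optional[float]:
    """Return VRAM of GPU 0 in decimal GB, or None if it cannot be determined."""
    try:
        import torch  # optional here; only used for the GPU query
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(0).total_memory / 1e9


def preflight(
    env: Mapping[str, str],
    checkpoint_dir: str,
    vram_gb: Optional[float] = None,
) -> List[str]:
    """Collect human-readable problems; an empty list means all checks passed."""
    problems = []
    if not env.get("OPENAI_API_KEY"):  # assumed variable name
        problems.append("OPENAI_API_KEY is not set")
    if not (Path(checkpoint_dir) / CHECKPOINT_NAME).is_file():
        problems.append(f"{CHECKPOINT_NAME} not found in {checkpoint_dir}")
    # The VRAM check is skipped when the amount is unknown (vram_gb is None).
    if vram_gb is not None and vram_gb < REQUIRED_VRAM_GB:
        problems.append(f"GPU has {vram_gb:.1f} GB VRAM, need {REQUIRED_VRAM_GB} GB")
    return problems


if __name__ == "__main__":
    # "." is a placeholder; point it at the directory holding the checkpoint.
    for problem in preflight(os.environ, ".", detect_vram_gb()):
        print("preflight:", problem)
```

Returning a list of problems rather than raising on the first failure lets every missing prerequisite be reported in one pass.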

Highlighted Details

Generates dynamic camera paths and 3D scenes from single images.
Integrates GPT-4 for scene description generation.
Utilizes Stable Diffusion for novel view synthesis.
Employs PyTorch3D for rendering and animation.

Maintenance & Community

The project's authors are affiliated with Stanford University and Google Research. A link to the primary author's Twitter account is provided.

Licensing & Compatibility

The repository does not explicitly state a license. The project acknowledges contributions from other open-source projects, which may have their own licenses.

Limitations & Caveats

The project requires a GPU with at least 24GB of VRAM and is pinned to a specific toolchain (PyTorch 1.13.0 with CUDA 11.6). Dependency installation, particularly for PyTorch3D, can be complex and time-consuming. Because captions are generated with GPT-4, running the project depends on OpenAI's API and may incur usage costs.

Health Check

Last Commit

1 year ago

Responsiveness

Inactive

Star History

2 stars in the last 30 days