WonderJourney by KovenYu

Code for an AI research paper on generating scene videos with camera movement

created 1 year ago
762 stars

Top 46.6% on sourcepulse

Project Summary

WonderJourney generates novel 3D scenes and dynamic camera paths from a single image. It targets researchers and artists interested in generative 3D content creation, letting them explore and animate a scene beyond what the input image shows.

How It Works

The system runs as a multi-stage pipeline. First, it estimates depth from the input image with a DPT-based model (the MiDaS DPT checkpoint listed under the prerequisites). Next, it uses GPT-4 to generate descriptive captions for scene elements. These captions, combined with style prompts, drive Stable Diffusion to generate new views. Finally, PyTorch3D renders and animates the scene along user-defined camera paths. Together, these stages allow scene expansion and controlled exploration.
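
The stage ordering described above can be sketched as a small orchestration function. This is only an illustration of the data flow, not the repository's code: `SceneState`, `run_step`, and the four stage callables (`estimate_depth`, `describe_scene`, `generate_view`, `render_path`) are hypothetical names, and their signatures are assumptions. In a real setup each callable would wrap the corresponding model or library (depth model, GPT-4, Stable Diffusion, PyTorch3D).

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class SceneState:
    """Hypothetical container for what one journey has accumulated so far."""
    image: Any
    depth: Any = None
    caption: str = ""
    frames: List[Any] = field(default_factory=list)


def run_step(
    state: SceneState,
    style_prompt: str,
    estimate_depth: Callable[[Any], Any],
    describe_scene: Callable[[str], str],
    generate_view: Callable[[Any, Any, str], Any],
    render_path: Callable[[Any, Any, Any], List[Any]],
) -> SceneState:
    """Run the four stages once, in the order the summary lists them."""
    # Stage 1: depth estimation for the current image.
    depth = estimate_depth(state.image)
    # Stage 2: a language model proposes a caption (assumed to see the previous one).
    caption = describe_scene(state.caption)
    # Stage 3: caption plus style prompt conditions the image generator.
    prompt = f"{caption}, {style_prompt}"
    new_view = generate_view(state.image, depth, prompt)
    # Stage 4: render frames along a camera path between the old and new views.
    frames = render_path(state.image, depth, new_view)
    return SceneState(
        image=new_view,
        depth=depth,
        caption=caption,
        frames=state.frames + frames,
    )


if __name__ == "__main__":
    # Stand-in stages that only manipulate strings, to show the wiring.
    state = run_step(
        SceneState(image="img0"),
        style_prompt="watercolor",
        estimate_depth=lambda img: f"depth({img})",
        describe_scene=lambda prev: "a forest",
        generate_view=lambda img, d, p: f"view[{p}]",
        render_path=lambda img, d, nv: [f"{img}->{nv}"],
    )
    print(state)
```

Passing the stages in as callables keeps the sketch runnable without a GPU or API key. Calling `run_step` repeatedly on its own output would be one way to model scene expansion, though how the project actually chains scenes is not stated here.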

Quick Start & Requirements

  • Installation: Clone the repository, create a Conda environment (mamba create --name wonderjourney python=3.10), and install dependencies including PyTorch 1.13.0 with CUDA 11.6, PyTorch3D, fvcore, iopath, nvidiacub, and other requirements from requirements.txt. A spaCy English model (en_core_web_sm) is also needed.
  • Prerequisites: Requires a CUDA-compatible GPU with at least 24GB of VRAM. An OpenAI API key for GPT-4 is necessary. The MiDaS DPT model (dpt_beit_large_512.pt) must be downloaded.
  • Resources: Setup involves cloning the repository, creating the environment, and installing several dependencies, so expect it to take a while.
  • Documentation: Website, arXiv
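
Before a first run, the prerequisites listed above can be checked with a short script. The following is a minimal sketch under stated assumptions: the environment variable name `OPENAI_API_KEY` is the common convention and may not be how the repository reads the key, the checkpoint location is left as a parameter because the expected directory is not given here, and `check_prerequisites` and `gpu_memory_bytes` are hypothetical helper names.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional, Union

# 24 GB expressed in decimal bytes; this leaves slack because drivers
# typically report slightly less than the nominal capacity in GiB.
REQUIRED_VRAM_BYTES = 24 * 10**9


def gpu_memory_bytes() -> Optional[int]:
    """Total memory of CUDA device 0, or None if PyTorch or CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(0).total_memory


def check_prerequisites(
    env: Mapping[str, str],
    checkpoint: Union[str, Path],
    vram_bytes: Optional[int],
) -> List[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems: List[str] = []
    # Assumption: the API key is supplied through this environment variable.
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    if not Path(checkpoint).is_file():
        problems.append(f"depth checkpoint not found: {checkpoint}")
    if vram_bytes is None:
        problems.append("no CUDA GPU detected")
    elif vram_bytes < REQUIRED_VRAM_BYTES:
        problems.append(f"GPU has {vram_bytes / 10**9:.1f} GB, need at least 24 GB")
    return problems


if __name__ == "__main__":
    # The checkpoint path below is a placeholder; adjust it to wherever the file was saved.
    for problem in check_prerequisites(os.environ, "dpt_beit_large_512.pt", gpu_memory_bytes()):
        print("missing prerequisite:", problem)
```

The GPU query is kept separate from the checks so the logic can be exercised on a machine without PyTorch installed.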

Highlighted Details

  • Generates dynamic camera paths and 3D scenes from single images.
  • Integrates GPT-4 for scene description generation.
  • Utilizes Stable Diffusion for novel view synthesis.
  • Employs PyTorch3D for rendering and animation.

Maintenance & Community

The project is associated with authors from institutions like MIT and Google. Links to the primary author's Twitter are provided.

Licensing & Compatibility

The repository does not explicitly state a license. The project acknowledges contributions from other open-source projects, which may have their own licenses.

Limitations & Caveats

The project requires at least 24GB of GPU memory and targets a specific toolchain (PyTorch 1.13.0 with CUDA 11.6). Installing the dependencies, particularly PyTorch3D, can be complex and time-consuming. Because the pipeline calls GPT-4, running it incurs OpenAI API costs and depends on that API being available.

Health Check

  • Last commit: 10 months ago
  • Responsiveness: 1+ week
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 13 stars in the last 90 days
