AI research paper for generating scene videos with camera movement
WonderJourney enables users to generate novel 3D scenes and dynamic camera paths from single images. It targets researchers and artists interested in generative 3D content creation, offering a way to explore and animate scenes beyond their initial input.
How It Works
The system leverages a multi-stage process. It begins by estimating depth from input images using a DPT-based model. Then, it utilizes GPT-4 to generate descriptive captions for scene elements. These captions, along with style prompts, are used to generate new views via Stable Diffusion. Finally, PyTorch3D is employed for rendering and animating the scene based on user-defined camera paths. This approach allows for scene expansion and controlled exploration.
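The data flow between the four stages can be sketched as follows. This is a minimal illustration, not the project's actual code: the class name, function signatures, and the string stand-ins for images are all hypothetical. Each stage is passed in as a callable so that a real depth model, captioner, image generator, or renderer could be substituted.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence

# All names here are hypothetical placeholders for illustration only;
# none of them are taken from the WonderJourney code base.
DepthFn = Callable[[Any], Any]                 # image -> depth map (e.g. a DPT-based model)
CaptionFn = Callable[[Any], str]               # image -> scene description (e.g. GPT-4)
GenerateFn = Callable[[Any, Any, str], Any]    # (image, depth, prompt) -> new view
RenderFn = Callable[[Sequence[Any], Sequence[Any]], List[Any]]  # (views, path) -> frames


@dataclass
class ScenePipeline:
    estimate_depth: DepthFn
    caption: CaptionFn
    generate_view: GenerateFn
    render: RenderFn

    def run(self, image, style_prompt, camera_path, num_new_views=1):
        """Expand the scene view by view, then render along the camera path."""
        views = [image]
        for _ in range(num_new_views):
            current = views[-1]
            depth = self.estimate_depth(current)
            # Combine the generated caption with the user's style prompt.
            prompt = f"{self.caption(current)}, {style_prompt}"
            views.append(self.generate_view(current, depth, prompt))
        return self.render(views, camera_path)


# Stub stages that only manipulate strings, to show how data moves through.
pipeline = ScenePipeline(
    estimate_depth=lambda img: f"depth({img})",
    caption=lambda img: f"caption({img})",
    generate_view=lambda img, depth, prompt: f"view[{prompt}]",
    render=lambda views, path: [f"{v}@{pose}" for v, pose in zip(views, path)],
)
frames = pipeline.run("input.png", "oil painting", camera_path=["pose0", "pose1"])
```

With the stubs above, `frames` holds one entry per camera pose: the input image at the first pose and the newly generated view at the second.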
Quick Start & Requirements
Create an environment (mamba create --name wonderjourney python=3.10), and install dependencies including PyTorch 1.13.0 with CUDA 11.6, PyTorch3D, fvcore, iopath, nvidiacub, and other requirements from requirements.txt. A spaCy English model (en_core_web_sm) is also needed.
The DPT depth model checkpoint (dpt_beit_large_512.pt) must be downloaded.
Highlighted Details
Maintenance & Community
The project is associated with authors from institutions like MIT and Google. Links to the primary author's Twitter are provided.
Licensing & Compatibility
The repository does not explicitly state a license. The project acknowledges contributions from other open-source projects, which may have their own licenses.
Limitations & Caveats
The project requires substantial GPU memory (24 GB) and a specific CUDA version (11.6, per the installation requirements). Installing the dependencies, particularly PyTorch3D, can be complex and time-consuming. Because captions are generated with GPT-4, running the pipeline depends on OpenAI's API and may incur usage costs.
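Given these constraints, it can help to check the environment before a long run. The following is a sketch of such a preflight check, not something the repository provides: the function names are hypothetical, the module list is assumed from the dependencies named above, and the 24 GB threshold is read as decimal gigabytes because a card sold as "24 GB" typically reports slightly less than 24 GiB.

```python
import importlib.util

# Import names assumed from the dependencies listed above (hypothetical list).
REQUIRED_MODULES = ("torch", "pytorch3d", "fvcore", "iopath", "spacy", "en_core_web_sm")
MIN_GPU_MEMORY_GB = 24      # decimal GB; an approximate threshold, not an exact rule
EXPECTED_CUDA = "11.6"


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level modules that cannot be found in this environment."""
    return [n for n in names if importlib.util.find_spec(n) is None]


def gpu_memory_ok(total_bytes, min_gb=MIN_GPU_MEMORY_GB):
    """True if the reported device memory meets the stated requirement."""
    return total_bytes >= min_gb * 10**9


def query_gpu():
    """Return (total_memory_bytes, cuda_version), or None without torch/CUDA."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    props = torch.cuda.get_device_properties(0)
    return props.total_memory, torch.version.cuda


def preflight_report(gpu, modules=REQUIRED_MODULES):
    """Collect human-readable problems; an empty list means nothing was flagged."""
    problems = [f"missing module: {m}" for m in missing_modules(modules)]
    if gpu is None:
        problems.append("no CUDA GPU detected")
    else:
        total_bytes, cuda_version = gpu
        if not gpu_memory_ok(total_bytes):
            problems.append(f"GPU memory below {MIN_GPU_MEMORY_GB} GB")
        if cuda_version != EXPECTED_CUDA:
            problems.append(f"CUDA {cuda_version} found, {EXPECTED_CUDA} expected")
    return problems


if __name__ == "__main__":
    for line in preflight_report(query_gpu()) or ["no problems detected"]:
        print(line)
```

Keeping the checks as pure functions that take the GPU facts as arguments means they can be exercised on a machine without a GPU; only `query_gpu` touches PyTorch.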