DimensionX by wenqsun

Research paper for 3D/4D scene generation from a single image using video diffusion

created 8 months ago
1,274 stars

Top 31.8% on sourcepulse

View on GitHub
Project Summary

DimensionX is a framework for generating 3D and 4D scenes from single images using controllable video diffusion. It targets researchers and developers in computer vision and graphics who need to create complex spatial and temporal scene representations from limited input. The primary benefit is enabling precise control over scene structure and motion, bridging the gap between generated videos and real-world scene reconstruction.

How It Works

DimensionX employs a novel ST-Director module to decouple spatial and temporal factors in video diffusion models. It achieves this by learning dimension-aware LoRAs from dimension-variant datasets. This approach allows for fine-grained manipulation of spatial layout and temporal dynamics, facilitating the reconstruction of 3D and 4D scene representations from sequential frames. For 3D generation, a trajectory-aware mechanism is used, while 4D generation incorporates an identity-preserving denoising strategy.
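
The decoupling described above can be pictured with the diffusers LoRA-adapter API. The checkpoint paths and adapter names below are placeholders, and treating the two directors as switchable LoRA adapters is an illustrative sketch of the idea, not the project's exact training or inference code:

```python
import torch
from diffusers import CogVideoXImageToVideoPipeline

# Base video diffusion model (DimensionX builds on CogVideoX).
pipe = CogVideoXImageToVideoPipeline.from_pretrained(
    "THUDM/CogVideoX-5b-I2V", torch_dtype=torch.bfloat16
)

# Each "director" is a dimension-aware LoRA learned from a
# dimension-variant dataset (paths/names here are hypothetical).
pipe.load_lora_weights("path/to/s_director_lora", adapter_name="s_director")
pipe.load_lora_weights("path/to/t_director_lora", adapter_name="t_director")

# Spatial control: hold time fixed, vary camera/layout.
pipe.set_adapters(["s_director"], adapter_weights=[1.0])

# Temporal control: hold viewpoint fixed, vary scene dynamics.
pipe.set_adapters(["t_director"], adapter_weights=[1.0])
```

Selecting one adapter at a time is what makes the generated frame sequence vary along a single dimension, which is what the downstream 3D/4D reconstruction relies on.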

Quick Start & Requirements

  • Install: Navigate to src/gradio_demo/ and run pip install -r requirements.txt.
  • Run: Set OPENAI_API_KEY and OPENAI_BASE_URL, then execute python app.py.
  • Prerequisites: Requires Python, the diffusers library, and optionally a vision-language model (VLM) for image captioning. GPU acceleration is strongly recommended for inference.
  • Demo: An online Hugging Face demo is available.
  • Inference Code: Example inference code using CogVideoXImageToVideoPipeline is provided.
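
The install and run steps above, collected as shell commands (the environment-variable values are placeholders you must supply):

```shell
# From the repository root
cd src/gradio_demo
pip install -r requirements.txt

# The demo uses a VLM for image captioning via an OpenAI-compatible API
export OPENAI_API_KEY=sk-...        # your API key
export OPENAI_BASE_URL=https://...  # your API endpoint

python app.py
```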

Highlighted Details

  • Generates photorealistic 3D and 4D scenes from single images.
  • Utilizes ST-Director for controllable spatial and temporal video diffusion.
  • Includes trajectory-aware mechanism for 3D and identity-preserving denoising for 4D.
  • Released partial model checkpoints (Orbit Left, Orbit Up) on Google Drive and Hugging Face.

Maintenance & Community

  • The project is actively under development with a roadmap including releasing more checkpoints, T-Director, long video generation, video interpolation, and 3DGS optimization code.
  • A Hugging Face demo is available.

Licensing & Compatibility

  • The project is released under the Apache 2.0 license.
  • The code is based on CogVideoX and uses the diffusers library.

Limitations & Caveats

  • Currently, only partial model checkpoints are released.
  • The provided inference code notes a potential ValueError from fuse_lora when a text_encoder component is not found, and suggests a workaround.
  • The full suite of features, including T-Director and 4D generation code, is still under development.
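
For the fuse_lora caveat noted above, one common pattern with diffusers is to restrict fusion to the components the pipeline actually exposes. This is an assumed workaround sketch, not necessarily the repository's exact fix, and the checkpoint path is a placeholder:

```python
# Load a DimensionX LoRA checkpoint (hypothetical path).
pipe.load_lora_weights("path/to/orbit_left_lora")

# Fuse only the transformer LoRA layers, skipping the
# text_encoder component that triggers the ValueError.
pipe.fuse_lora(components=["transformer"], lora_scale=1.0)
```
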

Health Check

  • Last commit: 7 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 48 stars in the last 90 days
