wenqsun: Research paper for 3D/4D scene generation from a single image using video diffusion.
Top 30.6% on SourcePulse
DimensionX is a framework for generating 3D and 4D scenes from single images using controllable video diffusion. It targets researchers and developers in computer vision and graphics who need to create complex spatial and temporal scene representations from limited input. The primary benefit is enabling precise control over scene structure and motion, bridging the gap between generated videos and real-world scene reconstruction.
How It Works
DimensionX employs a novel ST-Director module to decouple spatial and temporal factors in video diffusion models. It achieves this by learning dimension-aware LoRAs from dimension-variant datasets. This approach allows for fine-grained manipulation of spatial layout and temporal dynamics, facilitating the reconstruction of 3D and 4D scene representations from sequential frames. For 3D generation, a trajectory-aware mechanism is used, while 4D generation incorporates an identity-preserving denoising strategy.
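The decoupling idea can be pictured as keeping two low-rank (LoRA) deltas over the same base weight and selecting one per generation mode. The toy sketch below is illustrative only and assumes nothing about DimensionX's actual modules; names like st_director and the adapter values are hypothetical:

```python
def matmul(A, B):
    """Multiply two small matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def apply_lora(W, lora, alpha=1.0):
    """Return W + alpha * (B @ A): the low-rank update a LoRA adds to a base weight."""
    delta = matmul(lora["B"], lora["A"])
    return [[w + alpha * d for w, d in zip(wr, dr)] for wr, dr in zip(W, delta)]

# Base 2x2 weight plus two rank-1, dimension-aware adapters (toy values).
W = [[1.0, 0.0], [0.0, 1.0]]
adapters = {
    "spatial":  {"B": [[1.0], [0.0]], "A": [[0.0, 1.0]]},  # varies camera / space
    "temporal": {"B": [[0.0], [1.0]], "A": [[1.0, 0.0]]},  # varies motion / time
}

def st_director(mode):
    """Pick the dimension-aware LoRA for the requested generation mode."""
    return apply_lora(W, adapters[mode])

print(st_director("spatial"))   # [[1.0, 1.0], [0.0, 1.0]]
print(st_director("temporal"))  # [[1.0, 0.0], [1.0, 1.1 if False else 1.0]]
```

Switching adapters rather than retraining the backbone is what lets one video diffusion model serve both spatial (camera-orbit) and temporal (motion) variation.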
Quick Start & Requirements
- Navigate to src/gradio_demo/ and run pip install -r requirements.txt.
- Set OPENAI_API_KEY and OPENAI_BASE_URL, then execute python app.py.
- Requires the diffusers library and potentially a VLM for image captioning; GPU acceleration is highly recommended for inference.
- An inference example using CogVideoXImageToVideoPipeline is provided.

Highlighted Details
Maintenance & Community

Last updated 2 weeks ago; project activity: Inactive.

Licensing & Compatibility

Compatible with the diffusers library.

Limitations & Caveats

A known issue reports a ValueError from fuse_lora when the text_encoder component is not found; a workaround has been suggested.
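The fuse_lora error typically arises when fusion is attempted on a component the pipeline does not carry. A common workaround pattern is to restrict fusion to the components that do exist. The sketch below runs against a hypothetical stub, not DimensionX or diffusers code; the components keyword is an assumption modeled on the fuse_lora API in recent diffusers releases:

```python
class StubPipeline:
    """Minimal stand-in for a video diffusion pipeline that has no text_encoder."""
    def __init__(self):
        self.fused = []

    def fuse_lora(self, components=("transformer", "text_encoder"), lora_scale=1.0):
        for name in components:
            if name == "text_encoder":
                # Mimics the reported failure mode.
                raise ValueError("text_encoder not found in pipeline")
            self.fused.append((name, lora_scale))

def fuse_lora_safely(pipe, lora_scale=1.0):
    """Fuse only into the transformer, sidestepping the missing text_encoder."""
    pipe.fuse_lora(components=["transformer"], lora_scale=lora_scale)
    return pipe.fused

pipe = StubPipeline()
print(fuse_lora_safely(pipe))  # [('transformer', 1.0)]
```

With the default component list the stub raises the same ValueError as the reported issue; naming the components explicitly avoids it.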