Feed-forward 4D generative modeling from single images
Summary
4DNeX offers a feed-forward framework for single-image 4D generative modeling, producing dynamic 3D scene representations. It bypasses the computationally intensive optimization and multi-frame inputs required by prior methods, delivering an efficient, end-to-end image-to-4D solution. Targeted at generative modeling and computer vision practitioners, it enables high-quality dynamic point cloud generation and novel-view video synthesis with robust generalizability.
How It Works
The framework fine-tunes a pretrained video diffusion model (Wan2.1 I2V 14B) using lightweight adaptation (the LoRA weights noted under Quick Start). A unified 6D video representation jointly models RGB and XYZ sequences, so appearance and geometry are learned in one structured space. Training is supported by the large-scale 4DNeX-10M dataset, curated to address the scarcity of 4D data. The resulting feed-forward, single-image-to-4D pipeline offers a scalable, efficient alternative to optimization-heavy methods.
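Conceptually, the 6D representation amounts to channel-wise concatenation of an RGB video with a per-pixel XYZ (pointmap) video. The sketch below is only an illustration of that idea, not the repository's actual code; the function name, tensor shapes, and min-max normalization are all assumptions.

```python
import torch

def pack_6d_video(rgb: torch.Tensor, xyz: torch.Tensor) -> torch.Tensor:
    """Concatenate RGB and XYZ sequences into a unified 6D video.

    rgb: (T, 3, H, W) frames, values in [0, 1]
    xyz: (T, 3, H, W) per-pixel world coordinates (a pointmap video)
    returns: (T, 6, H, W) tensor jointly encoding appearance and geometry
    """
    assert rgb.shape == xyz.shape, "RGB and XYZ sequences must be aligned"
    # Rescale XYZ into roughly the same numeric range as RGB so the
    # diffusion backbone sees comparable statistics on both halves.
    xyz_min = xyz.amin(dim=(0, 2, 3), keepdim=True)
    xyz_max = xyz.amax(dim=(0, 2, 3), keepdim=True)
    xyz_norm = (xyz - xyz_min) / (xyz_max - xyz_min + 1e-8)
    return torch.cat([rgb, xyz_norm], dim=1)

# Example: 16 frames at 480x832 -> a (16, 6, 480, 832) unified video.
video_6d = pack_6d_video(torch.rand(16, 3, 480, 832),
                         torch.rand(16, 3, 480, 832))
```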
Quick Start & Requirements
Environment setup requires Conda, Python 3.10, PyTorch with CUDA 12.1, git-lfs, and rerun-sdk. Users must download the pretrained Wan2.1 I2V 14B and 4DNeX LoRA weights from Hugging Face into a specified directory structure. Inference is executed via a Python script that references example image and prompt files. For training, the 4DNeX-10M dataset from Hugging Face is a prerequisite.
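The weight downloads can be scripted with huggingface_hub. This is a minimal sketch only: the repo IDs and target directories below are assumptions, so check the project README for the authoritative names and layout before running.

```python
from huggingface_hub import snapshot_download

# Hypothetical repo IDs and directory layout -- verify against the
# 4DNeX README before use.
snapshot_download(
    repo_id="Wan-AI/Wan2.1-I2V-14B-720P",   # pretrained backbone (assumed ID)
    local_dir="checkpoints/Wan2.1-I2V-14B",
)
snapshot_download(
    repo_id="3DTopia/4DNeX",                # 4DNeX LoRA weights (assumed ID)
    local_dir="checkpoints/4DNeX-LoRA",
)
```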
Highlighted Details
Maintenance & Community
The provided README lacks specific details on project maintainers, community channels, or a public roadmap.
Licensing & Compatibility
The README does not specify the software license or provide compatibility notes for commercial use or closed-source linking.
Limitations & Caveats
The project's TODO list indicates that data preprocessing and visualization scripts have not yet been released; a data preparation script for training is likewise absent.