4D embodied world models for robotics
Top 82.4% on SourcePulse
TesserAct is an open-source, generalized 4D world model for robotics, designed to generate RGB, depth, and normal videos from image and text instructions, enabling 4D scene reconstruction and action prediction. It targets researchers and practitioners in embodied AI and robotics seeking to build more capable and generalizable robotic agents.
How It Works
TesserAct leverages a diffusion-based approach, building upon CogVideoX, to learn 4D representations of the world. It processes image and text inputs to predict future video frames, including geometric information like depth and normals, facilitating a comprehensive understanding of the environment for robotic control. This approach allows for the generation of realistic and geometrically consistent video predictions.
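To make the model's inputs and outputs concrete, here is a toy sketch of the interface such a 4D world model exposes: an RGB observation plus a text instruction in, and per-frame RGB, depth, and surface-normal predictions out. The function name, shapes, and random placeholder outputs are illustrative assumptions, not TesserAct's actual API.

```python
import numpy as np

def predict_4d_video(image, instruction, num_frames=8):
    """Toy stand-in for a 4D world model's interface: given an RGB
    observation and a text instruction, return per-frame RGB, depth,
    and surface-normal predictions (random placeholders here)."""
    h, w, _ = image.shape
    rng = np.random.default_rng(0)
    return {
        "rgb": rng.random((num_frames, h, w, 3)),     # predicted color frames
        "depth": rng.random((num_frames, h, w, 1)),   # per-pixel depth
        "normal": rng.random((num_frames, h, w, 3)),  # per-pixel surface normals
    }

obs = np.zeros((64, 64, 3))
out = predict_4d_video(obs, "pick up the red block")
print({k: v.shape for k, v in out.items()})
```

The geometric channels (depth and normals) are what distinguish a 4D world model from an ordinary video predictor: downstream robotic control can consume them directly for scene reconstruction and action planning.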
Quick Start & Requirements
Create a conda environment with Python 3.9 (python=3.9), activate it, clone the repository, and install dependencies with pip install -r requirements.txt followed by pip install -e . See DATA.md for dataset preparation. Run inference with python inference/inference_rgbdn_sft.py or python inference/inference_rgb_lora.py, passing the desired weights and image paths; see USAGE.md for detailed inference guidance.
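The steps above can be sketched as a shell session. The environment name and repository URL are assumptions inferred from the project's association with the UMass Embodied AGI Lab; consult the repository's README and USAGE.md for the exact commands and flags.

```shell
# Create and activate a Python 3.9 environment (conda assumed; env name is illustrative)
conda create -n tesseract python=3.9 -y
conda activate tesseract

# Clone the repository (URL assumed from the UMass Embodied AGI Lab org)
git clone https://github.com/UMass-Embodied-AGI/TesserAct.git
cd TesserAct

# Install dependencies, then the package itself in editable mode
pip install -r requirements.txt
pip install -e .

# Run inference (see USAGE.md for required weight/image-path arguments)
python inference/inference_rgbdn_sft.py
```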
Maintenance & Community
The project is associated with the UMass Embodied AGI Lab. Further community engagement details (e.g., Discord/Slack) are not explicitly mentioned in the README.
Licensing & Compatibility
The repository does not explicitly state a license. The code and models are released for research purposes. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
LoRA fine-tuning is experimental and not fully tested. Normal data generation may have imperfections, with ongoing work to improve it using NormalCrafter. The full dataset is not yet released due to storage size constraints of float depth data.