dexmal: Real-time Vision-Language-Action (VLA) inference
This project provides accelerated inference kernels for the Pi0 VLA model, enabling real-time performance for applications that need fast visual understanding and action. It targets researchers and developers in robotics and embodied AI, offering significant latency reductions for complex tasks, demonstrated by a real-world demo that catches a falling pen with sub-200 ms latency.
How It Works
The architecture decomposes VLA computation into a vision encoder, LLM, and action expert, simplifying the entire pipeline to 24 GEMM-like operations. This modular structure is optimized using custom Triton kernels for efficient GPU execution, achieving high inference frequencies.
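A rough sketch of this three-stage decomposition, using plain numpy matrix multiplies in place of the fused Triton kernels. All names and dimensions here are hypothetical assumptions for illustration; the real pipeline runs roughly 24 GEMM-like operations on the GPU.

```python
import numpy as np

rng = np.random.default_rng(0)

def gemm(x, w):
    # One "GEMM-like op": matmul plus a cheap elementwise activation.
    # The real pipeline is ~24 of these, implemented as custom Triton kernels.
    return np.maximum(x @ w, 0.0)

# Hypothetical dimensions, chosen only for illustration.
D_PATCH, D_LLM, D_ACT, HORIZON = 768, 2048, 32, 50

# Stage 1: vision encoder -- image patches -> visual tokens.
patches = rng.standard_normal((256, D_PATCH))
vis_tokens = gemm(patches, rng.standard_normal((D_PATCH, D_LLM)))

# Stage 2: LLM backbone -- visual tokens -> context features.
context = gemm(vis_tokens, rng.standard_normal((D_LLM, D_LLM)))

# Stage 3: action expert -- denoise input noise into an action chunk,
# conditioned on the pooled context.
noise = rng.standard_normal((HORIZON, D_LLM))
actions = gemm(noise + context.mean(axis=0),
               rng.standard_normal((D_LLM, D_ACT)))

print(actions.shape)  # (50, 32): an action chunk over the horizon
```

Because every stage reduces to dense matrix multiplies, the whole pipeline maps naturally onto a small, fixed set of GPU kernels, which is what makes the high inference frequencies possible.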
Quick Start & Requirements
Copy pi0_infer.py into your project and use convert_from_jax.py to load checkpoints. Example Python API: infer.forward(normalized_observation_image_bfloat16, observation_state_bfloat16, diffusion_input_noise_bfloat16). Dependencies: torch, triton. Reference: arXiv:2510.26742.
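A minimal sketch of the call pattern above, using a stand-in class since the real object from pi0_infer.py requires the repository and a GPU. Every shape, dtype, and class name below, other than the forward argument order shown in the README, is an assumption (the real API expects bfloat16 tensors; float32 arrays stand in here).

```python
import numpy as np

class StubPi0Infer:
    """Hypothetical stand-in for the inference object from pi0_infer.py."""
    ACTION_HORIZON, ACTION_DIM = 50, 32  # assumed action-chunk shape

    def forward(self, image, state, noise):
        # The real implementation runs the ~24 GEMM-like Triton kernels;
        # this stub only returns a correctly-shaped action chunk.
        assert image.ndim == 3 and state.ndim == 1 and noise.ndim == 2
        return np.zeros((self.ACTION_HORIZON, self.ACTION_DIM),
                        dtype=np.float32)

infer = StubPi0Infer()

# Mirrors the documented call signature (shapes are assumptions):
normalized_observation_image = np.zeros((224, 224, 3), dtype=np.float32)
observation_state = np.zeros((32,), dtype=np.float32)
diffusion_input_noise = np.zeros((50, 32), dtype=np.float32)

actions = infer.forward(normalized_observation_image,
                        observation_state,
                        diffusion_input_noise)
print(actions.shape)  # (50, 32)
```

In the real API the image should already be normalized and all inputs cast to bfloat16 before calling forward.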
Maintenance & Community
No specific details on maintenance, community channels, or contributors are provided in the README snippet.
Licensing & Compatibility
The license type is not explicitly mentioned in the provided README snippet.
Limitations & Caveats
The inference kernels are tuned specifically for the RTX 4090 with CUDA 12.6, though they are expected to function on similar platforms that support torch and triton.
Last updated: 2 months ago · Status: Inactive · Tags: NVIDIA, Physical-Intelligence